Engineers at NVIDIA (notably Scott LeGrand) have come up with a theory for the EUEs seen in core 1.15 (and a few others in the 1.15 to 1.18 range) on certain hardware. They found that this core had code optimizations that drove the GPU so hard that it drew a lot more power (one sign of this was that the card ran hotter). In some boxes this was more power than the system could cleanly supply, and that led to numerical instabilities. When the same machine was given a beefier power supply, the problem went away.
We've been told that 8800s require 600W power supplies, but we're finding that going even a little bigger (e.g. at least 650W) is important to leave some room for error. We are working to see if there is some way to detect this issue in software, but for now, if you're getting EUEs on the NV GPU client, this is something to consider.
By the way, this will be very important as we consider future code optimizations. NV core v1.19 removed some optimizations to solve this problem, but there are many cards that would run fine with the more optimized code. If we can find a way to detect whether a card can draw enough power to stay stable, we may be able to choose different code paths, allowing greater optimization on cards that can handle it.
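To make the detection idea concrete, here is a minimal sketch of one way a client might probe a card before choosing a code path: run a short, deliberately power-hungry kernel whose correct answer is known exactly, and fall back to the conservative path if the results come back wrong. This is purely illustrative and not the actual GPU2 core; stress_kernel, gpu_stays_stable, and the pass/fail criterion are all assumptions of the sketch.

    #include <cstdio>
    #include <cuda_runtime.h>

    // A deliberately FMA-heavy kernel: each thread runs a long dependent chain of
    // fused multiply-adds. The arithmetic uses small, exactly representable values,
    // so on stable hardware the result is bit-exact and easy to verify.
    __global__ void stress_kernel(float *out, int iters)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        float x = (float)(idx & 0xFF);       // 0..255, exact in float
        for (int i = 0; i < iters; ++i)
            x = fmaf(x, 1.0f, 1.0f);         // x = x*1 + 1, exact while x < 2^24
        out[idx] = x;
    }

    // Run the stress kernel and check the results against the known answer.
    static bool gpu_stays_stable(int n, int iters)
    {
        float *d_out = nullptr;
        if (cudaMalloc((void **)&d_out, n * sizeof(float)) != cudaSuccess)
            return false;

        stress_kernel<<<(n + 255) / 256, 256>>>(d_out, iters);
        if (cudaDeviceSynchronize() != cudaSuccess) {   // a kernel fault counts as unstable
            cudaFree(d_out);
            return false;
        }

        float *h_out = new float[n];
        cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

        bool ok = true;
        for (int i = 0; i < n; ++i) {
            float expected = (float)((i & 0xFF) + iters);
            if (h_out[i] != expected) {      // any wrong bit (or NaN) fails the test
                ok = false;
                break;
            }
        }
        delete[] h_out;
        cudaFree(d_out);
        return ok;
    }

    int main()
    {
        const int n = 1 << 20;       // enough threads to load the whole card
        const int iters = 1 << 20;   // long dependent FMA chain per thread

        // The hypothetical client would use this flag to pick between the
        // aggressive kernels and the slower, gentler fallback path.
        bool stable = gpu_stays_stable(n, iters);
        printf("selected code path: %s\n", stable ? "optimized" : "conservative");
        return 0;
    }

A real detector would presumably repeat the test, vary the load, and compare against a CPU reference as well, but the overall shape would be similar: stress the card briefly, verify the math, then decide.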
We're still looking into this. For now, if you're seeing issues with your card, please consider trying a bigger power supply. We will continue to investigate whether this is indeed the problem and what we can do so that the code runs stably on all machines.
Does ATI have better power management? Because I have a 400W power supply with a 4850 running better than fine. m
Posted by: mark | November 08, 2008 at 08:13 PM
How far off are more power-efficient chips and more efficient cooling systems? Every company I read about brags about how its chips, boards and cooling are so far ahead of any other company's, so I thought all this had been "solved" ages ago! Or were all those "breakthroughs" just hype, or too expensive to manufacture? Perhaps we need a way to turn off optimizations automatically when a problem happens, and remember the lower power-consuming setting the next time the PC is turned on.
Posted by: Adam A. Wanderer | November 08, 2008 at 11:50 PM
@ mark
The Radeon HD4850 does indeed use less power under full load than the GeForce GTX260, and significantly less than the GeForce GTX280 and GeForce 8800 Ultra. See http://www.tomshardware.com/reviews/radeon-hd-4850,1957-20.html
The Radeon HD4870 consumes more power under full load than the GeForce GTX260 (although the HD4870 is quieter and its heat is vented outside the computer case), but less than the GeForce GTX280 and GeForce 8800 Ultra. See http://www.tomshardware.com/reviews/radeon-hd-4870,1964-15.html and http://www.tomshardware.com/reviews/radeon-hd-4870,1964-16.html
However, the reason for your stability could be that you are running the ATI core and not the NV core described in the article above. I don't know whether Stanford has made similar code optimisations in the ATI core or not.
Posted by: S | November 10, 2008 at 05:26 AM
Good luck! I like the idea of the smart client deciding which optimizations to use...or even one that lets the user disable optimizations.
Posted by: ruisu | November 12, 2008 at 06:56 PM
Never mind about the user control; I just read your forum post.
Posted by: ruisu | November 12, 2008 at 06:57 PM
I have an eVGA GTX280 SuperClocked, and I've measured it drawing spikes of 392 watts! I was only able to monitor the draw on the PCIe power adapters, so I can't measure whatever (if anything) it's pulling through the PCIe slot itself. Besides having that high a draw, I also noticed that it's a very inconsistent load: my GTX280 seems to draw in spikes and bursts while running the FAH GPU2 core. So I installed a Thermaltake 450W auxiliary PCIe 2.0 PSU, relieving all power stress on my primary PSU. The aux PSU is also 80%+ efficient, and it shuts down when my GPU is idle. As an added bonus, the +12VDC lights on the front of it make perfect activity LEDs for my GPU. :-)
Posted by: Xboxor | November 14, 2008 at 10:42 AM
I don't think so. I get an unstable machine or an EUE after the screen blinks once or twice when I start VLC player, which also uses the video card. That only happens occasionally, though. But when I use the protein visualization, it's almost certain to end with one of those two messages, and the PC graphics often freeze as well.
Posted by: h | November 20, 2008 at 10:35 AM
I found out about folding@home when it downloaded with my ATI drivers. It wouldn't let me unselect it, and I was pretty excited when I figured out what it was. So yes, I imagine they've got code running on the HD4870s now! =)
Posted by: Mike | December 11, 2008 at 09:00 PM
I get more EUEs now with the 1.19 code than I did with previous versions, and I have an 8800GT with a 450W power supply. I never had a problem until this new "unoptimized" code.
Posted by: Bill | January 25, 2009 at 07:41 AM