There have been several questions about the GPU1 client and why we decided to shut it down. I hope I can shed some light here, at least on why we're doing what we're doing, so that even if people disagree with our decisions, they can at least see where we're coming from.
Some people have asked "why shut down the client if it's working?" The bottom line is that the GPU1 results are no longer scientifically useful. It's pretty clear now that DirectX (DX) is not sufficiently reliable for scientific calculations. This was not known before (and some people wouldn't believe it until we proved it). With the GPU1 results, we can now show what the limitations are pretty decisively.
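To make "not sufficiently reliable" concrete, here is a minimal sketch of the kind of cross-check that exposes this sort of problem: re-run a sampled work unit on a trusted CPU core and check that the GPU run's energies track the reference within a tolerance. This is illustrative only, not FAH's actual validation code; the function name, tolerance, and numbers are assumptions made up for the example.

```python
# Minimal sketch (not FAH's actual validation code): flag a GPU work unit as
# unreliable if its per-frame energies drift away from a trusted CPU re-run.
import numpy as np

def validate_work_unit(gpu_energies, cpu_energies, rel_tol=1e-4):
    """Return True if the GPU result tracks the CPU reference closely enough.

    gpu_energies, cpu_energies: per-frame potential energies (same length).
    rel_tol: maximum acceptable relative deviation (an illustrative threshold).
    """
    gpu = np.asarray(gpu_energies, dtype=np.float64)
    cpu = np.asarray(cpu_energies, dtype=np.float64)
    if gpu.shape != cpu.shape:
        return False  # truncated or corrupted result
    # Relative error per frame; guard against division by zero.
    denom = np.maximum(np.abs(cpu), 1e-12)
    rel_err = np.abs(gpu - cpu) / denom
    return bool(np.max(rel_err) <= rel_tol)

# Example with made-up numbers: a faithful run passes, a drifting run fails.
cpu_ref = [-1052.3, -1051.9, -1052.7]
print(validate_work_unit([-1052.3, -1051.9, -1052.7], cpu_ref))  # True
print(validate_work_unit([-1052.3, -1048.0, -1030.5], cpu_ref))  # False
```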
GPU1 also helped us a lot in developing its successor and in learning what's needed to run GPUs in a distributed computing fashion. The good news is that GPU2 is behaving very well on both ATI and NVIDIA hardware, and this is a direct result of what we learned from GPU1 WUs. In the end, however, GPU1 will not be able to help us understand protein misfolding, Alzheimer's Disease, etc., due to these unresolvable limitations. We could keep GPU1 live, just crunching away in its current form, but at this point that would be wasting people's electricity, as we've learned everything we can from those cards, given what they can do.
In the past, we had a somewhat similar shutdown situation when the QMD core projects stopped. In that case, donors were left hanging because we didn't give any warning before stopping the QMD projects. We did try (perhaps unsuccessfully) to handle the GPU1 situation better than QMD. With QMD, we stopped needing that core and so we stopped the calculation without warning, not realizing the impact that would cause. With GPU1, we gave several months' warning (indeed, note that GPU1 is still actively running), so this is information given in advance of shutting down GPU1. We tried to avoid the QMD situation by giving advance warning, but it looks like donors would like even more advance warning. However, there are limits to how far in advance we know the situation ourselves.
Indeed, the realization that it made sense to end GPU1 came to us reasonably recently. We had been working on CAL for a while and it seemed that CAL might be a solution, but we only knew for sure once we got some testing "in the wild." DirectX (DX -- what GPU1 is based on) works much better in the lab than in the wild, and it was possible that CAL behaved that way too. After seeing that CAL behaved well in the wild, it became clear that the GPU1 path was obsolete. However, this is a relatively recent finding, and we made the announcement about the situation relatively shortly thereafter.
It was a tough decision. Some suggested we just leave GPU1 running, even though people's electricity really would be going to waste, doing nothing other than generating points. I didn't think that was a good idea. We did know it would be a tough PR hit, but when people talk about the history of FAH, I want it to be clear that we're here to address AD and other diseases, not just to run calculations for the sake of points and nothing more (which has been the critique of some other distributed computing projects).
So, what's the right thing to do? I guess it comes down to this: would GPU1 donors be happier if we just kept the GPU1 servers running, doing work with no scientific value, purely for points? We could do that, but at the cost of taking personnel away from improving existing clients, keeping existing servers going, etc., for the sake of keeping GPU1 running. However, that's not what FAH is for, and I think it's important that FAH not devolve into a big points game, losing sight of why we're doing what we're doing.
PS Some further discussion can be found here
Agreed. The point system should be designed to encourage people to make contributions that are scientifically useful, at least on an indirect basis if not a direct basis.
Posted by: Stephen Dewey | May 27, 2008 at 07:30 PM
I am not a GPU cruncher, and I am interested in advancing research. If you wanted to take less of a PR hit, then slowly or quickly give fewer and fewer points for GPU1 clients until it just doesn't matter. Just a thought. It might just be the best of all worlds! Complainers never win and winners never complain! :)
Posted by: HP_Raider | May 27, 2008 at 10:57 PM
Anyone who uses the higher-performance beta clients knows that they will be superseded at some point - I think this is just a sign of (rapid) progress on the GPU front. I'm sure that most people won't want to crunch WUs just for crunching's sake - but I guess that people who Folded on older hardware (I used to Fold on an X1950XTX) may be a bit lost, not being able to contribute anymore ..... sounds like a great excuse for an upgrade ;-)
Posted by: DocJonz | May 27, 2008 at 11:15 PM
What was wrong with DirectX in the wild? Were cards returning different results given the same work unit?
Posted by: kwyjibo | May 28, 2008 at 02:22 AM
How about letting those who have X1900 cards run GPU2 so GPU1 can be shut off? Aren't they similar?
Posted by: penguinusaf | May 28, 2008 at 02:43 AM
I'd agree that part of the problem is probably that certain hardware could run GPU1 but can't run GPU2. If you avoided the hardware obsolescence issue, there might be fewer complaints. That might not be scientifically/computationally feasible, however.
Posted by: Stephen Dewey | May 28, 2008 at 06:32 AM
This reminds me of what Thomas Edison said: Genius is 1% inspiration and 99% perspiration.
Posted by: S | May 28, 2008 at 06:48 AM
Are there clear scientific criteria for validating the folding results? I am afraid that results from so many other clients might also turn out to be unreliable. You have probably already established methods for validating folding results; could you point me to a reference? Any links would be fine.
Posted by: aki | May 28, 2008 at 07:38 AM
I can answer the question from penguinusaf: How about letting those who have X1900 cards run GPU2 so GPU1 can be shut off? Aren't they similar?
ATI provides a software interface for the 2xxx and 3xxx series called "CAL," which is what GPU2 uses. CAL doesn't support the X1900 cards. The X1900 cards instead used the DirectX interface for GPU1, and that interface does not work reliably.
Posted by: b | May 28, 2008 at 02:15 PM
I think F@H learned with GPU1... that they needed a GPU2. The whole concept of getting GPGPU code to run through DirectX (where nearly anything could come along and blow away the DX context) seemed like pushing it all along.
Posted by: Sneakers55 | May 28, 2008 at 04:01 PM
So, when is the nVidia client coming out, and what sort of performance can we expect? I got an overclocked Radeon 3850 primarily for F@H, but so far it hasn't really satisfied me for games... it would be great if nVidia's performance were at least as good as ATI's, or hopefully better. Then it'll make sense to switch to an nVidia GPU that's great for both folding and games.
Posted by: WJS | May 28, 2008 at 09:17 PM
Any sign of a GPU client for Linux-64 ???
Posted by: DocJonz | May 29, 2008 at 12:01 AM
Good on those who folded with GPU1. It appears you've significantly helped Stanford advance their research so GPU2 could be possible. Without you guys, GPU2 wouldn't be where it is, and I think that's a great contribution to the project.
Thank you
Posted by: smASHer88 | May 29, 2008 at 04:28 AM
You made the correct decision. Encouraging people to waste electricity for bad science is illogical. On the flip side, maybe some of the X1000-series people would consider purchasing newer graphics cards in order to help all of humanity.
Posted by: Neil Rieck | May 30, 2008 at 12:41 PM
DirectX (DX -- what GPU1 is based on) works much better in the lab than in the wild
I'm curious about this comment. Could this be due to "overclocking" or "flaky memory combined with lack of parity checking"?
Posted by: Neil Rieck | May 31, 2008 at 07:18 AM