Two issues have been causing FAH some serious problems, and I wanted to give an update.
First, networking on subnet 171.64.122.XX seems to be very slow and easily overloaded. Our diagnostics (and donor input) point to this being caused by a firewall on this subnet that can't handle the load. I have asked 3 different branches of our IT dept to look into this, but nobody has any fixes just yet. Getting impatient with this slow response, some time ago I requested a new network for our machines. This request is being processed. Once it's accepted, it will take a little time for them to get the new net in there (they may be able to do a VLAN, but more likely they will have to run a new physical cable, since the VLAN would still be behind the firewall). This subnet also has some of our collection servers, so resolving this will be a big help there too.
Second, the current server code can get overloaded. When it does, it slows down. However, new code in the server notices this and restarts the server binary. This leads to downtime of about an hour each time it happens. While this does autofix the problem with just a little downtime, I'd like zero downtime (as would most people). I have paid a professional software house to rewrite our FAH server backend from scratch. That is almost done (it's in QA right now, with some fairly minor issues to address). This new server code should address this issue (and other issues) with the server code, but may introduce new issues that need to be smoothed over. However, the rewrite is MUCH more cleanly architected, and that will be important going forward.
I just wanted to give a bump to let people know where we are. These issues aren't quick to resolve, but we are making progress. The new server code in particular will be a big help in the next 5 years of FAH, due to its rearchitecting and much cleaner code.
We've been looking into the issues some people have been finding with the viewer. On the ATI/AMD side, the Catalyst 8.9 drivers seem to have resolved a number of issues with the viewer.
On the NVIDIA side, we have also now released on our download page a special version of the viewer designed for NVIDIA GPUs; this viewer has modifications that make it run much more smoothly and, in general, behave better on a broader range of NVIDIA GPUs. If you're curious to run the viewer, please check it out (on our high performance client download page).
In partnership with Sony, we've rolled out a new update to the Folding@home client for the PS3, now built into their "Life with PlayStation" (LWP) application. They've done a great job of building a beautiful piece of software. Folding@home is running underneath in the background. Noam Rimon, the lead developer on the Sony side, put it well in an earlier blog post as well as a more recent post on the PS3 blog:
We’re going to offer this as a free service that will be easily accessible directly from the XMB. Life with PlayStation will feed live content to your PS3 with updates on news and weather on a visually stunning and interactive global map. Imagine being able to wake up to your PS3 to see if you need to pack an umbrella for the day. Or just relax as you listen to your favorite tunes while reading up on top news from around the world.
We've also updated the science code to provide the same functionality as the GPU2 code has today. However, people will notice that the ns/day will go down significantly. We want to stress that the points per day remain the same. This means that one gets more points/ns since the calculation is more complex, so don't worry too much if your ns/day is lower. LWP takes a little extra compute time away from the science, but not much. Most of the decrease in ns/day is due to the fact that it's doing a much more complex calculation (which is good for the science side of this).
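To make the points/ns arithmetic concrete, here is a minimal sketch. The numbers are hypothetical and chosen only for illustration (they are not actual PS3 figures), and the function name `points_per_ns` is made up for this example. It just shows that if points per day stay fixed while ns/day drops, the points credited per nanosecond go up by the same factor.

```python
def points_per_ns(points_per_day: float, ns_per_day: float) -> float:
    """Points credited per simulated nanosecond."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    return points_per_day / ns_per_day


# Hypothetical numbers, for illustration only (not real PS3 statistics).
POINTS_PER_DAY = 900.0   # assumed unchanged between old and new science code
OLD_NS_PER_DAY = 100.0   # assumed throughput with the older, simpler calculation
NEW_NS_PER_DAY = 50.0    # assumed throughput with the more complex calculation

old_rate = points_per_ns(POINTS_PER_DAY, OLD_NS_PER_DAY)
new_rate = points_per_ns(POINTS_PER_DAY, NEW_NS_PER_DAY)

print(f"old: {old_rate} points/ns, new: {new_rate} points/ns")
```

With these made-up inputs, halving the ns/day doubles the points/ns, while the daily point total does not change.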
There's also a nice video demo done by Noam Rimon showing the functionality of LWP:
We'll give further updates about this as time goes on, but we're excited to have the new science functionality in and to have a client which hopefully can even further broaden the participation of PS3 owners in Folding@home.
PS: Some people have noticed that LWP seems slower. In particular, the ns/day count is down, but that's not what it might seem. We're doing more complex science which runs slower on the PS3, so that's responsible for most of the slowdown you're seeing.
For those who are interested, it's slower because we're running a more complex version of the GB implicit solvent, which is much more computationally demanding; the upside is that it should be much more accurate. This is also the main solvent model we're running on GPU2 right now. In fact, earlier PS3 results pointed us to the importance of moving to this new model, so this (and GPU2) reflects what we've learned so far.
There is also a small performance hit due to all the other Life with PlayStation (LWP) eye candy. However, it's not very big in the end (the Cell and RSX GPU are very powerful), with most of the slowdown in terms of ns/day coming from the new science code. The upshot of the LWP platform will hopefully be getting a lot more PS3 donors (but I guess time will tell there).
So far, the response has been strong and I expect that this could in time double the FLOP count from PS3 donors in Folding@home, which itself would be pretty impressive. More importantly, with the advanced science now rolled out on both the PS3 and GPU2 platforms, we're well poised to tackle some much more complex and interesting calculations.
We've released a series of new WU's (announced in this forum thread http://foldingforum.org/viewtopic.php?f=52&t=5452&start=0 ). From our statistics, there are fewer EUE's with these WU's than previous ones, but it definitely looks like people who weren't seeing EUE's before are seeing them with these. I should stress that we release WU's from beta to advanced to all of FAH based on EUE statistics and these were not just "ok", but better than previous ones. I think what was different here is that the EUE profiles were different, i.e. people who didn't get EUE's before were getting them.
These WU's are important as they represent the major next step from the first generation GPU2 WU's. However, as we've moved to more interesting and complex (and important) protein systems, we're seeing new behavior in the GPU2 core code. We've been working on core modifications to help address this and they are in QA right now.
It will take a few days for the new core to get out of QA, but we think this new core should help, although it's not clear if it will solve all the issues people are seeing. Thus, we are continuing to work to see what's up. However, as I mentioned above, these WU's appear to be EUE-ing less than previous WU's (based on the fraction of returned WU's that EUE'd), which means we are getting a lot of useful WU's back as well. Hopefully with these working WU's as well as the knowledge of which WU's EUE'd, we can work to improve the core further.
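As a rough illustration of how an overall EUE fraction can go down while the set of affected donors changes, here is a small sketch. All counts and donor names are hypothetical placeholders, not our actual statistics, and `eue_fraction` is a made-up helper, not part of any FAH tool.

```python
def eue_fraction(returned: int, eue: int) -> float:
    """Fraction of returned WU's that ended in an EUE."""
    if returned <= 0:
        raise ValueError("returned must be positive")
    if not 0 <= eue <= returned:
        raise ValueError("eue must be between 0 and returned")
    return eue / returned


# Hypothetical counts, for illustration only.
old_fraction = eue_fraction(returned=10_000, eue=500)
new_fraction = eue_fraction(returned=10_000, eue=300)

# Hypothetical donors who saw at least one EUE on each set of WU's.
old_affected = {"donor_a", "donor_b", "donor_c"}
new_affected = {"donor_c", "donor_d"}

# Donors who had no EUE's before but see them with the new WU's.
newly_affected = new_affected - old_affected

print(f"old EUE fraction: {old_fraction:.2%}, new: {new_fraction:.2%}")
print(f"newly affected donors: {sorted(newly_affected)}")
```

In this made-up case the aggregate fraction improves, yet one donor who previously saw no EUE's now sees them, which is the kind of change in EUE profile described above.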
It's important for us to stress that the GPU2 client is a "high performance client", which means that it's pretty bleeding edge technology. Indeed, just running any sort of production calculation on a GPU is pretty new ground. For this reason, we've tried to make it clear that this software is experimental and still very much under development (see the discussion on our high performance client download page http://folding.stanford.edu/English/DownloadWinOther ), but we wanted to remind people of this designation and its implications. In time, we should be able to further improve the software, but these are still very early days for running complex calculations like FAH on GPUs.
Finally, we have some exciting news on the GPU2 front science-wise. With the results we're getting back so far, we've been able to submit our first scientific results from the GPU2 client for peer review. We're very excited about the GPU2 client in general and look forward to working with donors to improve the software further.