For some time, we've been keeping an updates thread with information for every change that we do to the FAH backend (new WUs, servers on- and off-line, etc). However, many donors likely don't know about it, so here's a reminder:
Posted at 09:38 AM in How FAH works | Permalink
One of the more common questions I get asked is how we do our client/server/core programming and backend system administration. Also, others were curious about updates on various core projects. So, I thought it made sense to answer both in one post, since the answers are related. This will be a bit of a long answer to several short questions, but hopefully it will help give some insight into how we do what we do.
Posted at 08:41 AM in How FAH works | Permalink
We have added one word to the End User License Agreement (EULA), but it is an important word. The EULA can be found at
"You may use this software on a computer system only if you own the system or have the written permission of the owner."
We felt that this was an important addition in order to avoid any confusion. There have been a few situations where donors felt that they had permission, but the owners of the computers did not. Having written permission is the best way to make sure that there is no doubt. It also gives protection to the donor in that he/she would then have proof of permission, avoiding problems involving oral agreements.
Posted at 06:15 AM in How FAH works | Permalink | Comments (6)
Our SMP core is very different from the way other distributed computing projects handle multi-core CPU's, and I thought it might be interesting for the FAH community to hear about the differences, pro and con. As I think most people interested in computers know, Moore's law (the observation that the transistor count in CPUs doubles every 1.5 years) has continued to hold for decades. Most people think of Moore's law in terms of the speed of CPU's, but this isn't what Moore originally had in mind. In the past, more transistors led to greater CPU speeds, but that essentially ended (at least for traditional CPU's) a few years ago.
But if Moore's law is still marching along (as it is), what do all those transistors do? Over the last few years, more transistors have translated into more CPU cores, i.e. more CPUs on a chip. While this is not what we wanted, it is not necessarily a disaster, if one can use these multiple CPUs to get faster calculations. If we simply do more calculations (i.e. multiple Work Units, or WU's, simultaneously) rather than faster calculations (a WU completed in less time), distributed computing will run into the same problem that faces supercomputers: how to scale to lots and lots of processors, i.e. how to use all these processors to complete a single calculation faster overall.
In FAH, we've taken a different approach to multi-core CPUs. Instead of just doing more WU's (eg doing 8 WU's simultaneously), we are applying methods to do a single WU faster. This is typically much more valuable to a scientific project and it's important to us. However, it comes with new challenges. Getting a calculation to scale to lots of cores can be a challenge, as well as running complex multi-core calculations originally meant for supercomputers on operating systems not meant for this (eg Windows).
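The difference between "more WU's" and "a faster WU" comes down to splitting the work inside a single WU across cores. Here is a minimal toy sketch of that idea in Python: a trivial "energy sum" stands in for the real molecular dynamics kernel, and Python threads stand in for the MPI processes the actual SMP core uses. All names and the workload are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_energy(positions, lo, hi):
    # Each worker handles one contiguous slice of the atoms.
    return sum(x * x for x in positions[lo:hi])

def wu_energy(positions, n_workers=4):
    # Split ONE work unit's atoms across n_workers, then combine the
    # partial results -- the same answer as the serial sum, sooner.
    n = len(positions)
    bounds = [k * n // n_workers for k in range(n_workers + 1)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(partial_energy,
                         [positions] * n_workers,
                         bounds[:-1], bounds[1:])
    return sum(parts)
```

The point of the sketch: regardless of how many workers the WU is split over, the result is the same; only the wall-clock time per WU changes.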
Right now, our SMP client seems to be running fairly well under Linux and OSX -- operating systems based on UNIX, as is found on supercomputers. We use a standard supercomputing library (MPI) to run these WU's, and MPI behaves well on Unix-based machines. MPI does not run well on Windows and we've been running into problems there. However, as Windows MPI implementations mature, our SMP/Windows app will behave better. Along the way, we also have a few tricks up our sleeve which may help as well. However, if we can't get it to run as well as we'd like on Windows, we may choose to overhaul the whole code, as we did with the GPU1 client (which was really hard to run).
We're very excited about what the SMP client has been able to do so far. One of our recent papers (#53 in our papers web site http://folding.stanford.edu/English/Papers) would have been impossible without the SMP client and represents a landmark calculation in the simulation of protein folding. We're looking forward to more exciting results like that in the years to come!
Posted at 07:24 PM in How FAH works | Permalink | Comments (12)
Posted at 04:27 PM in How FAH works | Permalink | Comments (15)
There have been some misunderstandings about how the GPU2 core works. In particular, for small proteins like villin on GPU's with a large number of stream processors (SP's), like the 3850 or 3870, the protein is too small to use a large number of SP's unless the CPU is very fast. Some people have guessed that there is some internal SP limit. This is incorrect; the problem is that small proteins can't be parallelized amongst a large number of SP's.
We are working to release larger proteins (about 2x the number of atoms) as they are more interesting scientifically and use the GPU's (even the high end ones) much closer to 100%. The exciting part for us is that the larger proteins run at almost the same speed as the slower ones on GPU's (whereas on CPU's, they're 4x slower); this is where the GPU2 code should shine. In parallel, Mike Houston at AMD is working to optimize CAL such that it has lower CPU overhead.
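A toy model makes the utilization argument above concrete: a protein can only keep roughly `n_atoms / atoms_per_sp` stream processors busy at once, so a small protein leaves a big GPU partly idle while a ~2x larger one approaches full utilization. All the numbers below (atom counts, SP counts, atoms-per-SP ratio) are invented for illustration, not taken from the real core.

```python
def sp_utilization(n_atoms, n_sp, atoms_per_sp=4):
    # How many SP's the protein can plausibly keep busy, capped by the
    # number of SP's the GPU actually has. Purely illustrative model.
    usable = min(n_sp, n_atoms // atoms_per_sp)
    return usable / n_sp
```

With these made-up parameters, a small system on a 320-SP part sits well under full utilization, while doubling the atom count roughly doubles utilization -- which is why the larger WU's run at almost the same speed: the extra atoms fill SP's that were idle anyway.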
For now, we're pushing out villin WU's as a test (good to know that the code is working well), but we expect the larger WU's to be going out soon (say a week or two, pending internal testing).
Posted at 10:00 AM in How FAH works | Permalink | Comments (22)
One of the remarkable strengths of thermodynamics is that it is in many ways more general than many of the laws of physics. One of the key concepts in thermodynamics is that of a state function. The key property of a state function is that if you're interested in the difference in a state function between some initial and final state, it doesn't matter what you do to get from the initial state to the final one: whatever path you take will give you the same value for the difference of the state function.
How is this useful? Well, often we can take paths in simulations which are not possible in reality, but since the values of state functions only depend on the initial and final state, as long as those are the same as in reality, we get the same result! One key way we take advantage of this is thermodynamic alchemy. For example, often one is interested in the difference between two amino acids, which may come down to a single methyl group. In simulation, we can turn a proton alchemically into a methyl group and calculate everything we need thermodynamically, even if this isn't possible to do experimentally in a direct way!
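The path-independence property is easy to demonstrate numerically: if each state has a definite value, the changes along any path telescope, so a direct route and an "alchemical" detour give the same total difference. The state names and values here are made up purely for illustration.

```python
# A toy state function: every state has one definite value.
f = {"A": 1.0, "B": 5.0, "C": -2.0, "D": 3.0}

def path_difference(path):
    # Sum the change in f over each leg of the path; intermediate
    # states cancel out (the sum telescopes).
    return sum(f[b] - f[a] for a, b in zip(path, path[1:]))
```

Going straight from A to D, or detouring through unphysical intermediates B and C, yields the same difference. This is exactly the freedom alchemical simulations exploit: the intermediate states need not exist in the lab.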
Posted at 09:51 AM in How FAH works | Permalink | Comments (0)
One of the critical issues in computer science right now is the limit to how fast a single CPU can calculate. While Moore's law is still going strong in a literal sense -- i.e. the number of transistors which one can put on a chip is doubling every 2 years or so -- this doubling of transistors is no longer leading to a doubling in CPU speed as it did over the last few decades. Well, at least not for typical programs (eg Microsoft Word). In order to get big speed increases, a major change in the programming paradigm is needed. One key change is the existence of "streaming processors." GPU's and the Cell processor in the PS3 are both examples of streaming processors.
What makes streaming processors potentially much faster than regular CPU's is how they handle computation vs memory access. Normal CPU's use lots and lots and lots of transistors on cache (local memory on the CPU chip to help keep the CPU fed with data and instructions). Streaming processors use the additional transistors on additional computing elements (eg floating point units). By doing so, they can do lots of FLoating point OPerations per second (FLOPS) in an optimal situation, although getting one's code to behave optimally is not easy. Typically this means balancing FLOPS with memory access to make sure that there's data available for calculation. This has been the primary challenge in our GPU and PS3 codes, and is something which we have, for the most part, figured out for a significant subset of the calculations we run on FAH.
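The balance between FLOPS and memory access described above is often summarized by a "roofline"-style bound: a kernel's attainable throughput is the lesser of the chip's peak compute rate and what memory bandwidth can feed it, given how many floating point operations the kernel does per byte it loads. A minimal sketch, with invented hardware numbers:

```python
def attainable_gflops(flops_per_byte, peak_gflops, bandwidth_gb_s):
    # A kernel is either compute-bound (it hits the chip's peak) or
    # memory-bound (limited by how fast data can be streamed in).
    # Numbers fed to this function are illustrative only.
    return min(peak_gflops, flops_per_byte * bandwidth_gb_s)
```

A low-intensity kernel (few operations per byte) is stuck far below peak no matter how many floating point units the chip has, which is why restructuring the computation to raise its arithmetic intensity was the hard part of the GPU and PS3 ports.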
These advances have led to our GPU and PS3 clients. The family history of all of this starts with the GPU core. This GPU code was then brought over to the PS3 and enhanced. We are working to bring some of those scientific enhancements back to the GPU code. This is all pretty bleeding edge, but so was distributed computing in 2000 when we started. Our expectation is that given how modern processors are developing, in 8-10 years streaming processors will be much more standard and will be a major way in which FAH works.
Finally, it's also interesting to think of how CPU's may themselves turn into streaming processors. As CPU's add more cores, they start to have more functionality similar to streaming processors in a limited way. Perhaps more interesting are some of the new chips rumored to be developed at Intel and AMD/ATI. Intel's 80-core chip is very interesting and something which our code would likely run well on. Also, the fusion of AMD's CPU's with ATI's GPU's could be very exciting, potentially bringing the best of both worlds. We're looking forward to these and lots of other emerging technology. FAH is running very fast now (over a petaflop, i.e. 1,000,000,000,000,000 floating point operations per second !) and we look forward to continuing to push the frontiers.
Posted at 08:34 AM in How FAH works | Permalink | Comments (4)
This post may get a little technical, but I wanted to start a new set of posts describing the inner workings of Folding@home. To say FAH is complex is in many ways an understatement. On the surface, FAH does seem a lot like other distributed computing projects: there are lots of WU's which go out to client machines, get calculated, and then come back. However, there is a lot more going on under the hood in FAH.
One of the principal challenges in FAH is that we're trying to use lots of processors to, in a sense, speed up a calculation which many would have thought was intrinsically serial, i.e. one that could only be done by a single very, very, very fast processor. The reason is that we are studying how proteins change in time, and it's hard to parallelize the 23rd step if you haven't completed the 22nd step, etc.
However, through the years we have been developing ways to solve this issue. In the last few years, we have made significant progress in a method called "Markov State Models" or MSMs for short. MSMs in a sense allow us to parallelize these seemingly intrinsically serial tasks. The way this works is that we build a kinetic model for the process, dividing up the possible dynamics into a series of states (related protein conformations) and rates between these states. The rates are what's calculated in FAH WU's. Once we have all this data, we need to run some fairly sophisticated Bayesian Machine Learning methods to identify what are reasonable states and then to calculate the rates between them.
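The MSM idea can be sketched in a few lines: represent the states as entries of a population vector and the transition probabilities between them as a matrix, then evolve the populations forward. The two-state matrix below (a toy "unfolded"/"folded" system) and its values are invented for illustration; real MSMs have many states, with rates estimated from FAH trajectory data via the Bayesian methods mentioned above.

```python
import numpy as np

# Row-stochastic transition matrix between two toy conformational states.
T = np.array([[0.90, 0.10],    # unfolded -> {unfolded, folded}
              [0.05, 0.95]])   # folded   -> {unfolded, folded}

def propagate(p0, n_steps):
    # Evolve a population vector forward in time. At long times the
    # populations relax to the stationary (equilibrium) distribution,
    # independent of where they started.
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ T
    return p
```

This is how seemingly serial dynamics become parallelizable: many short, independent WU's estimate the entries of T, and the long-time behavior is then recovered from the model rather than from one enormous serial trajectory.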
We have had several recent advances in MSM methodology, and those papers are on our papers website. We have also had several MSM applications, including studying protein folding, lipid vesicle fusion, and abeta aggregation (Alzheimer's Disease simulations). While we will continue to improve our MSM methodology, we are very excited about the potential applications and have great thrusts going in both areas.
Posted at 11:03 AM in How FAH works | Permalink | Comments (3)