There have been some misunderstandings about how the GPU2 core works. In particular, on GPUs with a large number of stream processors (SPs), like the 3850 or 3870, a small protein like villin is too small to keep all of the SPs busy unless the CPU is very fast. Some people have guessed that there is some internal SP limit. This is incorrect; the problem is that small proteins can't be parallelized across a large number of SPs.
We are working to release larger proteins (about 2x the number of atoms), as they are more interesting scientifically and keep the GPUs (even the high-end ones) much closer to 100% utilized. The exciting part for us is that the larger proteins run at almost the same speed as the smaller ones on GPUs (whereas on CPUs, they're 4x slower); this is where the GPU2 code should shine. In parallel, Mike Houston at AMD is working to optimize CAL so that it has lower CPU overhead.
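To make the scaling argument concrete, here is a toy timing model in Python. It is only a sketch, not how the GPU2 core actually works: it assumes the work per step grows with the square of the atom count (chosen because it reproduces the 4x CPU slowdown for 2x the atoms), that this work splits evenly across the SPs, and that each step also pays a fixed CPU-side overhead. The function names, atom counts, SP count, and cost constants are all made up for illustration.

```python
def step_time(n_atoms, n_sp, fixed_overhead, cost_per_pair=1e-6):
    """Toy time for one step: fixed overhead plus pair work split across SPs.

    Assumptions (illustrative only): work scales as n_atoms**2 and divides
    evenly across n_sp stream processors.
    """
    pair_work = cost_per_pair * n_atoms ** 2
    return fixed_overhead + pair_work / n_sp


def busy_fraction(n_atoms, n_sp, fixed_overhead, cost_per_pair=1e-6):
    """Fraction of the step during which the SPs are doing useful work."""
    total = step_time(n_atoms, n_sp, fixed_overhead, cost_per_pair)
    return (total - fixed_overhead) / total


# Hypothetical sizes: a "small" protein and one with 2x the atoms.
SMALL, LARGE = 600, 1200

# CPU-like case: one processor, no per-step overhead in this toy model.
cpu_slowdown = step_time(LARGE, 1, 0.0) / step_time(SMALL, 1, 0.0)

# GPU-like case: many SPs, with a fixed per-step overhead (made-up value).
N_SP, OVERHEAD = 320, 0.01
gpu_slowdown = step_time(LARGE, N_SP, OVERHEAD) / step_time(SMALL, N_SP, OVERHEAD)

if __name__ == "__main__":
    print("CPU slowdown for 2x atoms:", round(cpu_slowdown, 2))
    print("GPU slowdown for 2x atoms:", round(gpu_slowdown, 2))
    print("GPU busy fraction, small:", round(busy_fraction(SMALL, N_SP, OVERHEAD), 2))
    print("GPU busy fraction, large:", round(busy_fraction(LARGE, N_SP, OVERHEAD), 2))
```

In this model the fixed overhead dominates the small protein's step time, so the SPs sit idle most of the time; doubling the atom count costs far less than 4x on the GPU and raises the busy fraction. Lowering the overhead (a faster CPU, or a leaner CAL) has the same effect on the small protein.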
For now, we're pushing out villin WUs as a test (it is good to confirm that the code is working well), but we expect the larger WUs to go out soon (say, in a week or two, pending internal testing).