Our server is back up and we're in the process of doing a stats update. This update alone will take some time, so WUs will take longer to get through the system for about 24 hours. The good news is that all the key systems are back online.
Most machines are now back up, with a few exceptions. One key exception is a server used in the stats updates. So, while the points are being recorded on local servers, these logs are not currently being entered into the stats, i.e. stats updates are on hold.
We will give an update when the stats update is back online. Please keep in mind that the points are not being lost; they are just not being entered into the stats db. Once this machine is up, all the logs (and all the old points from last night, etc.) will be entered into the stats db.
Power mostly restored to campus. Updated 3 p.m. Pacific time. Power has been restored to the majority of the Stanford campus. All normal power is expected to be restored by 5 p.m. or sooner. A major outage occurred today at 11:30 a.m., affecting PG&E customers in Stanford, Menlo Park, Atherton and Palo Alto. If you are still experiencing difficulties on the Stanford campus, call 723-2281. At Stanford Hospital, dial 286.
This one was a major disaster for Stanford, Palo Alto, and nearby cities, probably the biggest outage in a while. However, Stanford seems to have a significant power outage about once a year, which is a serious problem.
With this in mind, we have been distributing more of FAH outside of Stanford (with servers at UCSF, Columbia, Cal State U Long Beach, and U. Pittsburgh). We hope to have a European site soon. Once those sites are a bit more established, we'll see about moving an assignment server to a non-Stanford site, and we should then be much better protected against Stanford-related issues.
Also, it's good that we have servers in 4 different server rooms on campus. One stayed up the whole time, two came up fast, and the fourth (VSPGxx) is coming up now. Some servers will be slow to come up, so we expect this may take at least a few hours.
The stats update has been turned off until this gets sorted out. We hope to turn it back on later tonight, but it may have to wait until tomorrow morning.
We have 4 different server rooms for redundancy, and it looks like the network in one of those rooms is down. All machines on the 171.64.65.XX net (VSPG*.stanford.edu) are unreachable right now. [Note that this server room is in the Stanford Computer Science building, and it looks like the whole Stanford CS net is down (e.g., http://www.cs.stanford.edu/ is unresponsive).] We have notified the network admins and the server admins, and they are working on this right now. It's Saturday, so they're on reduced staff, but they're on it. This is something beyond my group's control (much like an electrical outage or a major natural disaster), and we have to wait for the Stanford CS networking gurus to do their magic.
The good news is that much of FAH is still up. You can see which parts are up and which are down on our serverstat page (http://fah-web.stanford.edu/serverstat.html). However, the bad news is that with many of the key machines unreachable, the other machines are heavily loaded at the moment. Once the additional server room comes back online, everything should ease up.
It's very unfortunate timing that this comes shortly after a rough set of weeks with our complete server upgrade, which led to several machines being taken down in rotation for updates. That led to servers being hit hard too. Note that we have been working to improve client/server performance under such high loads. That development led to server code changes and a new client binary that helps uploads go through when the servers are loaded. Please see the previous posts here for updates on those clients.
As of last Monday, the GPU2 projects (both NVIDIA and ATI) have been in production mode, which means that we've moved on from testing to direct research on them. The ATI cards have been in production for a while (more than a month), and we're starting to write our first peer-reviewed scientific paper deriving from the results.
We're very excited about the GPU2 possibilities right now, since we are getting incredible production from GPU2 clients. Once the first paper has been formally accepted by peer review, I'll talk more about the results.