We've been looking over the server code to see what's going on, and I think we have some news. The server is being too generous with its connection timeouts: slow connections are held open too long and fill up all of the available server threads. For now, we've raised the number of threads from 200 to 500 (not unlike adding more cashiers at the grocery store, so that a few slow customers don't hold up everyone else). That should help a lot in the short term.
On Monday, we'll make some code changes for a more complete fix. It may take a little while to test and deploy, but I expect it to be in by Monday afternoon, and that should greatly ease what we're seeing. (PS: Monday means Monday, Pacific time.)
In some ways, this crept up on us, since nothing has changed (all our servers are up and running). The extended downtime simply built up a large number of WUs waiting to be returned, and the server is still trying to catch up. We've turned off new assignments from this machine to help it catch up: with no new WUs going out, it can spend all its time receiving returned WUs.
With this change, the threads tweak, and the code update coming later today we should have some more news (hopefully good news) later on Monday.
I hope the WUs don't run into deadline problems. I've had several complaints from my team that they can't send their work to this server.
Regards, Fernando.
Posted by: Fernando Célio | July 28, 2008 at 08:20 AM