

News Archive
March 2012
February 2012
January 2012
December 2011
November 2011
October 2011
September 2011
August 2011
July 2011
June 2011
May 2011
April 2011
March 2011
February 2011
January 2011
December 2010
November 2010
October 2010
September 2010
August 2010
July 2010
June 2010
May 2010
April 2010
March 2010
February 2010
January 2010
December 2009
November 2009
October 2009
September 2009
August 2009
July 2009
June 2009
May 2009
April 2009
March 2009
February 2009
December 2008
November 2008
October 2008
September 2008
August 2008
July 2008
June 2008
May 2008
April 2008
March 2008
February 2008
January 2008
December 2007
November 2007
October 2007
September 2007
August 2007
July 2007
June 2007
May 2007
April 2007
March 2007
February 2007
Facebook Explains Outage
September 24, 2010, 6:16 amFacebook was down for over 2.5 hours for some users, according to a post from the company. A post in Facebook's engineering notes says:
The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn’t work when the persistent store is invalid.
You can see more of the technical details here. Facebook has turned off the system that attempts to correct configuration values, and is exploring new designs for it.
I attended a screening of the movie The Social Network last night, and Mark Zuckerberg's character stressed how much downtime would hurt the reputation of the site, as he was getting it launched. I thought that was kind of funny, considering the timing.




