Amazon to customers: Oops, our bad
When it comes to curtailing customer frustration over brown-outs or longer-lasting outages, companies can't overdo communication (something Sony is now learning to its discomfort).
So it was on Friday that Amazon publicly fell on its sword, asking its customers' forgiveness for a data-center outage that impacted several big websites. Amazon also offered Web services customers who were affected a 10-day credit. The company has not yet said how much the credit will cost it.
Amazon rents out computer time by the hour through its web services operations. Although that business contributes a small fraction of the company's annual sales, Amazon has big expectations as more businesses choose to use so-called cloud computing services.
Amazon blamed the outage on human error, saying that an automated error-recovery mechanism subsequently went out of control, and that many computers became "stuck" in recovery mode. The outage struck Amazon's data center near Dulles Airport, outside Washington, eight days ago.
The service is set up in a way that's supposed to provide redundancy, by letting computers in a different "availability zone" take over when one fails. Amazon said that customers that were properly set up to run their computing tasks over multiple zones were largely unaffected, but that the error made it difficult to switch zones on the fly. It's making changes to prevent the error from recurring.
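For readers who want to see the idea in concrete terms, here is a minimal sketch of zone failover: a task is tried in one zone and, if that zone is unreachable, retried in the next. Everything in it is hypothetical and for illustration only, including the names `run_with_failover` and `demo_task` and the zone labels; it does not use Amazon's actual API and is not a description of how Amazon's service is built.

```python
from typing import Callable, Sequence


class AllZonesFailed(Exception):
    """Raised when no zone could complete the task."""


def run_with_failover(task: Callable[[str], str], zones: Sequence[str]) -> str:
    """Try the task in each zone in order; return the first successful result."""
    errors = []
    for zone in zones:
        try:
            return task(zone)
        except ConnectionError as exc:
            # This zone is unavailable; record why and move on to the next one.
            errors.append(f"{zone}: {exc}")
    # Every zone failed, so report all of the collected errors together.
    raise AllZonesFailed("; ".join(errors))


def demo_task(zone: str) -> str:
    # Hypothetical stand-in for real work; "zone-a" simulates the failed zone.
    if zone == "zone-a":
        raise ConnectionError("zone unavailable")
    return f"ran in {zone}"


print(run_with_failover(demo_task, ["zone-a", "zone-b"]))
```

A customer running in only one zone has nowhere to fall back to, which in this sketch corresponds to passing a single-item list and getting `AllZonesFailed`.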
In a statement, Amazon noted:
"In addition to the technical insights and improvements that will result from this event, we also identified improvements that need to be made in our customer communications.... Initially, our primary focus was on thinking through how to solve the operational problems for customers rather than on identifying root causes. We felt that focusing our efforts on a solution and not the problem was the right thing to do for our customers, and that it helped us to return the services and our customers back to health more quickly. We updated customers when we had new information that we felt confident was accurate and refrained from speculating, knowing that once we had returned the services back to health that we would quickly transition to the data collection and analysis stage that would drive this post mortem."
Amazon said it planned to issue more regular updates when problems arise and would expand its developer support team to get information out to customers more rapidly.