AWS Outage: Implications for Internet, Enterprise Cloud Customers
Yesterday's hours-long Amazon Web Services (AWS) outage provided a vivid illustration of how much large parts of the Internet depend on the cloud service. It also presented a puzzle for many users: because the AWS health dashboard itself depends on the cloud service, the status messages failed to indicate any signs of trouble throughout the outage.
Now resolved, the Feb. 28 outage of Amazon's S3 (Simple Storage Service) cloud-based object storage service left many Web sites inaccessible or slow to load for several hours. Affected sites and services included Adobe, Coursera, Cracked, Imgur, Mailchimp, Medium, Quora, Slack and Trello, as well as Internet health-tracking sites such as Downdetector and Is It Down Right Now.
According to Amazon, S3 is "object storage with a simple Web service interface to store and retrieve any amount of data from anywhere on the Web." Used by more than 150,000 Web sites, S3 is designed for up to 99.99 percent availability. That target leaves room for less than an hour of downtime per year, a margin yesterday's hours-long outage exceeded.
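To put the 99.99 percent figure in concrete terms, the short Python sketch below converts an availability target into a yearly downtime allowance. The function name, the 365-day year and the extra comparison targets (99.9 and 99.999 percent) are assumptions made for illustration; they are not taken from Amazon's documentation or service terms.

```python
# Minutes in a year, assuming a 365-day (non-leap) year.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, that an availability target permits."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # 99.99 is the design figure cited in the article; the other two
    # targets are hypothetical comparison points.
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.99 percent target works out to a little under an hour of downtime per year, so a single outage lasting several hours uses up several years' worth of that allowance.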
Problem at Virginia Data Center
While Amazon's cloud service health dashboard gave no indication of trouble, AWS noted on its Twitter account yesterday morning that S3 was "experiencing high error rates" and that the company was working to restore the service. Because the dashboard's alert colors were not changing to reflect the S3 issue, Amazon also posted updates in a banner at the top of the Web page.
By 1:49 p.m. PST, all S3 services for object retrieval, listing, deletion and addition were working normally again, Amazon said. The company said the outage was traced to US-EAST-1, its data center region in northern Virginia.
During the outage, Twitter became the place for various AWS customers and others to share information as well as to vent and post humorous items about the event. Adobe Customer Care, for example, posted a GIF of a puppy stampede to take customers' minds off the service outage, while another popular meme was a screenshot of Homer Simpson's dad with the headline, "Old Man Yells at Cloud."
Enterprises Need 'Balanced Approach'
In an analysis published today in Forbes, analyst Patrick Moorhead said yesterday's outage underscored a problem not with Amazon, but with enterprise users who don't fully consider the implications of moving key services into the cloud.
"This incident is an indictment, not of AWS or Amazon.com, but of business and IT decision makers," said Moorhead, who is founder, president and principal analyst at Moor Insights & Strategy. "Too often the decision to move IT services to the public cloud was driven by either cost or the thought that 'we need to get to the cloud to be competitive.' But not understanding the value that your IT can deliver today shortchanges the business."
While it makes sense for many enterprises to move some workloads into the public cloud, other services require a more balanced approach that might include use of private cloud as well as legacy systems, according to Moorhead.
The non-profit Institute for Local Self-Reliance made a similar observation in a report about Amazon published in November. "Amazon increasingly controls the underlying infrastructure of the economy," the report noted. "Its Amazon Web Services division provides the cloud computing backbone for much of the country, powering everyone from Netflix to the CIA."
In its most recent quarterly financial report, issued in early February, Amazon said its AWS operating income for the 12 months ending Dec. 31 amounted to $3.1 billion, compared to $1.5 billion for the same 12 months in 2015.
Posted: 2017-03-01 @ 2:25pm PT
An indictment of IT decision makers? I would plead guilty any day for the decision to use AWS. One five-hour outage a year beats every large internally hosted infrastructure, not to mention all the automation and technology you have at your disposal with a few clicks.