Amazon Cloud Services Went Offline: How Could This Happen?

On the last day of February 2017, the internet almost ground to a halt. The reason was a four-hour outage at Amazon’s cloud computing division, Amazon Web Services (AWS), which caused hundreds of thousands of websites throughout the US to go dark. Wait, an outage at Amazon, the online shopping site, interrupted other websites? How can that be?

Amazon, the largest retailer in the Western World, began AWS as a side business. Today AWS is among the largest web services providers and accounts for about 8% of Amazon’s revenue – in other words, AWS is a money-maker.

What Caused AWS to Go Down?

Amazon Web Services is a major hosting provider for well-known websites such as:

  • Spotify
  • Buzzfeed
  • Pinterest
  • Netflix
  • And many more

Other companies, large and small, lost their service too. Estimates of affected sites are in the hundreds of thousands. AWS provides web services and cloud storage for companies that choose not to invest heavily in computer hardware, which frees them from spending capital on building and outfitting their own server farms.

The part of AWS that went offline that February afternoon was S3 (Simple Storage Service). Though not all AWS clients were affected, many saw their sites slow down or become completely unresponsive.

Dave Bartoletti, a cloud analyst with Forrester, said:

“This is a pretty big outage. AWS had not had a lot of outages, and when they happen, they’re famous. People still talk about the one in September of 2015 that lasted five hours.”

The outage seems to have started at a few minutes past 12:30 PM and operations were completely restored some four hours later.

Another cloud analyst, Lydia Leong of Gartner, commented on the outage’s cause:

“The most common causes of this type of outage are software related, either a bug in the code or human error. Right now, we don’t know what it was.”

Amazon later explained that the cause was human error, not an attack on AWS or a failure of hardware or software. According to Amazon, an authorized employee was debugging an issue with the S3 billing system and inadvertently entered a wrong command, shutting down far more than the few servers needing attention. As a result, Amazon had to restart all the affected servers, and service was completely recovered by 4:40 PM.
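To make that failure mode concrete, here is a minimal Python sketch of the kind of safety check a maintenance tool could run before taking servers offline. This is not Amazon’s tooling: the function name, the fleet list, and the 10% cap are all hypothetical and chosen only for illustration.

```python
def select_servers_to_remove(fleet, requested, max_fraction=0.10):
    """Return the servers to take offline, or raise if the request looks wrong.

    Hypothetical guard: `fleet`, `requested`, and `max_fraction` are
    illustrative names and values, not part of any real AWS tool.
    """
    # Reject names that are not in the fleet (for example, a typo).
    unknown = [name for name in requested if name not in fleet]
    if unknown:
        raise ValueError(f"unknown servers: {unknown}")

    # Refuse to remove more than a small share of the fleet in one command.
    limit = int(len(fleet) * max_fraction)
    if len(requested) > limit:
        raise ValueError(
            f"refusing to remove {len(requested)} servers; limit is {limit}"
        )
    return list(requested)
```

With a cap like this, a mistyped command that names half the fleet is rejected instead of executed, while a request for a handful of servers goes through.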

The scale of the outage serves as a reminder to companies of the risks associated with depending on just a few companies for cloud computing. Other providers of similar services include:

  • IBM
  • Google Cloud Platform
  • Microsoft’s Azure Service

Leong commented on the impact of the outage:

“More than anything else, S3 customers need to be able to get at their data, because often S3 is used to store images. So, no S3, no nice picture or fancy logo on your website.”

Most Websites Did Not Fail Completely

Most modern websites pull data from more than one cloud source, so when a storage service like S3 goes down, an image may be missing but the rest of the page’s information still appears.
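A rough Python sketch of this idea, often called graceful degradation, is shown below. It assumes a hypothetical `fetch_image` function that raises `ConnectionError` when the storage service is unreachable; the names and the placeholder text are illustrative and do not come from any particular website.

```python
def render_page(article_text, fetch_image, placeholder="[image unavailable]"):
    """Build a page, falling back to a placeholder if the image store is down.

    `fetch_image` is a hypothetical callable standing in for a request to a
    storage service such as S3; it is assumed to raise ConnectionError on failure.
    """
    try:
        image = fetch_image()
    except ConnectionError:
        # The storage service is down: keep the page alive without the image.
        image = placeholder
    return f"{image}\n{article_text}"


def broken_image_store():
    """Simulate a storage outage for demonstration purposes."""
    raise ConnectionError("storage service unreachable")
```

Calling `render_page("Today's news...", broken_image_store)` still returns the article text, with the placeholder standing in for the missing picture.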

CAT-TEC in the GTA keeps you up to date on all the news, trends, tips, and tricks in computer technology. We can be reached at (416) 840-6560 or {email}.