
Cloud Expo: Blog Feed Post

Failure as a Service

The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8

A recent seven-hour outage at Amazon Web Services (AWS) on Saturday has renewed the discussion about cloud failures and whether the customer or the provider of the services should be held responsible. The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8. Saturday’s outage began at about 12:20 a.m., lasted until 7:20 a.m., and affected a “set of racks,” according to Amazon, which said the bulk of customers in its U.S. East availability zone remained unaffected.

In one of the most direct posts, "Amazon EBS sucks I just lost all my data," Dave Dopson said "they [AWS] promise redundancy, it is BS." He went on to point to AWS's own statement: "EBS volumes are designed to be highly available and reliable. Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives."
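To put those AFR figures in perspective, here is a minimal sketch that turns an annual failure rate into the chance of losing a volume over several years. It assumes each year is an independent trial with a constant AFR; that is a simplification for illustration, not Amazon's published durability model, and `loss_probability` is a hypothetical helper name.

```python
# Sketch: convert an annual failure rate (AFR) into a multi-year loss probability.
# ASSUMPTION: years are independent and the AFR stays constant over time.

def loss_probability(afr: float, years: int) -> float:
    """Probability of at least one complete volume loss within `years`."""
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be a fraction between 0 and 1")
    return 1.0 - (1.0 - afr) ** years

# Figures from the AWS statement quoted above.
EBS_AFR_LOW, EBS_AFR_HIGH, DISK_AFR = 0.001, 0.005, 0.04

for label, afr in [("EBS (low)", EBS_AFR_LOW),
                   ("EBS (high)", EBS_AFR_HIGH),
                   ("commodity disk", DISK_AFR)]:
    print(f"{label:15s} 3-year loss probability: {loss_probability(afr, 3):.2%}")

# Reliability ratio implied by the quoted ranges (commodity AFR / EBS AFR).
print(f"Implied ratio: {DISK_AFR / EBS_AFR_HIGH:.0f}x to {DISK_AFR / EBS_AFR_LOW:.0f}x")
```

Over three years this works out to roughly 0.3% to 1.5% for EBS versus about 11.5% for a commodity disk. The quoted ranges imply a ratio of between 8x and 40x, so "10 times" is one point inside that band, and every figure depends on the 20 GB modified-data condition.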

Like many new users of cloud computing, he assumed that he could simply use the service and that, upon failure, AWS's redundancy would automatically fix any problems, because AWS does (sort of) say that it prevents data loss. What Amazon actually states is a little different: "the durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot," which places the responsibility on the customer. On one hand AWS says it prevents data loss, but only if you use the AWS cloud correctly; otherwise you're SOL. The reality is that, for most users, AWS requires significant failure planning, in this case the use of EBS's snapshot capability. The problem is that most [new] users have a hard time learning the rules of the road. A quick search for AWS failure planning on the AWS forums turned up few additional insights, and the process really appears to be mostly trial and error.
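The snapshot rule of the road can be made concrete. Below is a minimal sketch of a policy check that decides when a volume is due for a new snapshot. The 20 GB threshold comes from the AWS statement quoted above; the 24-hour maximum age, the `VolumeState` record, and the idea that you can track modified gigabytes yourself are all hypothetical assumptions for illustration. Actually creating the snapshot would go through the EC2 API (for example, `create_snapshot` on the boto3 EC2 client), which this sketch deliberately leaves out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the quoted AWS statement: the 0.1%-0.5% AFR applies to volumes with
# 20 GB or less of data modified since the most recent snapshot.
MAX_MODIFIED_GB = 20.0
# HYPOTHETICAL: maximum snapshot age; choose one that matches the amount of
# data loss you can tolerate.
MAX_SNAPSHOT_AGE = timedelta(hours=24)

@dataclass
class VolumeState:
    """Hypothetical bookkeeping for one EBS volume (not an AWS API type)."""
    volume_id: str
    last_snapshot_at: datetime
    modified_gb_since_snapshot: float  # assumed to be tracked by the user

def needs_snapshot(vol: VolumeState, now: datetime) -> bool:
    """Return True when the volume has drifted too far from its last snapshot."""
    too_much_change = vol.modified_gb_since_snapshot >= MAX_MODIFIED_GB
    too_old = now - vol.last_snapshot_at >= MAX_SNAPSHOT_AGE
    return too_much_change or too_old

# Example: 25 GB changed three hours after the last snapshot.
now = datetime(2010, 5, 8, 12, 0)
vol = VolumeState("vol-example", now - timedelta(hours=3), 25.0)
print(needs_snapshot(vol, now))
```

Triggering at the threshold rather than after it keeps the volume inside the range Amazon's durability figure actually covers; the age limit catches volumes that change slowly but would still lose a day's work.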

In the case of hardware failures, Amazon expects you to design your architecture correctly for these kinds of events by using redundancy, for example by running multiple VMs. It expects a certain level of knowledge of both system administration and how AWS itself has been designed to be used. Newbies need not apply, or should use it at their own risk. This isn't all that clear to a new user who hears that cloud computing is safe and the answer to all their problems, which, I admit, should be a red flag in itself. The problem is twofold: an overhyped technology and unclear failure models, which combine to create a perfect storm. You need the late adopters for the real revenue opportunities, but these same late adopters require a different, gentler kind of cloud service, probably one a little more platform-focused than infrastructure-focused. As IaaS matures, it is becoming obvious that the "Über Geek" developers who first adopted the service are not where the long-tail revenue opportunities are. To make IaaS viable to a broader market, AWS and other IaaS vendors need to mature their platforms for a less expert type of user (a lower or least common denominator): one who is smart enough to be dangerous. Otherwise they're doomed to be limited to the experts-only segment.
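Why does Amazon push redundancy onto the customer? The arithmetic behind running multiple VMs is simple, as this minimal sketch shows. It assumes replicas fail independently and that the service stays up while at least one replica is up; the 99% single-instance availability is a hypothetical number, and `combined_availability` is an illustrative helper, not an AWS API.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    ASSUMPTION: replicas fail independently. This does not hold when they
    share a rack, power feed, or availability zone.
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    return 1.0 - (1.0 - single) ** replicas

# HYPOTHETICAL single-instance availability of 99%.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

With independent failures, two replicas turn 99% into 99.99%. The catch, as the May 8 "set of racks" outage illustrates, is that replicas sitting on the same failing hardware are not independent, so the real benefit depends on knowing where your instances actually run.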

The bigger question is whether a cloud user should have to worry about hardware failures at all, or whether these types of failures should be the sole responsibility of the service provider. My opinion is that deploying to the cloud should reduce complexity, not increase it. Users should be responsible for what they have access to, so in the case of AWS, they should be responsible for failures brought about by the applications and related components they build and deploy, not by the hardware. If hardware fails (which it will), that should be the responsibility of those who manage and provide it. Making things worse is promising to be highly available, reliable and redundant, but with the fine print of "if you are smart enough to use all our services in the proper way," which isn't fair. If EBS is automatically replicated, why did Dave lose all his data?

In an optimal cloud environment, any single server failure shouldn't matter. But it appears that at AWS it does.

More Stories By Reuven Cohen

An instigator, part time provocateur, bootstrapper, amateur cloud lexicographer, and purveyor of random thoughts, 140 characters at a time.

Reuven is an early innovator in the cloud computing space, having founded Enomaly in 2004 (acquired by Virtustream in February 2012). Enomaly was among the first to develop a self-service infrastructure as a service (IaaS) platform (ECP), circa 2005, as well as SpotCloud (2011), the first commodity-style cloud computing spot market.

Reuven is also the co-creator of CloudCamp (100+ cities around the globe). CloudCamp is an unconference where early adopters of cloud computing technologies exchange ideas, and it is the largest of the ‘barcamp’ style of events.
