Welcome!

@CloudExpo Authors: Pat Romanski, Elizabeth White, Yeshim Deniz, Liz McMillan, Aruna Ravichandran

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Machine Learning

@CloudExpo: Blog Feed Post

Failure as a Service

The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8

A recent seven hour outage at Amazon Web Services on Saturday has renewed the discussion about cloud failures and whether the customer or the provider of the services should be held responsible. The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8. Saturday’s outage began at about 12:20 a.m. and lasted until 7:20 a.m., and affected a “set of racks,” according to Amazon, which said the bulk of customers in its U.S. East availability zone remained unaffected.

In one of the most direct posts, Amazon EBS sucks I just lost all my data, Dave Dopson said "they [AWS] promise redundancy, it is BS." Going on to point to AWS's statement " EBS volumes are designed to be highly available and reliable. Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or lessof modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% –0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives."

Like many new users to cloud computing, he assumed that he could just use the service and upon failure AWS's redundancy would automatically fix any problems, because they do (sort of) say that they prevent data loss. What Amazon actually states is a little different in that "the durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot" placing the responsibility on the customer. On one hand they say that they prevent data loss, but only if you use the AWS cloud correctly, otherwise you're SOL. The reality is that AWS for most users requires significant failure planning -- in this case the use of EBS's snap shot capability. The problem is that most [new] users have a hard time learning the rules of the road. A quick search for AWS failure planning on the AWS forums resulted in little additional insights and really appears to mostly about trial and error.

In the case of hardware failures Amazon expects you to design your architecture correctly for these kinds of events by use of redundancy, for example use mutliple VM's etc. They expect a certain level of knowledge of both system administration as well as how AWS itself has been designed to be used. Newbies need not apply or should use at you're own risk. Which isn't all that clear to a new user, who hears that cloud computing is safe and the answer to all your problems. Which I admit should be a red flag in itself. The problem is two fold, an over hyped technology and unclear failure models which combine to create a perfect storm. You need the late adopters for the real revenue opportunities, but these same late adopters require a different more gentle kind of cloud service, probably one a little more platform than infrastructure focused. As IaaS matures it is becoming obvious that the "Über Geek" developers who first adopted the service is not where the long tail revenue opportunities are. To make IaaS viable to a broader market, AWS and other IaaS vendors need to mature their platforms for a lesser type of user. (A lower or least common denominator) One who is smart enough to be dangerous, otherwise they're doomed to be limited to the only for experts only segment.

The bigger question is should a cloud user have to worry about hardware failures or should these types of failures be the sole responsiblity of the service provider? My opinion is deploying to the cloud should reduce complexity, not increase it. The user should be responsible for what they have access to, so in the case of AWS, they should be responsible for failures that are brought about by the applications and related components they build and deploy, not by the hardware. If hardware fails (which it will) this should be the responsibility of those who manage and provide it. Making things worst is promising to be highly available, reliable and redundant, but with the fine print of "if you are smart enough to use all our services in the proper way" which isn't fair. If EBS is automatically replicated why did Dave lose all his data?

In a optimal cloud environment any single server failures shouldn't matter. But it appears at AWS it does.

More Stories By Reuven Cohen

An instigator, part time provocateur, bootstrapper, amateur cloud lexicographer, and purveyor of random thoughts, 140 characters at a time.

Reuven is an early innovator in the cloud computing space as the founder of Enomaly in 2004 (Acquired by Virtustream in February 2012). Enomaly was among the first to develop a self service infrastructure as a service (IaaS) platform (ECP) circa 2005. As well as SpotCloud (2011) the first commodity style cloud computing Spot Market.

Reuven is also the co-creator of CloudCamp (100+ Cities around the Globe) CloudCamp is an unconference where early adopters of Cloud Computing technologies exchange ideas and is the largest of the ‘barcamp’ style of events.

@CloudExpo Stories
SYS-CON Events announced today that N3N will exhibit at SYS-CON's @ThingsExpo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. N3N’s solutions increase the effectiveness of operations and control centers, increase the value of IoT investments, and facilitate real-time operational decision making. N3N enables operations teams with a four dimensional digital “big board” that consolidates real-time live video feeds alongside IoT sensor data a...
In his session at 21st Cloud Expo, Raju Shreewastava, founder of Big Data Trunk, will provide a fun and simple way to introduce Machine Leaning to anyone and everyone. Together we will solve a machine learning problem and find an easy way to be able to do machine learning without even coding. Raju Shreewastava is the founder of Big Data Trunk (www.BigDataTrunk.com), a Big Data Training and consulting firm with offices in the United States. He previously led the data warehouse/business intellige...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
SYS-CON Events announced today that Avere Systems, a leading provider of enterprise storage for the hybrid cloud, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Avere delivers a more modern architectural approach to storage that doesn't require the overprovisioning of storage capacity to achieve performance, overspending on expensive storage media for inactive data or the overbui...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
Data scientists must access high-performance computing resources across a wide-area network. To achieve cloud-based HPC visualization, researchers must transfer datasets and visualization results efficiently. HPC clusters now compute GPU-accelerated visualization in the cloud cluster. To efficiently display results remotely, a high-performance, low-latency protocol transfers the display from the cluster to a remote desktop. Further, tools to easily mount remote datasets and efficiently transfer...
Digital transformation is changing the face of business. The IDC predicts that enterprises will commit to a massive new scale of digital transformation, to stake out leadership positions in the "digital transformation economy." Accordingly, attendees at the upcoming Cloud Expo | @ThingsExpo at the Santa Clara Convention Center in Santa Clara, CA, Oct 31-Nov 2, will find fresh new content in a new track called Enterprise Cloud & Digital Transformation.
SYS-CON Events announced today that mruby Forum will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. mruby is the lightweight implementation of the Ruby language. We introduce mruby and the mruby IoT framework that enhances development productivity. For more information, visit http://forum.mruby.org/.
Though cloud is the future of enterprise computing, a smooth transition of legacy applications and systems is critical for seamless business operations. IT professionals are eager to start leveraging the cost, scale and other benefits of cloud, but with massive investments already in place in existing infrastructure and a number of compliance and resource hurdles, it can be challenging to move to a cloud-based infrastructure.
SYS-CON Events announced today that NetApp has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. NetApp is the data authority for hybrid cloud. NetApp provides a full range of hybrid cloud data services that simplify management of applications and data across cloud and on-premises environments to accelerate digital transformation. Together with their partners, NetApp emp...
The dynamic nature of the cloud means that change is a constant when it comes to modern cloud-based infrastructure. Delivering modern applications to end users, therefore, is a constantly shifting challenge. Delivery automation helps IT Ops teams ensure that apps are providing an optimal end user experience over hybrid-cloud and multi-cloud environments, no matter what the current state of the infrastructure is. To employ a delivery automation strategy that reflects your business rules, making r...
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software. They hope to capture value from emerging technologies such as IoT, SDN, and AI. Ultimately, irrespective of the vertical, it is about deriving value from independent software applications participating in an ecosystem as one comprehensive solution. In his session at @ThingsExpo, Kausik Sridhar, founder and CTO of Pulzze Systems, will discuss how given the magnitude of today's applicati...
Smart cities have the potential to change our lives at so many levels for citizens: less pollution, reduced parking obstacles, better health, education and more energy savings. Real-time data streaming and the Internet of Things (IoT) possess the power to turn this vision into a reality. However, most organizations today are building their data infrastructure to focus solely on addressing immediate business needs vs. a platform capable of quickly adapting emerging technologies to address future ...
Amazon is pursuing new markets and disrupting industries at an incredible pace. Almost every industry seems to be in its crosshairs. Companies and industries that once thought they were safe are now worried about being “Amazoned.”. The new watch word should be “Be afraid. Be very afraid.” In his session 21st Cloud Expo, Chris Kocher, a co-founder of Grey Heron, will address questions such as: What new areas is Amazon disrupting? How are they doing this? Where are they likely to go? What are th...
SYS-CON Events announced today that SkyScale will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SkyScale is a world-class provider of cloud-based, ultra-fast multi-GPU hardware platforms for lease to customers desiring the fastest performance available as a service anywhere in the world. SkyScale builds, configures, and manages dedicated systems strategically located in maximum-security...
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant th...
In his general session at 21st Cloud Expo, Greg Dumas, Calligo’s Vice President and G.M. of US operations, will go over the new Global Data Protection Regulation and how Calligo can help business stay compliant in digitally globalized world. Greg Dumas is Calligo's Vice President and G.M. of US operations. Calligo is an established service provider that provides an innovative platform for trusted cloud solutions. Calligo’s customers are typically most concerned about GDPR compliance, applicatio...
Enterprises are adopting Kubernetes to accelerate the development and the delivery of cloud-native applications. However, sharing a Kubernetes cluster between members of the same team can be challenging. And, sharing clusters across multiple teams is even harder. Kubernetes offers several constructs to help implement segmentation and isolation. However, these primitives can be complex to understand and apply. As a result, it’s becoming common for enterprises to end up with several clusters. Thi...
Microsoft Azure Container Services can be used for container deployment in a variety of ways including support for Orchestrators like Kubernetes, Docker Swarm and Mesos. However, the abstraction for app development that support application self-healing, scaling and so on may not be at the right level. Helm and Draft makes this a lot easier. In this primarily demo-driven session at @DevOpsSummit at 21st Cloud Expo, Raghavan "Rags" Srinivas, a Cloud Solutions Architect/Evangelist at Microsoft, wi...
Containers are rapidly finding their way into enterprise data centers, but change is difficult. How do enterprises transform their architecture with technologies like containers without losing the reliable components of their current solutions? In his session at @DevOpsSummit at 21st Cloud Expo, Tony Campbell, Director, Educational Services at CoreOS, will explore the challenges organizations are facing today as they move to containers and go over how Kubernetes applications can deploy with lega...