Welcome!

@CloudExpo Authors: Liz McMillan, John Worthington, Stackify Blog, Elizabeth White, Jason Bloomberg

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Machine Learning

@CloudExpo: Blog Feed Post

Failure as a Service

The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8

A recent seven hour outage at Amazon Web Services on Saturday has renewed the discussion about cloud failures and whether the customer or the provider of the services should be held responsible. The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8. Saturday’s outage began at about 12:20 a.m. and lasted until 7:20 a.m., and affected a “set of racks,” according to Amazon, which said the bulk of customers in its U.S. East availability zone remained unaffected.

In one of the most direct posts, Amazon EBS sucks I just lost all my data, Dave Dopson said "they [AWS] promise redundancy, it is BS." Going on to point to AWS's statement " EBS volumes are designed to be highly available and reliable. Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or lessof modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% –0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives."

Like many new users to cloud computing, he assumed that he could just use the service and upon failure AWS's redundancy would automatically fix any problems, because they do (sort of) say that they prevent data loss. What Amazon actually states is a little different in that "the durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot" placing the responsibility on the customer. On one hand they say that they prevent data loss, but only if you use the AWS cloud correctly, otherwise you're SOL. The reality is that AWS for most users requires significant failure planning -- in this case the use of EBS's snap shot capability. The problem is that most [new] users have a hard time learning the rules of the road. A quick search for AWS failure planning on the AWS forums resulted in little additional insights and really appears to mostly about trial and error.

In the case of hardware failures Amazon expects you to design your architecture correctly for these kinds of events by use of redundancy, for example use mutliple VM's etc. They expect a certain level of knowledge of both system administration as well as how AWS itself has been designed to be used. Newbies need not apply or should use at you're own risk. Which isn't all that clear to a new user, who hears that cloud computing is safe and the answer to all your problems. Which I admit should be a red flag in itself. The problem is two fold, an over hyped technology and unclear failure models which combine to create a perfect storm. You need the late adopters for the real revenue opportunities, but these same late adopters require a different more gentle kind of cloud service, probably one a little more platform than infrastructure focused. As IaaS matures it is becoming obvious that the "Über Geek" developers who first adopted the service is not where the long tail revenue opportunities are. To make IaaS viable to a broader market, AWS and other IaaS vendors need to mature their platforms for a lesser type of user. (A lower or least common denominator) One who is smart enough to be dangerous, otherwise they're doomed to be limited to the only for experts only segment.

The bigger question is should a cloud user have to worry about hardware failures or should these types of failures be the sole responsiblity of the service provider? My opinion is deploying to the cloud should reduce complexity, not increase it. The user should be responsible for what they have access to, so in the case of AWS, they should be responsible for failures that are brought about by the applications and related components they build and deploy, not by the hardware. If hardware fails (which it will) this should be the responsibility of those who manage and provide it. Making things worst is promising to be highly available, reliable and redundant, but with the fine print of "if you are smart enough to use all our services in the proper way" which isn't fair. If EBS is automatically replicated why did Dave lose all his data?

In a optimal cloud environment any single server failures shouldn't matter. But it appears at AWS it does.

More Stories By Reuven Cohen

An instigator, part time provocateur, bootstrapper, amateur cloud lexicographer, and purveyor of random thoughts, 140 characters at a time.

Reuven is an early innovator in the cloud computing space as the founder of Enomaly in 2004 (Acquired by Virtustream in February 2012). Enomaly was among the first to develop a self service infrastructure as a service (IaaS) platform (ECP) circa 2005. As well as SpotCloud (2011) the first commodity style cloud computing Spot Market.

Reuven is also the co-creator of CloudCamp (100+ Cities around the Globe) CloudCamp is an unconference where early adopters of Cloud Computing technologies exchange ideas and is the largest of the ‘barcamp’ style of events.

@CloudExpo Stories
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
In his session at 21st Cloud Expo, James Henry, Co-CEO/CTO of Calgary Scientific Inc., introduced you to the challenges, solutions and benefits of training AI systems to solve visual problems with an emphasis on improving AIs with continuous training in the field. He explored applications in several industries and discussed technologies that allow the deployment of advanced visualization solutions to the cloud.
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"Space Monkey by Vivent Smart Home is a product that is a distributed cloud-based edge storage network. Vivent Smart Home, our parent company, is a smart home provider that places a lot of hard drives across homes in North America," explained JT Olds, Director of Engineering, and Brandon Crowfeather, Product Manager, at Vivint Smart Home, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"Codigm is based on the cloud and we are here to explore marketing opportunities in America. Our mission is to make an ecosystem of the SW environment that anyone can understand, learn, teach, and develop the SW on the cloud," explained Sung Tae Ryu, CEO of Codigm, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"We're focused on how to get some of the attributes that you would expect from an Amazon, Azure, Google, and doing that on-prem. We believe today that you can actually get those types of things done with certain architectures available in the market today," explained Steve Conner, VP of Sales at Cloudistics, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
The question before companies today is not whether to become intelligent, it’s a question of how and how fast. The key is to adopt and deploy an intelligent application strategy while simultaneously preparing to scale that intelligence. In her session at 21st Cloud Expo, Sangeeta Chakraborty, Chief Customer Officer at Ayasdi, provided a tactical framework to become a truly intelligent enterprise, including how to identify the right applications for AI, how to build a Center of Excellence to oper...
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
"We're developing a software that is based on the cloud environment and we are providing those services to corporations and the general public," explained Seungmin Kim, CEO/CTO of SM Systems Inc., in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
WebRTC is great technology to build your own communication tools. It will be even more exciting experience it with advanced devices, such as a 360 Camera, 360 microphone, and a depth sensor camera. In his session at @ThingsExpo, Masashi Ganeko, a manager at INFOCOM Corporation, introduced two experimental projects from his team and what they learned from them. "Shotoku Tamago" uses the robot audition software HARK to track speakers in 360 video of a remote party. "Virtual Teleport" uses a multip...
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
DevOps promotes continuous improvement through a culture of collaboration. But in real terms, how do you: Integrate activities across diverse teams and services? Make objective decisions with system-wide visibility? Use feedback loops to enable learning and improvement? With technology insights and real-world examples, in his general session at @DevOpsSummit, at 21st Cloud Expo, Andi Mann, Chief Technology Advocate at Splunk, explored how leading organizations use data-driven DevOps to close th...
Enterprises are moving to the cloud faster than most of us in security expected. CIOs are going from 0 to 100 in cloud adoption and leaving security teams in the dust. Once cloud is part of an enterprise stack, it’s unclear who has responsibility for the protection of applications, services, and data. When cloud breaches occur, whether active compromise or a publicly accessible database, the blame must fall on both service providers and users. In his session at 21st Cloud Expo, Ben Johnson, C...
Coca-Cola’s Google powered digital signage system lays the groundwork for a more valuable connection between Coke and its customers. Digital signs pair software with high-resolution displays so that a message can be changed instantly based on what the operator wants to communicate or sell. In their Day 3 Keynote at 21st Cloud Expo, Greg Chambers, Global Group Director, Digital Innovation, Coca-Cola, and Vidya Nagarajan, a Senior Product Manager at Google, discussed how from store operations and ...
In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. Carl J. Levine is the Senior Technical Evangelist for NS1. A veteran of the Internet Infrastructure space, he has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to it...
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
"The reason Tier 1 companies are coming to us is we're able to narrow the gap where custom applications need to be built. They provide a lot of services, like IBM has Watson, and they provide a lot of hardware but how do you bring it all together? Bringing it all together they have to build custom applications and that's the niche that we are able to help them with," explained Peter Jung, Product Leader at Pulzze Systems Inc., in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2,...
While some developers care passionately about how data centers and clouds are architected, for most, it is only the end result that matters. To the majority of companies, technology exists to solve a business problem, and only delivers value when it is solving that problem. 2017 brings the mainstream adoption of containers for production workloads. In his session at 21st Cloud Expo, Ben McCormack, VP of Operations at Evernote, discussed how data centers of the future will be managed, how the p...
"Cloud Academy is an enterprise training platform for the cloud, specifically public clouds. We offer guided learning experiences on AWS, Azure, Google Cloud and all the surrounding methodologies and technologies that you need to know and your teams need to know in order to leverage the full benefits of the cloud," explained Alex Brower, VP of Marketing at Cloud Academy, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clar...
Data scientists must access high-performance computing resources across a wide-area network. To achieve cloud-based HPC visualization, researchers must transfer datasets and visualization results efficiently. HPC clusters now compute GPU-accelerated visualization in the cloud cluster. To efficiently display results remotely, a high-performance, low-latency protocol transfers the display from the cluster to a remote desktop. Further, tools to easily mount remote datasets and efficiently transfer...