Welcome!

@CloudExpo Authors: Elizabeth White, William Schmarzo, Lisa Calkins, Mamoon Yunus, Pat Romanski

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Machine Learning

@CloudExpo: Blog Feed Post

Failure as a Service

The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8

A recent seven hour outage at Amazon Web Services on Saturday has renewed the discussion about cloud failures and whether the customer or the provider of the services should be held responsible. The conversation stems from two power outages on May 4 and an extended power loss early on Saturday, May 8. Saturday’s outage began at about 12:20 a.m. and lasted until 7:20 a.m., and affected a “set of racks,” according to Amazon, which said the bulk of customers in its U.S. East availability zone remained unaffected.

In one of the most direct posts, Amazon EBS sucks I just lost all my data, Dave Dopson said "they [AWS] promise redundancy, it is BS." Going on to point to AWS's statement " EBS volumes are designed to be highly available and reliable. Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or lessof modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% –0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives."

Like many new users to cloud computing, he assumed that he could just use the service and upon failure AWS's redundancy would automatically fix any problems, because they do (sort of) say that they prevent data loss. What Amazon actually states is a little different in that "the durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot" placing the responsibility on the customer. On one hand they say that they prevent data loss, but only if you use the AWS cloud correctly, otherwise you're SOL. The reality is that AWS for most users requires significant failure planning -- in this case the use of EBS's snap shot capability. The problem is that most [new] users have a hard time learning the rules of the road. A quick search for AWS failure planning on the AWS forums resulted in little additional insights and really appears to mostly about trial and error.

In the case of hardware failures Amazon expects you to design your architecture correctly for these kinds of events by use of redundancy, for example use mutliple VM's etc. They expect a certain level of knowledge of both system administration as well as how AWS itself has been designed to be used. Newbies need not apply or should use at you're own risk. Which isn't all that clear to a new user, who hears that cloud computing is safe and the answer to all your problems. Which I admit should be a red flag in itself. The problem is two fold, an over hyped technology and unclear failure models which combine to create a perfect storm. You need the late adopters for the real revenue opportunities, but these same late adopters require a different more gentle kind of cloud service, probably one a little more platform than infrastructure focused. As IaaS matures it is becoming obvious that the "Über Geek" developers who first adopted the service is not where the long tail revenue opportunities are. To make IaaS viable to a broader market, AWS and other IaaS vendors need to mature their platforms for a lesser type of user. (A lower or least common denominator) One who is smart enough to be dangerous, otherwise they're doomed to be limited to the only for experts only segment.

The bigger question is should a cloud user have to worry about hardware failures or should these types of failures be the sole responsiblity of the service provider? My opinion is deploying to the cloud should reduce complexity, not increase it. The user should be responsible for what they have access to, so in the case of AWS, they should be responsible for failures that are brought about by the applications and related components they build and deploy, not by the hardware. If hardware fails (which it will) this should be the responsibility of those who manage and provide it. Making things worst is promising to be highly available, reliable and redundant, but with the fine print of "if you are smart enough to use all our services in the proper way" which isn't fair. If EBS is automatically replicated why did Dave lose all his data?

In a optimal cloud environment any single server failures shouldn't matter. But it appears at AWS it does.

More Stories By Reuven Cohen

An instigator, part time provocateur, bootstrapper, amateur cloud lexicographer, and purveyor of random thoughts, 140 characters at a time.

Reuven is an early innovator in the cloud computing space as the founder of Enomaly in 2004 (Acquired by Virtustream in February 2012). Enomaly was among the first to develop a self service infrastructure as a service (IaaS) platform (ECP) circa 2005. As well as SpotCloud (2011) the first commodity style cloud computing Spot Market.

Reuven is also the co-creator of CloudCamp (100+ Cities around the Globe) CloudCamp is an unconference where early adopters of Cloud Computing technologies exchange ideas and is the largest of the ‘barcamp’ style of events.

@CloudExpo Stories
WebRTC is great technology to build your own communication tools. It will be even more exciting experience it with advanced devices, such as a 360 Camera, 360 microphone, and a depth sensor camera. In his session at @ThingsExpo, Masashi Ganeko, a manager at INFOCOM Corporation, will introduce two experimental projects from his team and what they learned from them. "Shotoku Tamago" uses the robot audition software HARK to track speakers in 360 video of a remote party. "Virtual Teleport" uses a...
While some vendors scramble to create and sell you a fancy solution for monitoring your spanking new Amazon Lambdas, hear how you can do it on the cheap using just built-in Java APIs yourself. By exploiting a little-known fact that Lambdas aren’t exactly single-threaded, you can effectively identify hot spots in your serverless code. In his session at @DevOpsSummit at 21st Cloud Expo, Dave Martin, Product owner at CA Technologies, will give a live demonstration and code walkthrough, showing how ...
Translating agile methodology into real-world best practices within the modern software factory has driven widespread DevOps adoption, yet much work remains to expand workflows and tooling across the enterprise. As models evolve from pockets of experimentation into wholescale organizational reinvention, practitioners find themselves challenged to incorporate the culture and architecture necessary to support DevOps at scale.
When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be...
SYS-CON Events announced today that CAST Software will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CAST was founded more than 25 years ago to make the invisible visible. Built around the idea that even the best analytics on the market still leave blind spots for technical teams looking to deliver better software and prevent outages, CAST provides the software intelligence that matter ...
As more and more companies are making the shift from on-premises to public cloud, the standard approach to DevOps is evolving. From encryption, compliance and regulations like GDPR, security in the cloud has become a hot topic. Many DevOps-focused companies have hired dedicated staff to fulfill these requirements, often creating further siloes, complexity and cost. This session aims to highlight existing DevOps cultural approaches, tooling and how security can be wrapped in every facet of the bu...
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the applic...
yperConvergence came to market with the objective of being simple, flexible and to help drive down operating expenses. It reduced the footprint by bundling the compute/storage/network into one box. This brought a new set of challenges as the HyperConverged vendors are very focused on their own proprietary building blocks. If you want to scale in a certain way, let’s say you identified a need for more storage and want to add a device that is not sold by the HyperConverged vendor, forget about it....
SYS-CON Events announced today that Pulzze Systems will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Pulzze Systems Inc, provides the software product "The Interactor" that uniquely simplifies building IoT, Web and Smart Enterprise Solutions. It is a Silicon Valley startup funded by US government agencies, NSF and DHS to bring innovative solutions to market.
With Cloud Foundry you can easily deploy and use apps utilizing websocket technology, but not everybody realizes that scaling them out is not that trivial. In his session at 21st Cloud Expo, Roman Swoszowski, CTO and VP, Cloud Foundry Services, at Grape Up, will show you an example of how to deal with this issue. He will demonstrate a cloud-native Spring Boot app running in Cloud Foundry and communicating with clients over websocket protocol that can be easily scaled horizontally and coordinate...
SYS-CON Events announced today that Elastifile will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Elastifile Cloud File System (ECFS) is software-defined data infrastructure designed for seamless and efficient management of dynamic workloads across heterogeneous environments. Elastifile provides the architecture needed to optimize your hybrid cloud environment, by facilitating efficient...
For financial firms, the cloud is going to increasingly become a crucial part of dealing with customers over the next five years and beyond, particularly with the growing use and acceptance of virtual currencies. There are new data storage paradigms on the horizon that will deliver secure solutions for storing and moving sensitive financial data around the world without touching terrestrial networks. In his session at 20th Cloud Expo, Cliff Beek, President of Cloud Constellation Corporation, d...
Businesses and business units of all sizes can benefit from cloud computing, but many don't want the cost, performance and security concerns of public cloud nor the complexity of building their own private clouds. Today, some cloud vendors are using artificial intelligence (AI) to simplify cloud deployment and management. In his session at 20th Cloud Expo, Ajay Gulati, Co-founder and CEO of ZeroStack, discussed how AI can simplify cloud operations. He covered the following topics: why cloud mana...
SYS-CON Events announced today that Golden Gate University will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Since 1901, non-profit Golden Gate University (GGU) has been helping adults achieve their professional goals by providing high quality, practice-based undergraduate and graduate educational programs in law, taxation, business and related professions. Many of its courses are taug...
DevOps at Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to w...
Vulnerability management is vital for large companies that need to secure containers across thousands of hosts, but many struggle to understand how exposed they are when they discover a new high security vulnerability. In his session at 21st Cloud Expo, John Morello, CTO of Twistlock, will address this pressing concern by introducing the concept of the “Vulnerability Risk Tree API,” which brings all the data together in a simple REST endpoint, allowing companies to easily grasp the severity of t...
Recently, WebRTC has a lot of eyes from market. The use cases of WebRTC are expanding - video chat, online education, online health care etc. Not only for human-to-human communication, but also IoT use cases such as machine to human use cases can be seen recently. One of the typical use-case is remote camera monitoring. With WebRTC, people can have interoperability and flexibility for deploying monitoring service. However, the benefit of WebRTC for IoT is not only its convenience and interopera...
When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be...
In his session at 20th Cloud Expo, Scott Davis, CTO of Embotics, discussed how automation can provide the dynamic management required to cost-effectively deliver microservices and container solutions at scale. He also discussed how flexible automation is the key to effectively bridging and seamlessly coordinating both IT and developer needs for component orchestration across disparate clouds – an increasingly important requirement at today’s multi-cloud enterprise.
WebRTC is the future of browser-to-browser communications, and continues to make inroads into the traditional, difficult, plug-in web communications world. The 6th WebRTC Summit continues our tradition of delivering the latest and greatest presentations within the world of WebRTC. Topics include voice calling, video chat, P2P file sharing, and use cases that have already leveraged the power and convenience of WebRTC.