|By Eric Novikoff||
|August 27, 2008 02:40 AM EDT||
Eric Novikoff's Blog
IT managers and pundits speak of the reliability of a system in "nines." Two nines is the same as 99%, which comes to (100%-99%)*365 or 3.65 days of downtime per year, which is typical for non-redundant hardware if you include the time to reload the operating system and restore backups (if you have them) after a failure. Three nines is about 8 hours of downtime, four nines is about 52 minutes and the holy grail of 5 nines is 7 minutes.
From a users' point of view, downtime is downtime, but for a provider/vendor/web site manager, downtime is divided into planned and unplanned. Cloud computing can offer some benefits for planned downtime, but the place that it can have the largest effect on a business is in reducing unplanned downtime.
Planned downtime is usually the result of having to do some sort of software maintenance or release process, which is usually outside the domain of the cloud vendor, unless that vendor also offers IT operations services. Other sources of planned downtime are upgrades or scheduled equipment repairs. Most cloud vendors have some planned downtime, but because their business is based on providing high uptime, scheduled downtimes are kept to a minimum.
Unplanned downtime is where cloud vendors have the most to offer, and also the most to lose. Recent large outages at Amazon and Google have shown that even the largest cloud vendors can still have glitches that take considerable time to repair and give potential cloud customers a scare (perhaps it is because they didn't take some planned downtime??) On the other hand, cloud vendors have the experienced staff and proven processes that should produce overall hardware and network reliability that meets or exceeds that of the average corporate data center, and far exceeds anything you can achieve with colocated or self-managed servers.
However, despite claims of reliablity, few cloud vendors have tight SLAs (service level agreements) that promise controlled downtime or offer rebates for excess downtime. Amazon goes the opposite direction and doesn't offer any uptime guarantees, even cautioning users that their instance (or server) can disappear at any time and that they should plan accordingly. AppLogic-based clouds, provided by companies such as ENKI, are capable of offering better guarantees of uptime because of its inherent self-healing capabilities that can enable 3-4 nines of uptime. (The exact number depends on how the AppLogic system is set up and administered, which affects the time needed for the system to heal itself.) However, any cloud computing system, even even those based on AppLogic or similar technologies, can experience unplanned downtime for a variety of reasons, including the common culprit of human error. While I believe it is possible to produce a cloud computing service that exceeds 4-9s of uptime, the costs would be so high that few would buy it when they compared the price to the average cloud offering.
When you're purchasing cloud computing, it makes sense to look at the SLA of the vendor as well as the reliability of the underlying technology. But if your needs for uptime exceed that which the vendors and their technology can offer, there are time-honored techniques for improving it, most of which involve doubling the amount of computing nodes in your application. There's an old adage that each additional "9" of uptime you get doubles your cost, and that's because you need backup systems that are in place to take over if the primaries fail. This involves creating a system architecture for your application that allows for either active/passive failover (meaning that the backup nodes are running but not doing anything) or active/active failover (meaning that the backup nodes are normally providing application computing capability).
These solutions can be implemented in any cloud technology but they always require extra design and configuration effort for your application, and they should be tested rigorously to make sure they will work when the chips are down. Failover solutions are generally less expensive to implement in the Cloud because of the on-demand or pay-as-you go nature of cloud services, which means that you can easily size the backup server nodes to meet your needs and save on computing resources.
An important component of reliability is a good backup strategy. With cloud computing systems like AppLogic offering highly reliable storage as part of the package, many customers are tempted to skip backup. But data loss and the resulting unplanned downtime can result not just from failures in the cloud platform, but also software bugs, human error, or malfeasance such as hacking. If you don't have a backup, you'll be down a long time - and this applies equally to cloud and non-cloud solutions. The advantages of cloud solutions is that there is usually an inexpensive and large storage facility coupled with the cloud computing offering which gives you a convenient place to store your backups.
For the truly fanatical, backing up your data from one cloud vendor to another provides that extra measure of security. It pays to think through your backup strategy because most of today's backup software packages or remote backup services were designed for physical servers and not virtual environments having many virtual servers such as you might find in the cloud. This can mean very high software costs for doing backup if your backup software charges on a "per server" basis and your application is spread across many instances. If your cloud vendor has a backup offering, usually they have found a way to make backup affordable even if your application consists of many compute instances.
Another aspect of reliability that often escapes cloud computing customers new to the world of computing services is monitoring. It's very hard to react to unplanned downtime if you don't know your system is down. It's also hard to avoid unplanned downtime if you don't know you're about to run out of disk space or memory, or perhaps your application is complaining about data corruption. A remote monitoring service can scan your servers in the cloud on a regular basis for faults, application problems, or even measure the performance of your application (like how long it takes to buy a widget in your web store) and report to you if anything is out of the ordinary. I say "service" because if you were to install your own monitoring server into your cloud and the cloud went down, so would your monitoring! At ENKI, we solve this problem by having our monitoring service hosted in a separate data center and under a different software environment than our primary cloud hosting service.
The last aspect of reliability is security. However, that would require another entire article to cover, since security in the cloud is a complex and relatively new topic.
To sum up, the Cloud offers some enticing advantages with respect to reliability, perhaps the largest of which is that you can give your data center operations responsibility to someone who theoretically can do a much better job at a lower cost than you can. However, to get very good reliability, you must still apply traditional approaches of redundancy and observability that have been used in physical data centers for decades - or, you have to find a cloud computing services provider that can implement them for you.
|faseidl 09/10/08 11:45:52 AM EDT|
Despite what many pundits have to say, reliability issues will not be the downfall of cloud computing. Using cloud computing does not mean neglecting to architect solutions that meet their business requirements, including reliability requirements.
I wrote more about this idea here:
Cloud Computing and Reliability
The world's leading Cloud event, Cloud Expo has launched Microservices Journal on the SYS-CON.com portal, featuring over 19,000 original articles, news stories, features, and blog entries. DevOps Journal is focused on this critical enterprise IT topic in the world of cloud computing. Microservices Journal offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. Follow new article posts on T...
Apr. 26, 2015 11:00 AM EDT Reads: 2,117
SYS-CON Events announced today that Litmus Automation will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Litmus Automation’s vision is to provide a solution for companies that are in a rush to embrace the disruptive Internet of Things technology and leverage it for real business challenges. Litmus Automation simplifies the complexity of connected devices applications with Loop, a secure and scalable clou...
Apr. 26, 2015 11:00 AM EDT Reads: 1,758
In 2015, 4.9 billion connected "things" will be in use. By 2020, Gartner forecasts this amount to be 25 billion, a 410 percent increase in just five years. How will businesses handle this rapid growth of data? Hadoop will continue to improve its technology to meet business demands, by enabling businesses to access/analyze data in real time, when and where they need it. Cloudera's Chief Technologist, Eli Collins, will discuss how Big Data is keeping up with today's data demands and how in t...
Apr. 26, 2015 11:00 AM EDT Reads: 1,478
SYS-CON Media announced today that @ThingsExpo Blog launched with 7,788 original stories. @ThingsExpo Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @ThingsExpo Blog can be bookmarked. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago.
Apr. 26, 2015 11:00 AM EDT Reads: 2,573
As Marc Andreessen says software is eating the world. Everything is rapidly moving toward being software-defined – from our phones and cars through our washing machines to the datacenter. However, there are larger challenges when implementing software defined on a larger scale - when building software defined infrastructure. In his session at 16th Cloud Expo, Boyan Ivanov, CEO of StorPool, will provide some practical insights on what, how and why when implementing "software-defined" in the dat...
Apr. 26, 2015 11:00 AM EDT Reads: 1,633
SYS-CON Events announced today that robomq.io will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. robomq.io is an interoperable and composable platform that connects any device to any application. It helps systems integrators and the solution providers build new and innovative products and service for industries requiring monitoring or intelligence from devices and sensors.
Apr. 26, 2015 11:00 AM EDT Reads: 2,105
SYS-CON Events announced today that Vicom Computer Services, Inc., a provider of technology and service solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. They are located at booth #427. Vicom Computer Services, Inc. is a progressive leader in the technology industry for over 30 years. Headquartered in the NY Metropolitan area. Vicom provides products and services based on today’s requirements...
Apr. 26, 2015 10:30 AM EDT Reads: 1,767
How do you securely enable access to your applications in AWS without exposing any attack surfaces? The answer is usually very complicated because application environments morph over time in response to growing requirements from your employee base, your partners and your customers. In his session at 16th Cloud Expo, Haseeb Budhani, CEO and Co-founder of Soha, will share five common approaches that DevOps teams follow to secure access to applications deployed in AWS, Azure, etc., and the frict...
Apr. 26, 2015 10:30 AM EDT Reads: 1,674
Modern Systems announced completion of a successful project with its new Rapid Program Modernization (eavRPMa"c) software. The eavRPMa"c technology architecturally transforms legacy applications, enabling faster feature development and reducing time-to-market for critical software updates. Working with Modern Systems, the University of California at Santa Barbara (UCSB) leveraged eavRPMa"c to transform its Student Information System from Software AG's Natural syntax to a modern application lev...
Apr. 26, 2015 10:15 AM EDT Reads: 1,785
Buzzword alert: Microservices and IoT at a DevOps conference? What could possibly go wrong? Join this panel of experts as they peel away the buzz and discuss the important architectural principles behind implementing IoT solutions for the enterprise. As remote IoT devices and sensors become increasingly intelligent, they become part of our distributed cloud environment, and we must architect and code accordingly. At the very least, you’ll have no problem filling in your buzzword bingo cards.
Apr. 26, 2015 10:00 AM EDT Reads: 2,221
IoT is still a vague buzzword for many people. In his session at @ThingsExpo, Mike Kavis, Vice President & Principal Cloud Architect at Cloud Technology Partners, discussed the business value of IoT that goes far beyond the general public's perception that IoT is all about wearables and home consumer services. He also discussed how IoT is perceived by investors and how venture capitalist access this space. Other topics discussed were barriers to success, what is new, what is old, and what th...
Apr. 26, 2015 10:00 AM EDT Reads: 6,220
Containers and microservices have become topics of intense interest throughout the cloud developer and enterprise IT communities. Accordingly, attendees at the upcoming 16th Cloud Expo at the Javits Center in New York June 9-11 will find fresh new content in a new track called PaaS | Containers & Microservices Containers are not being considered for the first time by the cloud community, but a current era of re-consideration has pushed them to the top of the cloud agenda. With the launch ...
Apr. 26, 2015 10:00 AM EDT Reads: 2,892
SYS-CON Events announced today the IoT Bootcamp – Jumpstart Your IoT Strategy, being held June 9–10, 2015, in conjunction with 16th Cloud Expo and Internet of @ThingsExpo at the Javits Center in New York City. This is your chance to jumpstart your IoT strategy. Combined with real-world scenarios and use cases, the IoT Bootcamp is not just based on presentations but includes hands-on demos and walkthroughs. We will introduce you to a variety of Do-It-Yourself IoT platforms including Arduino, Ras...
Apr. 26, 2015 10:00 AM EDT Reads: 3,039
Internet of Things (IoT) will be a hybrid ecosystem of diverse devices and sensors collaborating with operational and enterprise systems to create the next big application. In their session at @ThingsExpo, Bramh Gupta, founder and CEO of robomq.io, and Fred Yatzeck, principal architect leading product development at robomq.io, will discuss how choosing the right middleware and integration strategy from the get-go will enable IoT solution developers to adapt and grow with the industry, while at...
Apr. 26, 2015 10:00 AM EDT Reads: 1,978
The only place to be June 9-11 is Cloud Expo & @ThingsExpo 2015 East at the Javits Center in New York City. Join us there as delegates from all over the world come to listen to and engage with speakers & sponsors from the leading Cloud Computing, IoT & Big Data companies. Cloud Expo & @ThingsExpo are the leading events covering the booming market of Cloud Computing, IoT & Big Data for the enterprise. Speakers from all over the world will be hand-picked for their ability to explore the economic...
Apr. 26, 2015 10:00 AM EDT Reads: 4,191
SYS-CON Events announced today that AIC, a leading provider of OEM/ODM server and storage solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. AIC is a leading provider of both standard OTS, off-the-shelf, and OEM/ODM server and storage solutions. With expert in-house design capabilities, validation, manufacturing and production, AIC's broad selection of products are highly flexible and are conf...
Apr. 26, 2015 10:00 AM EDT Reads: 5,051
From telemedicine to smart cars, digital homes and industrial monitoring, the explosive growth of IoT has created exciting new business opportunities for real time calls and messaging. In his session at @ThingsExpo, Ivelin Ivanov, CEO and Co-Founder of Telestax, shared some of the new revenue sources that IoT created for Restcomm – the open source telephony platform from Telestax. Ivelin Ivanov is a technology entrepreneur who founded Mobicents, an Open Source VoIP Platform, to help create, de...
Apr. 26, 2015 10:00 AM EDT Reads: 5,181
Deploying and operating cloud technology is difficult and falls outside the core competencies of most organizations. As a result, many on-premises private cloud installations falter, casting doubt on private cloud as a solution in general. Yet private clouds come with a unique set of benefits that continue to drive widespread interest. There must be a better way. This session will give participants a clear roadmap of private cloud implementation challenges and offer actionable ideas for...
Apr. 26, 2015 10:00 AM EDT Reads: 1,229
@ThingsExpo has been named the Top 5 Most Influential Internet of Things Brand by Onalytica in the ‘The Internet of Things Landscape 2015: Top 100 Individuals and Brands.' Onalytica analyzed Twitter conversations around the #IoT debate to uncover the most influential brands and individuals driving the conversation. Onalytica captured data from 56,224 users. The PageRank based methodology they use to extract influencers on a particular topic (tweets mentioning #InternetofThings or #IoT in this ...
Apr. 26, 2015 10:00 AM EDT Reads: 2,020
SYS-CON Events announced today that the DevOps Institute has been named “Association Sponsor” of SYS-CON's DevOps Summit, which will take place on June 9–11, 2015, at the Javits Center in New York City, NY. The DevOps Institute provides enterprise level training and certification. Working with thought leaders from the DevOps community, the IT Service Management field and the IT training market, the DevOps Institute is setting the standard in quality for DevOps education and training.
Apr. 26, 2015 10:00 AM EDT Reads: 2,195