|By Eric Novikoff||
|August 27, 2008 02:40 AM EDT||
Eric Novikoff's Blog
IT managers and pundits speak of the reliability of a system in "nines." Two nines is the same as 99%, which comes to (100%-99%)*365 or 3.65 days of downtime per year, which is typical for non-redundant hardware if you include the time to reload the operating system and restore backups (if you have them) after a failure. Three nines is about 8 hours of downtime, four nines is about 52 minutes and the holy grail of 5 nines is 7 minutes.
From a users' point of view, downtime is downtime, but for a provider/vendor/web site manager, downtime is divided into planned and unplanned. Cloud computing can offer some benefits for planned downtime, but the place that it can have the largest effect on a business is in reducing unplanned downtime.
Planned downtime is usually the result of having to do some sort of software maintenance or release process, which is usually outside the domain of the cloud vendor, unless that vendor also offers IT operations services. Other sources of planned downtime are upgrades or scheduled equipment repairs. Most cloud vendors have some planned downtime, but because their business is based on providing high uptime, scheduled downtimes are kept to a minimum.
Unplanned downtime is where cloud vendors have the most to offer, and also the most to lose. Recent large outages at Amazon and Google have shown that even the largest cloud vendors can still have glitches that take considerable time to repair and give potential cloud customers a scare (perhaps it is because they didn't take some planned downtime??) On the other hand, cloud vendors have the experienced staff and proven processes that should produce overall hardware and network reliability that meets or exceeds that of the average corporate data center, and far exceeds anything you can achieve with colocated or self-managed servers.
However, despite claims of reliablity, few cloud vendors have tight SLAs (service level agreements) that promise controlled downtime or offer rebates for excess downtime. Amazon goes the opposite direction and doesn't offer any uptime guarantees, even cautioning users that their instance (or server) can disappear at any time and that they should plan accordingly. AppLogic-based clouds, provided by companies such as ENKI, are capable of offering better guarantees of uptime because of its inherent self-healing capabilities that can enable 3-4 nines of uptime. (The exact number depends on how the AppLogic system is set up and administered, which affects the time needed for the system to heal itself.) However, any cloud computing system, even even those based on AppLogic or similar technologies, can experience unplanned downtime for a variety of reasons, including the common culprit of human error. While I believe it is possible to produce a cloud computing service that exceeds 4-9s of uptime, the costs would be so high that few would buy it when they compared the price to the average cloud offering.
When you're purchasing cloud computing, it makes sense to look at the SLA of the vendor as well as the reliability of the underlying technology. But if your needs for uptime exceed that which the vendors and their technology can offer, there are time-honored techniques for improving it, most of which involve doubling the amount of computing nodes in your application. There's an old adage that each additional "9" of uptime you get doubles your cost, and that's because you need backup systems that are in place to take over if the primaries fail. This involves creating a system architecture for your application that allows for either active/passive failover (meaning that the backup nodes are running but not doing anything) or active/active failover (meaning that the backup nodes are normally providing application computing capability).
These solutions can be implemented in any cloud technology but they always require extra design and configuration effort for your application, and they should be tested rigorously to make sure they will work when the chips are down. Failover solutions are generally less expensive to implement in the Cloud because of the on-demand or pay-as-you go nature of cloud services, which means that you can easily size the backup server nodes to meet your needs and save on computing resources.
An important component of reliability is a good backup strategy. With cloud computing systems like AppLogic offering highly reliable storage as part of the package, many customers are tempted to skip backup. But data loss and the resulting unplanned downtime can result not just from failures in the cloud platform, but also software bugs, human error, or malfeasance such as hacking. If you don't have a backup, you'll be down a long time - and this applies equally to cloud and non-cloud solutions. The advantages of cloud solutions is that there is usually an inexpensive and large storage facility coupled with the cloud computing offering which gives you a convenient place to store your backups.
For the truly fanatical, backing up your data from one cloud vendor to another provides that extra measure of security. It pays to think through your backup strategy because most of today's backup software packages or remote backup services were designed for physical servers and not virtual environments having many virtual servers such as you might find in the cloud. This can mean very high software costs for doing backup if your backup software charges on a "per server" basis and your application is spread across many instances. If your cloud vendor has a backup offering, usually they have found a way to make backup affordable even if your application consists of many compute instances.
Another aspect of reliability that often escapes cloud computing customers new to the world of computing services is monitoring. It's very hard to react to unplanned downtime if you don't know your system is down. It's also hard to avoid unplanned downtime if you don't know you're about to run out of disk space or memory, or perhaps your application is complaining about data corruption. A remote monitoring service can scan your servers in the cloud on a regular basis for faults, application problems, or even measure the performance of your application (like how long it takes to buy a widget in your web store) and report to you if anything is out of the ordinary. I say "service" because if you were to install your own monitoring server into your cloud and the cloud went down, so would your monitoring! At ENKI, we solve this problem by having our monitoring service hosted in a separate data center and under a different software environment than our primary cloud hosting service.
The last aspect of reliability is security. However, that would require another entire article to cover, since security in the cloud is a complex and relatively new topic.
To sum up, the Cloud offers some enticing advantages with respect to reliability, perhaps the largest of which is that you can give your data center operations responsibility to someone who theoretically can do a much better job at a lower cost than you can. However, to get very good reliability, you must still apply traditional approaches of redundancy and observability that have been used in physical data centers for decades - or, you have to find a cloud computing services provider that can implement them for you.
|faseidl 09/10/08 11:45:52 AM EDT|
Despite what many pundits have to say, reliability issues will not be the downfall of cloud computing. Using cloud computing does not mean neglecting to architect solutions that meet their business requirements, including reliability requirements.
I wrote more about this idea here:
Cloud Computing and Reliability
Achim Weiss is Chief Executive Officer and co-founder of ProfitBricks. In 1995, he broke off his studies to co-found the web hosting company "Schlund+Partner." The company "Schlund+Partner" later became the 1&1 web hosting product line. From 1995 to 2008, he was the technical director for several important projects: the largest web hosting platform in the world, the second largest DSL platform, a video on-demand delivery network, the largest eMail backend in Europe, and a universal billing syste...
Oct. 10, 2015 01:00 AM EDT Reads: 232
Containers are revolutionizing the way we deploy and maintain our infrastructures, but monitoring and troubleshooting in a containerized environment can still be painful and impractical. Understanding even basic resource usage is difficult - let alone tracking network connections or malicious activity. In his session at DevOps Summit, Gianluca Borello, Sr. Software Engineer at Sysdig, will cover the current state of the art for container monitoring and visibility, including pros / cons and li...
Oct. 10, 2015 12:00 AM EDT Reads: 263
Containers have changed the mind of IT in DevOps. They enable developers to work with dev, test, stage and production environments identically. Containers provide the right abstraction for microservices and many cloud platforms have integrated them into deployment pipelines. DevOps and Containers together help companies to achieve their business goals faster and more effectively.
Oct. 10, 2015 12:00 AM EDT Reads: 216
As a CIO, are your direct reports IT managers or are they IT leaders? The hard truth is that many IT managers have risen through the ranks based on their technical skills, not their leadership ability. Many are unable to effectively engage and inspire, creating forward momentum in the direction of desired change. Renowned for its approach to leadership and emphasis on their people, organizations increasingly look to our military for insight into these challenges.
Oct. 9, 2015 11:15 PM EDT Reads: 205
SYS-CON Events announced today that Dyn, the worldwide leader in Internet Performance, will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Dyn is a cloud-based Internet Performance company. Dyn helps companies monitor, control, and optimize online infrastructure for an exceptional end-user experience. Through a world-class network and unrivaled, objective intelligence into Internet condit...
Oct. 9, 2015 11:00 PM EDT Reads: 650
Today air travel is a minefield of delays, hassles and customer disappointment. Airlines struggle to revitalize the experience. GE and M2Mi will demonstrate practical examples of how IoT solutions are helping airlines bring back personalization, reduce trip time and improve reliability. In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect with GE, and Dr. Sarah Cooper, M2Mi's VP Business Development and Engineering, will explore the IoT cloud-based platform technologies driv...
Oct. 9, 2015 10:15 PM EDT Reads: 142
There are many considerations when moving applications from on-premise to cloud. It is critical to understand the benefits and also challenges of this migration. A successful migration will result in lower Total Cost of Ownership, yet offer the same or higher level of robustness. Migration to cloud shifts computing resources from your data center, which can yield significant advantages provided that the cloud vendor an offer enterprise-grade quality for your application.
Oct. 9, 2015 09:30 PM EDT Reads: 317
The web app is agile. The REST API is agile. The testing and planning are agile. But alas, data infrastructures certainly are not. Once an application matures, changing the shape or indexing scheme of data often forces at best a top down planning exercise and at worst includes schema changes that force downtime. The time has come for a new approach that fundamentally advances the agility of distributed data infrastructures. Come learn about a new solution to the problems faced by software organ...
Oct. 9, 2015 08:00 PM EDT Reads: 935
The buzz continues for cloud, data analytics and the Internet of Things (IoT) and their collective impact across all industries. But a new conversation is emerging - how do companies use industry disruption and technology enablers to lead in markets undergoing change, uncertainty and ambiguity? Organizations of all sizes need to evolve and transform, often under massive pressure, as industry lines blur and merge and traditional business models are assaulted and turned upside down. In this new da...
Oct. 9, 2015 08:00 PM EDT Reads: 319
Secure Cloud through Automated Compliance | @CloudExpo @CloudRaxak #Cloud #BigData #DevOps #Microservices
Cloud computing delivers on-demand resources that provide businesses with flexibility and cost-savings. The challenge in moving workloads to the cloud has been the cost and complexity of ensuring the initial and ongoing security and regulatory (PCI, HIPAA, FFIEC) compliance across private and public clouds. Manual security compliance is slow, prone to human error, and represents over 50% of the cost of managing cloud applications. Determining how to automate cloud security compliance is critical...
Oct. 9, 2015 06:00 PM EDT Reads: 326
The Internet of Everything is re-shaping technology trends–moving away from “request/response” architecture to an “always-on” Streaming Web where data is in constant motion and secure, reliable communication is an absolute necessity. As more and more THINGS go online, the challenges that developers will need to address will only increase exponentially. In his session at @ThingsExpo, Todd Greene, Founder & CEO of PubNub, will explore the current state of IoT connectivity and review key trends an...
Oct. 9, 2015 05:30 PM EDT Reads: 120
The Internet of Things (IoT) is growing rapidly by extending current technologies, products and networks. By 2020, Cisco estimates there will be 50 billion connected devices. Gartner has forecast revenues of over $300 billion, just to IoT suppliers. Now is the time to figure out how you’ll make money – not just create innovative products. With hundreds of new products and companies jumping into the IoT fray every month, there’s no shortage of innovation. Despite this, McKinsey/VisionMobile data...
Oct. 9, 2015 04:00 PM EDT Reads: 250
You have your devices and your data, but what about the rest of your Internet of Things story? Two popular classes of technologies that nicely handle the Big Data analytics for Internet of Things are Apache Hadoop and NoSQL. Hadoop is designed for parallelizing analytical work across many servers and is ideal for the massive data volumes you create with IoT devices. NoSQL databases such as Apache HBase are ideal for storing and retrieving IoT data as “time series data.”
Oct. 9, 2015 03:45 PM EDT Reads: 510
In recent years, at least 40% of companies using cloud applications have experienced data loss. One of the best prevention against cloud data loss is backing up your cloud data. In his General Session at 17th Cloud Expo, Bryan Forrester, Senior Vice President of Sales at eFolder, will present how organizations can use eFolder Cloudfinder to automate backups of cloud application data. He will also demonstrate how easy it is to search and restore cloud application data using Cloudfinder.
Oct. 9, 2015 03:00 PM EDT Reads: 503
Saviynt Inc. has announced the availability of the next release of Saviynt for AWS. The comprehensive security and compliance solution provides a Command-and-Control center to gain visibility into risks in AWS, enforce real-time protection of critical workloads as well as data and automate access life-cycle governance. The solution enables AWS customers to meet their compliance mandates such as ITAR, SOX, PCI, etc. by including an extensive risk and controls library to detect known threats and b...
Oct. 9, 2015 03:00 PM EDT Reads: 241
SYS-CON Events announced today that Key Information Systems, Inc. (KeyInfo), a leading cloud and infrastructure provider offering integrated solutions to enterprises, will exhibit at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. Key Information Systems is a leading regional systems integrator with world-class compute, storage and networking solutions and professional services for the most advanced softwa...
Oct. 9, 2015 03:00 PM EDT Reads: 415
The enterprise is being consumerized, and the consumer is being enterprised. Moore's Law does not matter anymore, the future belongs to business virtualization powered by invisible service architecture, powered by hyperscale and hyperconvergence, and facilitated by vertical streaming and horizontal scaling and consolidation. Both buyers and sellers want instant results, and from paperwork to paperless to mindless is the ultimate goal for any seamless transaction. The sweetest sweet spot in innov...
Oct. 9, 2015 02:15 PM EDT Reads: 219
The IoT is upon us, but today’s databases, built on 30-year-old math, require multiple platforms to create a single solution. Data demands of the IoT require Big Data systems that can handle ingest, transactions and analytics concurrently adapting to varied situations as they occur, with speed at scale. In his session at @ThingsExpo, Chad Jones, chief strategy officer at Deep Information Sciences, will look differently at IoT data so enterprises can fully leverage their IoT potential. He’ll sha...
Oct. 9, 2015 01:45 PM EDT Reads: 565
Overgrown applications have given way to modular applications, driven by the need to break larger problems into smaller problems. Similarly large monolithic development processes have been forced to be broken into smaller agile development cycles. Looking at trends in software development, microservices architectures meet the same demands. Additional benefits of microservices architectures are compartmentalization and a limited impact of service failure versus a complete software malfunction....
Oct. 9, 2015 01:15 PM EDT Reads: 261
SYS-CON Events announced today that ProfitBricks, the provider of painless cloud infrastructure, will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. ProfitBricks is the IaaS provider that offers a painless cloud experience for all IT users, with no learning curve. ProfitBricks boasts flexible cloud servers and networking, an integrated Data Center Designer tool for visual control over the...
Oct. 9, 2015 01:00 PM EDT Reads: 805