
What You Can Do When Thunder Strikes Your Cloud

Best practices for mitigating cloud application outages

Despite the hype that a cloud system or application will never fail, cloud failures still happen. A recent example is the lightning strike in Dublin that took parts of the Amazon and Microsoft clouds down for a while. While such events may cause some fear, uncertainty and doubt about the cloud, the underlying fact remains that moving an application to the cloud is not just a matter of flipping a switch. It takes considerable planning, and the best practices proven in the traditional data center are still valid. The following are some best practices for preventing cloud outages. They go beyond the basic disaster recovery provisions offered by most cloud providers.

Avoiding a Single Point of Failure Across Tenants: Most cloud applications tend to be multi-tenant in nature. As explained in my other articles, multi-tenancy within an enterprise means serving different geographical regions, business units, or acquired and merged entities. Load balancing, web server scalability, application server routing and database partitioning should therefore be designed so that the failure of a single cloud component, such as a database or an application server, does not bring down all the tenants. The database partitioning strategy plays an important role here.

Suppose an enterprise ERP application is hosted on the cloud and is logically separated by plants or warehouses. In that case, ensure that the failure of a single virtual machine or data store shuts down only specific plants, not all of them. All the load balancing, routing and data partitioning schemes should adhere to the principle of avoiding total failure when a few virtual machines are down.
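The plant example can be sketched in a few lines of Python. This is only an illustration of the partitioning principle, not a real ERP routing layer: the plant names, the partition names and the two helper functions are all hypothetical.

```python
# Hypothetical mapping of ERP plants and warehouses to database partitions.
# All names here are illustrative assumptions, not taken from a real system.
PLANT_TO_PARTITION = {
    "plant-chennai": "db-partition-a",
    "plant-detroit": "db-partition-b",
    "plant-munich": "db-partition-c",
    "warehouse-dublin": "db-partition-a",
}


def affected_plants(failed_partitions):
    """Return the plants that lose service when the given partitions fail."""
    failed = set(failed_partitions)
    return sorted(
        plant
        for plant, partition in PLANT_TO_PARTITION.items()
        if partition in failed
    )


def available_plants(failed_partitions):
    """Return the plants that keep running despite the failure."""
    down = set(affected_plants(failed_partitions))
    return sorted(plant for plant in PLANT_TO_PARTITION if plant not in down)


if __name__ == "__main__":
    # Losing one partition should take down only the plants mapped to it.
    print("down:", affected_plants(["db-partition-b"]))
    print("still up:", available_plants(["db-partition-b"]))
```

With this kind of mapping, the size of an outage is decided by how plants are grouped onto partitions, which is why the partitioning strategy deserves attention early in the design.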

Utilizing the Out-of-the-Box Features of the Vendor for Availability: Most cloud providers offer multiple choices to weather disaster and outage scenarios. It is up to the enterprises to evaluate them and choose the ones best suited to their needs. Some of the typical options offered by various vendors are:

  • Multiple data centers across regions: Most providers have data centers on several continents or in major locations across the world. It is a good choice to scale out across these locations so that the failure of a single location does not result in a total outage of your application.
  • Availability Zones: Though this is Amazon EC2-specific terminology, the concept is about isolating certain servers and networks from failures in other parts of a particular geographical region. Careful analysis of this feature, followed by scaling out the application and data across availability zones, would be a viable option.
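As a rough sketch of what scaling out across zones means, the following Python snippet places instances round-robin over a list of zones and then checks which instances survive the loss of one zone. The instance names, zone names and both functions are assumptions made for illustration; real providers expose placement through their own APIs.

```python
from collections import defaultdict
from itertools import cycle


def spread_across_zones(instances, zones):
    """Assign instances to zones round-robin so no single zone holds them all."""
    placement = defaultdict(list)
    for instance, zone in zip(instances, cycle(zones)):
        placement[zone].append(instance)
    return dict(placement)


def survivors(placement, failed_zone):
    """Return the instances still running after one zone fails."""
    return sorted(
        instance
        for zone, members in placement.items()
        if zone != failed_zone
        for instance in members
    )


if __name__ == "__main__":
    # Hypothetical instance and zone names.
    placement = spread_across_zones(
        ["web-1", "web-2", "web-3", "web-4"], ["zone-a", "zone-b"]
    )
    print(placement)
    print(survivors(placement, "zone-a"))
```

The same idea applies one level up, with regions in place of zones, when the goal is to survive the loss of an entire location.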

Utilizing the Out-of-the-Box Backup Features: Most vendors provide multiple choices for backing up data automatically. However, it is up to the enterprises to choose the ones that fit their needs.

For example, when we use Windows Azure Storage, all content stored on Windows Azure is replicated three times. No matter which storage service you use, your data is replicated across different fault domains, which makes it much more fault tolerant. Similarly, SQL Azure makes automatic backups of the database.

Similarly, EBS storage volumes in Amazon automatically replicate data across multiple servers within an availability zone, and options like S3 provide backup across availability zones.

Building a Custom Storage Backup Strategy: One major reason for application outages is that the applications rely entirely on the vendor-provided automatic backup options. If everything else fails, application owners have no option but to wait for the vendor to restore its services.

Also, vendor (cloud provider) backup options will not protect against application-level failures such as data corruption or the accidental or deliberate deletion of data. Hence a custom, application-specific backup strategy is needed.

Most cloud services provide many custom options too. For example, if you use a cloud database such as Oracle on Amazon RDS, you have options like the recycle bin and flashback database that can help restore the database content to a specific point in time.

Another simple option that has always worked effectively is to use features like triggers or message queues to replicate transactions to a different server or region. This ensures that all the important transactions have been backed up and also makes restoring them easier.
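A minimal sketch of the trigger approach is shown below, using Python's built-in sqlite3 module so that it runs anywhere. It assumes a hypothetical orders table, an outbox table filled by a trigger, and a ship_outbox function that copies the captured rows to a second database. In a real deployment the second database would sit on a different server or region, and a message queue could take the place of the outbox table.

```python
import sqlite3


def make_primary():
    """Create the primary database with a trigger that records every insert."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, plant TEXT, amount REAL);
        CREATE TABLE orders_outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            order_id INTEGER, plant TEXT, amount REAL
        );
        CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
        BEGIN
            INSERT INTO orders_outbox (order_id, plant, amount)
            VALUES (NEW.id, NEW.plant, NEW.amount);
        END;
    """)
    return db


def make_replica():
    """Create the backup database (stands in for a server in another region)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, plant TEXT, amount REAL)")
    return db


def ship_outbox(primary, replica):
    """Copy captured rows to the replica, then clear them from the outbox."""
    rows = primary.execute(
        "SELECT seq, order_id, plant, amount FROM orders_outbox ORDER BY seq"
    ).fetchall()
    for seq, order_id, plant, amount in rows:
        replica.execute(
            "INSERT OR REPLACE INTO orders (id, plant, amount) VALUES (?, ?, ?)",
            (order_id, plant, amount),
        )
        primary.execute("DELETE FROM orders_outbox WHERE seq = ?", (seq,))
    replica.commit()
    primary.commit()
    return len(rows)


if __name__ == "__main__":
    primary, replica = make_primary(), make_replica()
    primary.execute("INSERT INTO orders (plant, amount) VALUES ('plant-a', 120.5)")
    primary.commit()
    print("rows shipped:", ship_outbox(primary, replica))
```

Because the replica receives individual transactions rather than a full snapshot, it can also serve as a starting point for restoring data after an accidental deletion on the primary, provided deletes are not replicated blindly.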

Creating a Copy Back in the Data Center: No enterprise today is going to fully relinquish its data centers and do all of its business on the cloud. Rather, there will be a hybrid delivery model that combines data centers with private and public clouds. In that scenario, keeping a local copy of the most critical data is always a good option. Most cloud providers support such a scenario too.

For example, with SQL Azure Data Sync, we can replicate data from the cloud back to the data center.

SQL Azure Data Sync Scenarios:

  • Cloud to cloud synchronization
  • Enterprise (on-premise) to cloud
  • Cloud to on-premise
  • Bi-directional or sync-to-hub or sync-from-hub synchronization

Summary: The cloud has far-reaching potential to let enterprises concentrate on business capability needs rather than operational and maintenance needs. The cloud also opens up new areas like High Performance Computing and Platform and Solutions as a Service. A few initial outages should not create fear, uncertainty and doubt in the minds of the enterprises.

It is all about the SLA needs of the individual applications and how we plan the cloud deployment. For example, it is almost impossible for today's enterprises to suddenly provision a data center on a different continent and use it for their disaster recovery needs. However, most cloud providers allow for such a scenario through simple self-service provisioning.

It is up to the enterprises to evaluate the out-of-the-box as well as custom features against the SLA needs and come up with an appropriate strategy. This will make the Cloud Journey of the enterprises more fruitful.
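To make the SLA evaluation concrete, here is a small back-of-the-envelope calculation in Python. It converts an availability percentage into allowed downtime per year and estimates the combined availability of several redundant copies. The function names are illustrative, and the second calculation assumes that the copies fail independently, which is a simplification that real deployments rarely meet exactly.

```python
HOURS_PER_YEAR = 365 * 24


def allowed_downtime_hours(sla_percent):
    """Hours of downtime per year permitted by an availability SLA."""
    return (1 - sla_percent / 100) * HOURS_PER_YEAR


def combined_availability(sla_percent, independent_copies):
    """Availability of N redundant copies, ASSUMING independent failures."""
    unavailability = (1 - sla_percent / 100) ** independent_copies
    return (1 - unavailability) * 100


if __name__ == "__main__":
    # A 99.9% SLA still allows several hours of outage per year.
    print(round(allowed_downtime_hours(99.9), 2), "hours per year")
    # Two independent 99% locations together, under the independence assumption.
    print(round(combined_availability(99.0, 2), 4), "percent")
```

Comparing the allowed downtime with what the business can tolerate for each application gives a simple first filter for deciding where a second zone, a second region or a custom backup is worth the cost.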

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
