Cloud Expo: Blog Feed Post

How to Gracefully Degrade Web 2.0 Apps

You might have heard that Twitter was down (again) last week

I haven’t heard the term “graceful degradation” in a long time, but as we continue to push the limits of data centers and our budgets to provide capacity it’s a concept we need to revisit.

You might have heard that Twitter was down (again) last week. What you might not have heard (or read) is some interesting crunchy bits about how Twitter attempts to maintain availability by degrading capabilities gracefully when services are over capacity.

“Twitter Down, Overwhelmed by Whales” from Data Center Knowledge offered up the juicy details:

The “whales” comment refers to the “Fail Whale” – the downtime mascot that appears whenever Twitter is unavailable. The appearance of the Fail Whale indicates a server error known as a 503, which then triggers a “Whale Watcher” script that prompts a review of the last 100,000 lines of server logs to sort out what has happened.

When at all possible, Twitter tries to adapt by slowing the site performance as an alternative to a 503. In some cases, this means disabling features like custom searches. In recent weeks Twitter.com users have periodically encountered messages, at times of heavy load, that the service was over capacity, but the condition was usually temporary. For more on how Twitter manages its capacity challenges, see “Using Metrics to Vanquish the Fail Whale.”

I found this interesting and refreshing at a time when the answer to capacity problems is to just “go cloud.” Even if (and that’s a big if) “the cloud” were truly capable of “infinite scale” (it is not), most organizations’ budgets are almost certainly not capable of “infinite payments,” and cloud computing isn’t free.

It’s been many years, in fact, since the phrase “graceful degradation” has been uttered within my hearing, but that’s really what the article is describing and it’s something we don’t talk enough about. Perhaps that’s because it’s difficult to admit that there are limitations – whether technical or financial – on the ability to scale and meet demand. But there are, and if organizations are wise they’ll include in their application delivery strategy the means by which applications and services can “degrade gracefully.”

Twitter’s solution, the disabling of specific features, is a particularly easy way to implement such a strategy for Web 2.0 applications; at least it’s particularly easy if you have a solution capable of network-side scripting mediating for the applications.

GRACEFUL DEGRADATION


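The reason it’s particularly easy to gracefully degrade Web 2.0 applications is that there is generally a 1:1 mapping between “functions” and “URIs.” This is often true for the web-facing interface, almost always true for RESTful APIs, and often true for SOAPy endpoints as well, although a single SOAP endpoint can expose several operations that are distinguished by the SOAPAction header or the message body rather than by the URI.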

What you need to do is identify those “premium” URIs, i.e. those that can be disabled without negatively impacting core services, so that they can be “degraded” in the face of an overwhelming volume of requests.
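As a rough illustration of what identifying “premium” URIs might look like, here is a minimal sketch in Python. The path prefixes and the `is_premium` helper are hypothetical, chosen only for the example; the real list depends entirely on which features your application can live without for a while.

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes; a real list would come from your own application.
PREMIUM_PREFIXES = ("/search", "/trends", "/api/recommendations")


def is_premium(uri: str) -> bool:
    """Return True if the request URI maps to a feature that may be disabled."""
    path = urlsplit(uri).path  # ignore scheme, host, and query string
    # Match the prefix itself or anything beneath it, but not "/searching".
    return any(path == p or path.startswith(p + "/") for p in PREMIUM_PREFIXES)
```

Matching on whole path segments rather than raw string prefixes keeps an unrelated URI such as `/searching` from being disabled by accident.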

You also need an intermediary. This can be a load balancer, assuming it’s capable of providing the flexibility in configuration necessary to enable and disable service to specific URIs, i.e. it must be layer 7 aware. It has to be an intermediary through which all requests are routed because individual servers do not have the visibility required to be able to “see” the total requests and all responses. The fact that a server is throwing back 503 (Service Unavailable) errors indicates it doesn’t have the resources available to respond to a request, which means it won’t be able to respond to any requests, including those to disable services. Only an architecture that includes an intermediary of some kind (a reverse proxy) can achieve this solution.

The network-side script, which is deployed on the application delivery platform (load balancer), should implement logic that triggers degradation based on receiving 503 errors. It should probably not trigger on a single 503 or multiple 503s from the same application instance as such behavior could be indicative of a problem with that one instance as opposed to being produced due to a lack of capacity. That means the scripting solution needs to be able to take action based on a pattern of behavior coming from all application instances in conjunction with the total number of requests being received from users.

Yes, it has to be context-aware.
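To make the “pattern of behavior” idea concrete, here is a minimal sketch of the kind of bookkeeping an intermediary could do. It is written in Python for readability and is not the scripting language of any particular load balancer. The class name, the thresholds, and the 30-second window are all assumptions for illustration. The point is only that degradation triggers when several distinct instances return 503s and those errors make up a meaningful share of all recent responses.

```python
import time
from collections import deque


class CapacityMonitor:
    """Track recent responses across all instances seen by the intermediary."""

    def __init__(self, window_seconds=30.0, min_instances=2, min_error_ratio=0.2):
        self.window_seconds = window_seconds    # how far back to look
        self.min_instances = min_instances      # distinct instances that must be failing
        self.min_error_ratio = min_error_ratio  # share of all responses that are 503s
        self._events = deque()                  # (timestamp, instance_id, status)

    def record(self, instance_id, status, now=None):
        """Record one response; `now` can be supplied to make testing deterministic."""
        now = time.monotonic() if now is None else now
        self._events.append((now, instance_id, status))
        self._expire(now)

    def _expire(self, now):
        # Drop events that have fallen out of the sliding window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def over_capacity(self, now=None):
        """True only when 503s come from enough instances AND enough of the traffic."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        if not self._events:
            return False
        failing = {inst for _, inst, status in self._events if status == 503}
        errors = sum(1 for _, _, status in self._events if status == 503)
        ratio = errors / len(self._events)
        return len(failing) >= self.min_instances and ratio >= self.min_error_ratio
```

A single misbehaving instance never satisfies the `min_instances` check on its own, which is the distinction between “one sick server” and “out of capacity” described above.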

Once it’s determined that the errors are being generated due to a lack of capacity, the scripting solution needs to disable one or more of the specific URIs determined to be “premium” or ancillary. The intermediary can then respond to subsequent requests for the disabled URIs with custom content based on the expected response type. For example, if it’s an API call it might be appropriate to return a pre-formatted response in the appropriate data format indicating service is currently unavailable. Many network-side scripting solutions are capable of returning pre-formatted responses or they can be customized to provide more detail – it’s really up to the implementer to decide what information is included and how.
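A pre-formatted response of this kind might be built along the following lines. This is a sketch under stated assumptions: the `unavailable_response` function, the message text, and the error code string are invented for the example, and the check against the `Accept` header is plain substring matching rather than full content negotiation. The 503 status and the `Retry-After` header are standard HTTP.

```python
import json


def unavailable_response(accept: str, retry_after_seconds: int = 60):
    """Build a canned reply as (status, headers, body) in the format the client expects."""
    headers = {"Retry-After": str(retry_after_seconds)}  # hint for well-behaved clients
    message = "This feature is temporarily unavailable."
    if "application/json" in accept:
        headers["Content-Type"] = "application/json"
        body = json.dumps({"error": "service_unavailable", "message": message})
    elif "xml" in accept and "text/html" not in accept:
        # XML API clients; browsers also list XML types but prefer text/html.
        headers["Content-Type"] = "application/xml"
        body = f"<error><code>service_unavailable</code><message>{message}</message></error>"
    else:
        headers["Content-Type"] = "text/html; charset=utf-8"
        body = f"<html><body><p>{message}</p></body></html>"
    return 503, headers, body
```

An API client sending `Accept: application/json` gets a JSON error body it can parse, while a browser gets a small HTML page.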

The premise is that as premium or ancillary services are degraded (disabled), application instances will be able to focus on servicing core requests and return service to normal for those pieces of the application. When the volume of requests returns to within normal operating parameters for the capacity available, the intermediary can restore service to the previously degraded services.
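One practical wrinkle with restoring service is flapping: if features are re-enabled the instant load dips, the restored traffic can push the site straight back over the edge. A common way to avoid that is to use two thresholds instead of one. The sketch below assumes a hypothetical `load` figure (requests relative to available capacity) and made-up threshold values; how load is actually measured is up to the implementer.

```python
class DegradationSwitch:
    """Disable premium URIs under load and restore them once load has subsided."""

    def __init__(self, degrade_above=0.9, restore_below=0.7):
        if restore_below >= degrade_above:
            raise ValueError("restore_below must be lower than degrade_above")
        self.degrade_above = degrade_above  # start degrading above this load
        self.restore_below = restore_below  # only restore once load falls below this
        self.degraded = False

    def update(self, load: float) -> bool:
        """Feed the current load (requests / capacity) and return the degraded state."""
        if not self.degraded and load > self.degrade_above:
            self.degraded = True
        elif self.degraded and load < self.restore_below:
            self.degraded = False
        return self.degraded
```

With these example values, a load of 0.8 leaves premium features disabled if they were already off, and they come back only after load drops below 0.7.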

SCALABILITY is NEVER REALLY INFINITE


From a technological point of view “infinite scale” is not possible. At some point the volume of requests will reach boundaries that simply cannot be overcome, be they limitations on the load balancer (there is a limit to how many servers can ultimately be load balanced, and bandwidth is not unlimited) or on the application infrastructure itself. After all, you can’t launch a new instance of an application if there are no physical resources left on which to launch it.

It is almost certainly the case, however, that you will hit a financial limitation before reaching the technical limits of an “infinitely scalable” environment. Or it may be the case that you haven’t jumped on the “cloud” bandwagon and what you see is what you get: a limited number of physical resources running a finite number of application instances, and that’s it. In either case, there are limitations on capacity and at some point you may reach them. How you respond to those limitations is an organizational decision, but graceful degradation in a controlled manner is probably more desirable than random, uncontrolled service outages.

Graceful degradation is an acceptable strategy for responding to availability issues and is especially easy to implement for a Web 2.0 application or API. It’s certainly more appealing than the alternative, which leaves every user essentially playing a game of Russian Roulette with availability of your web application.

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.
