The Real News Is Not that Facebook Serves Up 1 Trillion Pages a Month...

It’s how much load that really generates and how it scales to meet the challenge

There’s some debate over whether Facebook really crossed the threshold of one trillion page views per month. While one report says it did, another respected firm says it did not, putting the figure at a mere 467 billion page views per month.

In the big scheme of things, the discrepancy is somewhat irrelevant, as neither figure shows the true load on Facebook’s infrastructure, which is a far more impressive set of numbers than its externally measured “page view” metric. Mashable reported in “Facebook Surpasses 1 Trillion Pageviews per Month” that the social networking giant saw “approximately 870 million unique visitors in June and 860 million in July” and followed up with some per visitor statistics, indicating “each visitor averaged approximately 1,160 page views in July and 40 per visit — enormous by any standard. Time spent on the site was around 25 minutes per user.”

From an architectural standpoint, it’s not just about the page views. It’s about requests and responses, many of which occur under the radar of the metrics typically gathered by external services like Google. Many of Facebook’s interactive features are powered by AJAX, which is hidden “in” the page and thus obscured from external view. Nor does a “page view” necessarily count all the external objects (scripts, images, and so on) that make up a “page”. So while 1 trillion (or 467 billion, whichever you prefer) is impressive, consider that it is likely only a fraction of the actual requests and responses handled by Facebook’s massive infrastructure on any given day.

Let’s examine what the actual requests and responses might mean in terms of load on Facebook’s infrastructure, shall we?

SOME QUICK MATH

Loading Facebook generates 125 requests for various scripts, images, and content. That’s one “page view”. Sit on the page for a few minutes with Firebug’s console open and you’ll see a request to update content roughly once every minute you are on the page. If we do the math, based on approximate page views per visitor, each of which incurs 125 GET requests, we arrive at an approximation of 19,468 RPS (requests per second).

That’s only an approximation, mind you, and it doesn’t account for the time factor: the AJAX-based requests that update content at regular intervals while a page stays open. These also add to the overall load on Facebook’s massive infrastructure. And that’s before we start considering the impact of “unseen” integrated traffic via Facebook’s API which, according to the most recently available data (2009), was adding 5 billion requests a day to that load. If you’re wondering, that’s an additional 57,870 requests per second, which gives us a more complete figure of 77,338 requests per second.

SOURCE: webcast, 2009 Interop F5 Keynote
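The per-second figures above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the helper name `per_second` and the 31-day month are assumptions, and the 19,468 RPS page-driven estimate is taken from the text as given rather than re-derived.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(total_requests: float, days: float) -> float:
    """Average requests per second for a total spread evenly over `days`."""
    return total_requests / (days * SECONDS_PER_DAY)


# API traffic: 5 billion requests per day (the 2009 figure cited in the text).
api_rps = per_second(5_000_000_000, days=1)

# Page-driven traffic: the article's own estimate, used here as given.
page_rps = 19_468

# Combined figure, truncating the API average to whole requests.
total_rps = page_rps + int(api_rps)

# For comparison only: the headline inputs applied directly
# (1 trillion page views in a 31-day month, 125 requests per page view).
headline_rps = per_second(1_000_000_000_000 * 125, days=31)

print(f"API RPS:      {api_rps:,.0f}")
print(f"Total RPS:    {total_rps:,}")
print(f"Headline RPS: {headline_rps:,.0f}")
```

Applied directly to the headline inputs, the same helper yields an average several orders of magnitude above 19,468 RPS, so that estimate evidently rests on assumptions not spelled out in the text. Either way, the point stands: the real request load is far larger than the page view count suggests.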

Let’s take a moment to digest that, because that’s a lot of load on a site, and I’m sure it still doesn’t account for everything. We also have to remember that the load at any given time could be higher or lower depending on usage patterns. Averaging totals over a month and distilling them down to a per-second figure gives just that: a mathematical average. It ignores the peaks and valleys that occur in usage throughout the day. Facebook may be running at only a fraction of that load much of the time, with spikes two or three times as high at others.
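To make the gap between an average and a peak concrete, here is a minimal sketch. The load samples are invented purely for illustration and are not Facebook measurements; `peak_to_average` is a hypothetical helper name.

```python
from statistics import mean


def peak_to_average(samples):
    """Return (average, peak, peak / average) for a series of load samples."""
    avg = mean(samples)
    peak = max(samples)
    return avg, peak, peak / avg


# Hypothetical requests-per-second readings over one day, in 4-hour buckets.
# These values are made up to show the shape of the problem.
daily_profile = [20_000, 15_000, 60_000, 120_000, 150_000, 100_000]

avg, peak, ratio = peak_to_average(daily_profile)
print(f"average: {avg:,.0f} RPS, peak: {peak:,} RPS, ratio: {ratio:.2f}x")
```

Capacity has to be planned against the peak, not the average, which is why a monthly per-second figure understates what the infrastructure must actually absorb.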

That realization should be a bit sobering, as we’ve seen recent DDoS attacks that have crippled and even toppled sites with less traffic than Facebook handles in any given minute of the day.

The question is, how do they do it? How do they manage to keep the service up and available despite the overwhelming load and certainty of traffic spikes?

IT’S THE ARCHITECTURE

Facebook itself does a great job of discussing exactly how it manages to sustain such load over time while simultaneously managing growth, and its secret generally revolves around architectural choices. Not just the “Facebook” application architecture, but its use of infrastructure architecture as well. That may not always be apparent from Facebook’s engineering blog, which generally focuses on application and software architecture topics, but it is inherent in those architectural decisions.

Take, for example, an engineer’s discussion on Facebook’s secrets to scaling to over 500 million users and beyond. The very first point made is to “scale horizontally”.

This isn't at all novel but it's really important. If something is increasing exponentially, the only sensible way to deal with it is to get it spread across arbitrarily many machines. Remember, there are only three numbers in computer science: 0, 1, and n. (Scaling Facebook to 500 Million Users and Beyond (Facebook Engineering Blog))

Horizontal scalability is, of course, enabled via load balancing, which generally (but not always) implies infrastructure components that are critical to an overall growth and scalability strategy. The abstraction afforded by load balancing services has the added benefit of enabling agile operations: it becomes cost- and time-effective to add and remove (provision and decommission) compute resources to meet scaling challenges on demand, which is a key component of cloud computing models.
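As a rough illustration of that abstraction, the sketch below shows a round-robin pool whose members can be provisioned and decommissioned without clients noticing. It is a toy under stated assumptions, not how Facebook or any particular load balancer product works; the class name and the server names are hypothetical.

```python
class RoundRobinBalancer:
    """Toy load balancer: hands out servers in rotation from a mutable pool."""

    def __init__(self, servers=None):
        self._servers = list(servers or [])
        self._next = 0  # index of the server to hand out next

    def add(self, server):
        """Provision: the new server joins the rotation immediately."""
        self._servers.append(server)

    def remove(self, server):
        """Decommission: drop the server and keep the index in range."""
        self._servers.remove(server)
        self._next = self._next % len(self._servers) if self._servers else 0

    def pick(self):
        """Choose the server that should handle the next request."""
        if not self._servers:
            raise RuntimeError("no servers in pool")
        server = self._servers[self._next]
        self._next = (self._next + 1) % len(self._servers)
        return server


pool = RoundRobinBalancer(["web1", "web2"])
before = [pool.pick() for _ in range(4)]

pool.add("web3")          # scale out to meet demand
after_add = [pool.pick() for _ in range(3)]

pool.remove("web1")       # scale back in
after_remove = [pool.pick() for _ in range(2)]
```

Callers only ever call `pick()`, so the pool can grow or shrink behind that single entry point.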

In other words, in addition to Facebook’s attention to application architecture as a means to enable scalability, it also takes advantage of infrastructure components providing load balancing services to ensure that its massive load is distributed not just geographically but efficiently across its various clusters of application functionality. It’s a collaborative architecture that spans infrastructure and application tiers, taking advantage of the speed and scalability benefits afforded by both approaches simultaneously.

Yet Facebook is not shy about revealing its use of infrastructure as a means to scale and implement its architecture; you just have to dig around to find it. As an example of a collaborative architecture, consider the solution to some of the challenges Facebook has faced in scaling out its database, particularly synchronization across data centers. This is a typical enterprise challenge, made even more difficult by Facebook’s decision to separate “write” databases from “read” databases to enhance the scalability of its application architecture. The solution is found in something Facebook engineers call “Page Routing” but most of us in the industry call “Layer 7 Switching” or “Application Switching”:

The problem thus boiled down to, when a user makes a request for a page, how do we decide if it is "safe" to send to Virginia or if it must be routed to California?

This question turned out to have a relatively straightforward answer. One of the first servers a user request to Facebook hits is called a Load balancer; this machine's primary responsibility is picking a web server to handle the request but it also serves a number of other purposes: protecting against denial of service attacks and multiplexing user connections to name a few. This load balancer has the capability to run in Layer 7 mode where it can examine the URI a user is requesting and make routing decisions based on that information. This feature meant it was easy to tell the load balancer about our "safe" pages and it could decide whether to send the request to Virginia or California based on the page name and the user's location. (Scaling Out (Facebook Engineering Blog))

That’s the hallmark of the modern, agile data center and the core of cloud computing models: collaborative, dynamic infrastructure and applications leveraging technology to enable cost-efficient, scalable architectures able to sustain growth along with the business.
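The Layer 7 decision described in the quoted passage can be sketched in a few lines. This is a minimal illustration only: the page names, the region labels, and the assumption that California handles every request not known to be “safe” are hypothetical stand-ins, not details taken from Facebook’s implementation.

```python
from urllib.parse import urlsplit

# Hypothetical list of "safe" pages that the load balancer has been told about.
SAFE_PAGES = {"/home.php", "/profile.php"}


def route(uri: str, user_region: str) -> str:
    """Pick a data center from the requested page and the user's location."""
    path = urlsplit(uri).path  # ignore the query string, keep the page name
    if path in SAFE_PAGES and user_region == "east":
        return "virginia"
    # Anything not known to be safe, or from users elsewhere, goes to California.
    return "california"


print(route("/home.php?ref=1", "east"))
print(route("/editprofile.php", "east"))
print(route("/home.php", "west"))
```

The application never sees this choice being made; the routing rule lives in the infrastructure tier and is driven by application knowledge about which pages are safe, which is the collaboration the article is describing.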

SCALABILITY TODAY REQUIRES A COMPREHENSIVE ARCHITECTURAL STRATEGY

Today’s architectures, both application and infrastructure, are necessarily growing more complex to meet the explosive growth of a variety of media and consumers. Applications alone cannot scale themselves out: there simply aren’t physical machines large enough to support the massive number of users and the load created by consumers’ nearly insatiable demand for online games, shopping, interaction, and news. Modern applications must be deployed and delivered collaboratively with infrastructure if they are to scale and support growth in an operationally and financially efficient manner.

Facebook’s ability to grow and scale along with demand is enabled by its holistic architectural approach, which leverages both modern application scalability patterns and infrastructure scalability patterns. Together, infrastructure and applications are enabling the social networking giant to continue to grow steadily with very few hiccups along the way. Its approach is well suited to any organization wishing to scale efficiently over time with the least amount of disruption and with the speed of deployment required by today’s demanding business environments.

More Stories By Lori MacVittie

Lori MacVittie is responsible for education and evangelism of application services available across F5’s entire product suite. Her role includes authorship of technical materials and participation in a number of community-based forums and industry standards organizations, among other efforts. MacVittie has extensive programming experience as an application architect, as well as network and systems development and administration expertise. Prior to joining F5, MacVittie was an award-winning Senior Technology Editor at Network Computing Magazine, where she conducted product research and evaluation focused on integration with application and network architectures, and authored articles on a variety of topics aimed at IT professionals. Her most recent area of focus included SOA-related products and architectures. She holds a B.S. in Information and Computing Science from the University of Wisconsin at Green Bay, and an M.S. in Computer Science from Nova Southeastern University.
