The Economics of Big Data: Why Faster Software is Cheaper

Faster means better and cheaper - lower latency and lower cost!

In big data computing, and more generally in all commercial highly parallel software systems, speed matters more than just about anything else. The reason is straightforward, and has been known for decades.

Put very simply, when it comes to massively parallel software of the kind needed to handle big data, fast is both better AND cheaper. Faster means lower latency AND lower cost.

At first this may seem counterintuitive. A high-end sports car will be much faster than a standard family sedan, but the family sedan may be much cheaper. Cheaper to buy, and cheaper to run. But massively parallel software running on commodity hardware is a quite different type of product from a car. In general, the faster it goes, the cheaper it is to run.

Time Is Money
As has been noted many times in the history of computing, if you are a factor of 50x slower, then you will need 50x more nodes to run at the same speed (even assuming perfect parallelization), or your computation will need 50x more time. In either case, it will also be much more likely that you will experience at least one of your nodes crashing during a computation. This is not to argue that automatic fault tolerance and recovery should be ignored in the pursuit of speed, but rather that these two factors need to be carefully balanced. Good design in massively parallel systems is about achieving maximum speed along with the ability to recover from a given expected level of hardware failure, via checkpointing.
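The point about node crashes can be made concrete with a small sketch. It is an illustration only: it assumes node failures are independent and that each node has the same small chance of failing in any given hour. The failure rate and node counts below are hypothetical values, not measurements from any real cluster.

```python
def prob_any_failure(nodes: int, hours: float, p_fail_per_node_hour: float) -> float:
    """Chance that at least one node fails during the run.

    Assumes independent failures with a constant per-node-hour
    probability (a simplifying assumption, not a measured model).
    """
    node_hours = nodes * hours
    return 1.0 - (1.0 - p_fail_per_node_hour) ** node_hours


# Hypothetical failure rate: 1 in 10,000 per node per hour.
P_FAIL = 1e-4

# Fast software: 10 nodes for 1 hour.
fast = prob_any_failure(nodes=10, hours=1.0, p_fail_per_node_hour=P_FAIL)

# Software that is 50x slower: either 50x the nodes for the same time...
slow_more_nodes = prob_any_failure(nodes=500, hours=1.0, p_fail_per_node_hour=P_FAIL)

# ...or the same nodes for 50x the time.
slow_more_time = prob_any_failure(nodes=10, hours=50.0, p_fail_per_node_hour=P_FAIL)

print(f"fast:              {fast:.4%}")
print(f"slow (more nodes): {slow_more_nodes:.4%}")
print(f"slow (more time):  {slow_more_time:.4%}")
```

Under these assumptions both slow options consume the same 500 node-hours, so they carry the same failure exposure, and that exposure is many times higher than for the fast run.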

The key phrase here is "a given expected level of hardware failure". In certain types of peer-to-peer services which take advantage of idle PC capacity, it is necessary to assume that all machines are extremely unreliable and may go offline at any time. However, in a commercial big data cluster it may be reasonably assumed that almost all machines will be available almost all of the time. This means that a much more optimistic point in the design space can be chosen, one which is designed much more for speed than for pathological failure scenarios.

The MapReduce model is an example of a model where speed has been sacrificed in a major way in order to achieve scalability on very unreliable hardware. As we have noted, while this is acceptable in certain types of free peer-to-peer services, it is much less acceptable in commercial big data systems deployed at scale.

Google, the inventors of the model, were the first to recognize the throughput and latency problems with the MapReduce model. To get the realtime performance they required, they recently replaced MapReduce in their Google Instant search engine.

The MapReduce model of Apache Hadoop is slow. In fact, it's very slow compared to, for example, the kinds of MPI or BSP clusters that have been routinely used in supercomputing for more than 15 years. On exactly the same hardware, MapReduce can be several orders of magnitude slower than MPI or BSP. By using MPI rather than MapReduce, HadoopBI gives customers the best possible big data solution, not only in terms of performance - massive throughput and extremely low latency - but also in terms of economics. HadoopBI is not just the fastest Big Data BI solution, it is also the cheapest at scale.

It's Free, But Is It Fast Enough?
Another frequently misunderstood element of big data economics concerns so-called "free" software. It has been argued by some that, since big data software needs to be run on many nodes, it is really important to have software that is free. Again this is an extreme oversimplification that ignores the dominant cost issues in big data economics. At large scale, software costs will in general be much smaller than hardware or cloud costs. And commercial software vendors should ensure that they are, if they want to stay in business.

Consider the following small-scale example. A company needs to process big data continuously in order to maximize competitive advantage. For simplicity, we will assume that the cost of running a single server (in-house or cloud) for one hour is $1, and that the company has a choice between two big data software systems - system A costs $1,000 per server and system B is free, but system A is 8x faster. Choosing system A, the company requires 5 servers, working continuously, to achieve the throughput required. However, if the company chooses system B, it will require 40 servers running continuously.

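Simple arithmetic shows that within just six days, the initial cost of system A has been recovered: the $5,000 outlay (5 servers at $1,000 each) is offset by a saving of $35 per hour (40 servers versus 5), which takes about 143 hours. From then on system A gives the company massive cost savings. Even if system A is only 2x or 3x faster and more efficient than system B, the initial cost will still be recovered in a matter of a few weeks (roughly six weeks at 2x, three weeks at 3x).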
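The arithmetic behind this example can be checked with a short sketch. It assumes that the $1,000 per server is a one-time fee and that the number of servers system B needs grows in direct proportion to the speed difference. The function and parameter names are illustrative only.

```python
def break_even_hours(
    license_per_server: float,
    servers_a: int,
    speedup: float,
    cost_per_server_hour: float = 1.0,
) -> float:
    """Hours until system A's upfront license cost is repaid by
    the servers it saves relative to the free system B.

    Assumes a one-time license fee and that system B needs
    `speedup` times as many servers for the same throughput.
    """
    servers_b = servers_a * speedup
    upfront_cost = license_per_server * servers_a
    hourly_saving = (servers_b - servers_a) * cost_per_server_hour
    return upfront_cost / hourly_saving


# The scenario from the text: 5 servers, $1,000 each, $1 per server-hour.
for speedup in (8, 3, 2):
    hours = break_even_hours(license_per_server=1000, servers_a=5, speedup=speedup)
    print(f"{speedup}x faster: break-even after {hours:.0f} hours ({hours / 24:.1f} days)")
```

With the figures from the text, the 8x case breaks even in just under six days, while the 3x and 2x cases take roughly three and six weeks respectively.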

The economic advantages of speed at scale are magnified even more in large-scale big data systems where, with volume licensing discounts, the payback time for super-fast software is even shorter.

The lesson of the above example is simple and very important. In parallel systems, speed at scale is king, as speed equates to efficiency, and efficiency equates to massive cost savings at scale. So, to be relevant for large scale production deployments, free parallel software has to be at least as fast and efficient as the best commercial software, otherwise the economics will be solidly against it. Some examples of free software, such as the Linux operating system, have achieved this goal. It remains to be seen whether this will also be the case with highly parallel big data software. In the meantime, it's important to remember that "free software is cheap, but fast software can be even cheaper".

More Stories By Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams, in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.
