Welcome!

Cloud Expo Authors: Michelle Drolet, Kevin Benedict, Pat Romanski, Elizabeth White, Paige Leidig

Related Topics: Cloud Expo

Cloud Expo: Blog Post

The Economics of Big Data: Why Faster Software is Cheaper

Faster means better and cheaper - lower latency and lower cost!

In big data computing, and more generally in all commercial highly parallel software systems, speed matters more than just about anything else. The reason is straightforward, and has been known for decades.

Put very simply, when it comes to massively parallel software of the kind need to handle big data, fast is both better AND cheaper. Faster means lower latency AND lower cost.

At first this may seem counterintuitive. A high-end sports car will be much faster than a standard family sedan, but the family sedan may be much cheaper. Cheaper to buy, and cheaper to run. But massively parallel software running on commodity hardware is a quite different type of product from a car. In general, the faster it goes, the cheaper it is to run.

Time Is Money
As has been noted many times in the history of computing, if you are a factor of 50x slower, then you will need 50x more nodes to run at the same speed (even assuming perfect parallelization), or your computation will need 50x more time. In either case, it will also be much more likely that you will experience at least one of your nodes crashing during a computation. This is not to argue that automatic fault tolerance and recovery should be ignored in the pursuit of speed, but rather that these two factors need to be carefully balanced. Good design in massively parallel systems is about achieving maximum speed along with the ability to recover from a given expected level of hardware failure, via checkpointing.

The key phrase here is "a given expected level of hardware failure". In certain types of peer-to-peer services which take advantage of idle PC capacity, it is necessary to assume that all machines are extremely unreliable and may go offline at any time. However, in a commercial big data cluster it may be reasonably asssumed that almost all machines will be available almost all of the time. This means that a much more optimistic point in the design space can be chosen, one which is designed much more for speed than for pathological failure scenarios.

The MapReduce model is an example of a model where speed has been sacrificed in a major way in order to achieve scalability on very unreliable hardware. As we have noted, while this is acceptable in certain types of free peer-to-peer services, it is much less acceptable in commercial big data systems deployed at scale.

Google, the inventors of the model, were the first to recognize the throughput and latency problems with the MapReduce model. To get the realtime performance they required, they recently replaced MapReduce in their Google Instant search engine.

The MapReduce model of Apache Hadoop is slow. In fact, it's very slow compared to, for example, the kinds of MPI or BSP clusters that have been routinely used in supercomputing for more than 15 years. On exactly the same hardware, MapReduce can be several orders of magnitude slower than MPI or BSP. By using MPI rather than MapReduce, HadoopBI gives customers the best possible big data solution, not only in terms of performance - massive throughput and extremely low latency - but also in terms of economics. HadoopBI is not just the fastest Big Data BI solution, it is also the cheapest at scale.

It's Free, But Is It Fast Enough?
Another frequently misunderstood element of big data economics concerns so-called "free" software. It has been argued by some that, since big data software needs to be run on many nodes, it is really important to have software that is free. Again this is an extreme oversimplification that ignores the dominant cost issues in big data economics. At large scale, software costs will in general be much smaller than hardware or cloud costs. And commercial software vendors should ensure that they are, if they want to stay in business.

Consider the following small-scale example. A company needs to process big data continuously in order to maximize competitive advantage. For simplicity, we will assume that the cost of running a single server (in-house or cloud) for one hour is $1, and that the company has a choice between two big data software systems - system A costs $1,000 per server and system B is free, but system A is 8x faster. Choosing system A, the company requires 5 servers, working continuously, to achieve the throughput required. However, if the company chooses system B, it will require 40 servers running continuously.

Simple arithmetic shows that within just six days, the initial cost of system A has been recovered, and from then on system A gives the company massive cost savings. Even if system A is only 2x or 3x faster and more efficient than system B, the initial cost will still be recovered in a matter of a few weeks.

The economic advantages of speed at scale are magnified even more in large-scale big data systems where, with volume licensing discounts, the payback time for super-fast software is even shorter.

The lesson of the above example is simple and very important. In parallel systems, speed at scale is king, as speed equates to efficiency, and efficiency equates to massive cost savings at scale. So, to be relevant for large scale production deployments, free parallel software has to be at least as fast and efficient as the best commercial software, otherwise the economics will be solidly against it. Some examples of free software, such as the Linux operating system, have achieved this goal. It remains to be seen whether this will also be the case with highly parallel big data software. In the meantime, it's important to remember that "free software is cheap, but fast software can be even cheaper".

More Stories By Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams, in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Cloud Expo Breaking News
Cloud backup and recovery services are critical to safeguarding an organization’s data and ensuring business continuity when technical failures and outages occur. With so many choices, how do you find the right provider for your specific needs? In his session at 14th Cloud Expo, Daniel Jacobson, Technology Manager at BUMI, will outline the key factors including backup configurations, proactive monitoring, data restoration, disaster recovery drills, security, compliance and data center resources. Aside from the technical considerations, the secret sauce in identifying the best vendor is the level of focus, expertise and specialization of their engineering team and support group, and how they monitor your day-to-day backups, provide recommendations, and guide you through restores when necessary.
SYS-CON Events announced today that SherWeb, a long-time leading provider of cloud services and Microsoft's 2013 World Hosting Partner of the Year, will exhibit at SYS-CON's 14th International Cloud Expo®, which will take place on June 10–12, 2014, at the Javits Center in New York City, New York. A worldwide hosted services leader ranking in the prestigious North American Deloitte Technology Fast 500TM, and Microsoft's 2013 World Hosting Partner of the Year, SherWeb provides competitive cloud solutions to businesses and partners around the world. Founded in 1998, SherWeb is a privately owned company headquartered in Quebec, Canada. Its service portfolio includes Microsoft Exchange, SharePoint, Lync, Dynamics CRM and more.
The world of cloud and application development is not just for the hardened developer these days. In their session at 14th Cloud Expo, Phil Jackson, Development Community Advocate for SoftLayer, and Harold Hannon, Sr. Software Architect at SoftLayer, will pull back the curtain of the architecture of a fun demo application purpose-built for the cloud. They will focus on demonstrating how they leveraged compute, storage, messaging, and other cloud elements hosted at SoftLayer to lower the effort and difficulty of putting together a useful application. This will be an active demonstration and review of simple command-line tools and resources, so don’t be afraid if you are not a seasoned developer.
You use an agile process; your goal is to make your organization more agile. What about your data infrastructure? The truth is, today’s databases are anything but agile – they are effectively static repositories that are cumbersome to work with, difficult to change, and cannot keep pace with application demands. Performance suffers as a result, and it takes far longer than it should to deliver on new features and capabilities needed to make your organization competitive. As your application and business needs change, data repositories and structures get outmoded rapidly, resulting in increased work for application developers and slow performance for end users. Further, as data sizes grow into the Big Data realm, this problem is exacerbated and becomes even more difficult to address. A seemingly simple schema change can take hours (or more) to perform, and as requirements evolve the disconnect between existing data structures and actual needs diverge.
Cloud scalability and performance should be at the heart of every successful Internet venture. The infrastructure needs to be resilient, flexible, and fast – it’s best not to get caught thinking about architecture until the middle of an emergency, when it's too late. In his interactive, no-holds-barred session at 14th Cloud Expo, Phil Jackson, Development Community Advocate for SoftLayer, will dive into how to design and build-out the right cloud infrastructure.
SYS-CON Events announced today that BUMI, a premium managed service provider specializing in data backup and recovery, will exhibit at SYS-CON's 14th International Cloud Expo®, which will take place on June 10–12, 2014, at the Javits Center in New York City, New York. Manhattan-based BUMI (Backup My Info!) is a premium managed service provider specializing in data backup and recovery. Founded in 2002, the company’s Here, There and Everywhere data backup and recovery solutions are utilized by more than 500 businesses. BUMI clients include professional service organizations such as banking, financial, insurance, accounting, hedge funds and law firms. The company is known for its relentless passion for customer service and support, and has won numerous awards, including Customer Service Provider of the Year and 10 Best Companies to Work For.
Chief Security Officers (CSO), CIOs and IT Directors are all concerned with providing a secure environment from which their business can innovate and customers can safely consume without the fear of Distributed Denial of Service attacks. To be successful in today's hyper-connected world, the enterprise needs to leverage the capabilities of the web and be ready to innovate without fear of DDoS attacks, concerns about application security and other threats. Organizations face great risk from increasingly frequent and sophisticated attempts to render web properties unavailable, and steal intellectual property or personally identifiable information. Layered security best practices extend security beyond the data center, delivering DDoS protection and maintaining site performance in the face of fast-changing threats.
From data center to cloud to the network. In his session at 3rd SDDC Expo, Raul Martynek, CEO of Net Access, will identify the challenges facing both data center providers and enterprise IT as they relate to cross-platform automation. He will then provide insight into designing, building, securing and managing the technology as an integrated service offering. Topics covered include: High-density data center design Network (and SDN) integration and automation Cloud (and hosting) infrastructure considerations Monitoring and security Management approaches Self-service and automation
In his session at 14th Cloud Expo, David Holmes, Vice President at OutSystems, will demonstrate the immense power that lives at the intersection of mobile apps and cloud application platforms. Attendees will participate in a live demonstration – an enterprise mobile app will be built and changed before their eyes – on their own devices. David Holmes brings over 20 years of high-tech marketing leadership to OutSystems. Prior to joining OutSystems, he was VP of Global Marketing for Damballa, a leading provider of network security solutions. Previously, he was SVP of Global Marketing for Jacada where his branding and positioning expertise helped drive the company from start-up days to a $55 million initial public offering on Nasdaq.
Performance is the intersection of power, agility, control, and choice. If you value performance, and more specifically consistent performance, you need to look beyond simple virtualized compute. Many factors need to be considered to create a truly performant environment. In his General Session at 14th Cloud Expo, Marc Jones, Vice President of Product Innovation for SoftLayer, will explain how to take advantage of a multitude of compute options and platform features to make cloud the cornerstone of your online presence.
Are you interested in accelerating innovation, simplifying deployments, reducing complexity, and lowering development costs? The cloud is changing the face of application development and deployment, with enterprise-grade infrastructure and platform services making it possible for you to build and rapidly scale enterprise applications. In his session at 14th Cloud Expo, Gene Eun, Sr. Director, Oracle Cloud at Oracle, will discuss the latest solutions and strategies for application developers and enterprise IT organizations to leverage Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) to build and deploy modern business applications in the cloud.
Hybrid cloud refers to the federation of a public and private cloud environment for the purpose of extending the elastic and flexibility of compute, storage and network capabilities, in an on-demand, pay-as-you go basis. The hybrid approach allows a business to take advantage of the scalability and cost-effectiveness that a public cloud computing environment offers without exposing mission-critical applications and data to third-party vulnerabilities. Hybrid cloud environments involve complex management challenges. First, organizations struggle to maintain control over the resources that lie outside of their managed IT scope. They also need greater infrastructure visibility to help reduce maintenance costs and ensure that their company data and resources are properly handled and secured.
As more applications and services move "to the cloud" (public or on-premise), cloud environments are increasingly adopting and building out traditional enterprise features. This in turn is enabling and encouraging cloud adoption from enterprise users. In many ways the definition is blurring as features like continuous operation, geo-distribution or on-demand capacity become the norm. At NuoDB we're involved in both building enterprise software and using enterprise cloud capabilities. In his session at 14th Cloud Expo, Seth Proctor, CTO of NuoDB, Inc., will cover experiences from building, deploying and using enterprise services and suggest some ways to approach moving enterprise applications into a cloud model.
Understanding the future of Big Data is crucial in the early stages of decision making around Big Data architectures. In the enterprise, what stands out is the need to integrate Hadoop smoothly into your existing data warehouse architecture, while taking advantage of existing skills and investments. In his General Session at 14th Cloud Expo, Marty Gubar, Director of Product Management at Oracle, will present a strategy for enabling integrated data management using both Hadoop and relational technologies. In particular, he'll look at how SQL, long the standard for the data warehouse, is increasingly being used on Hadoop. The real prize, though, is Smart SQL processing, seamlessly integrating the data warehouse and Hadoop into a single, Big Data Management System.
The time has come for humanity’s first interstellar trek to Terranuvem, the cloud planet, and Chief Engineer Cyrus Agarwal has been chosen to ready a ship for the voyage. He must make the right architectural choices to transform the ship for the long journey and be prepared for the unknown. He will be tested and overcome challenges during the mission. Join Cyrus and the crew of the Stratus at Oracle VP Rex Wang’s Day 2 Keynote at 14th Cloud Expo for a unique, sci-fi movie experience while learning key success factors for your own journey to cloud.