Google Dumps MapReduce

Batch processing systems like MapReduce and Hadoop are too slow for the new era of "realtime big data"

Over the past five years, MapReduce and Hadoop have been widely used for processing big data from the web, both in-house and in the cloud. However, we are now in an era where news, search, marketing, commerce and many other key aspects of the web are becoming much more social, more mobile, and more realtime. In response, major web companies are realizing that the "big data analytics" driving many of their services must change radically to keep up with this realtime era. No company sees this more clearly than Google, which originally developed MapReduce, the programming model that Hadoop implements as open source.

This week, Google unveiled Google Instant, its new realtime search system. Until recently, the indexing system for Google Search was the company's largest MapReduce application, but the move to realtime search has led Google to replace it. As reported here, Google noted that

  • "MapReduce isn't suited to calculations that need to occur in near real-time"

and that

  • "You can't do anything with it that takes a relatively short amount of time, so we got rid of it"

Another article notes

  • "The challenge for Google has been how to support a real-time world when the core of their search technology, the famous MapReduce, is batch oriented. Simple, they got rid of MapReduce... MapReduce still excels as a general query mechanism against masses of data, but real-time search requires a very specialized tool"
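To make the batch versus realtime distinction concrete, here is a minimal Python sketch. It is not Google's indexing system or any real product; the names `batch_count` and `StreamingCounter` and the toy event list are assumptions made up for illustration. The batch function must rescan all stored events before it can answer, while the streaming counter folds each event into running totals as it arrives, so an up-to-date answer is available after every event.

```python
from collections import Counter


def batch_count(events):
    """Batch style: rescan every stored event to rebuild the totals."""
    totals = Counter()
    for key in events:
        totals[key] += 1
    return totals


class StreamingCounter:
    """Incremental style: update running totals one event at a time."""

    def __init__(self):
        self.totals = Counter()

    def update(self, key):
        # Fold a single new event into the totals and return the fresh count.
        self.totals[key] += 1
        return self.totals[key]


# Hypothetical toy data standing in for a stream of web events.
events = ["search", "click", "search"]

stream = StreamingCounter()
for key in events:
    stream.update(key)

# Both approaches agree on the totals; they differ in when the answer is ready.
assert stream.totals == batch_count(events)
```

The cost of the batch version grows with the full history each time it runs, whereas each streaming update does a constant amount of work, which is the basic reason batch jobs struggle to deliver results within seconds.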

We are now at the start of a new era in the big data world. Increasingly, big data apps will need to be realtime. For example, a recent list of "Ten Hadoop-able Problems" contains the following examples of big data problems that can be tackled with MapReduce/Hadoop:

  • Risk Analysis
  • Customer Churn
  • Recommendation Engines
  • Ad Targeting
  • Sales Analysis
  • Network Analysis
  • Fraud Detection
  • Trading Surveillance
  • Search Quality
  • General Data Analytics

In each case, these are clearly big data problems where delivering the results of the analytics in realtime would enormously increase their value.

At Cloudscale we've developed the first Realtime Data Warehouse, a new architecture aimed at delivering big data analytics in realtime, with latency in seconds instead of hours. The above ten areas are examples of the kinds of problems that can now be analyzed in realtime. There are, of course, also new areas that a Realtime Data Warehouse can tackle but that offline batch processing analytics systems such as MapReduce and Hadoop cannot address at all. These include:

  • Realtime Location Analytics
  • Realtime Game Analytics
  • Realtime Algorithmic Trading
  • Realtime Government Intelligence
  • Realtime Sensor Systems and Grids

As we move beyond MapReduce and Hadoop, into this new era of "realtime big data", where analytics apps are "always-on" and run continuously, we can expect to see a major wave of software innovation, with many exciting new realtime apps from developers in areas such as marketing intelligence, social commerce, social enterprise, and the mobile web.

More Stories By Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.
