Google Dumps MapReduce

Batch processing systems like MapReduce and Hadoop are too slow for the new era of "realtime big data"

Over the past five years, MapReduce and Hadoop have been widely used for processing big data from the web, both in-house and in the cloud. However, we are now in an era where news, search, marketing, commerce and many other key aspects of the web are becoming much more social, more mobile, and more realtime. In response, major web companies are realizing that the "big data analytics" driving many of their services must change radically to move into this realtime era. No company sees this more clearly than Google, the company that originally developed MapReduce, the batch processing model that Hadoop implements as open source.

This week, the company unveiled Google Instant, its new realtime search system. Until recently, the indexing system for Google Search was the company's largest MapReduce application. But with the need to move to realtime search, it has now been replaced. As reported here, Google noted that

  • "MapReduce isn't suited to calculations that need to occur in near real-time"

and that

  • "You can't do anything with it that takes a relatively short amount of time, so we got rid of it"

Another article notes

  • "The challenge for Google has been how to support a real-time world when the core of their search technology, the famous MapReduce, is batch oriented. Simple, they got rid of MapReduce... MapReduce still excels as a general query mechanism against masses of data, but real-time search requires a very specialized tool"
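The batch versus realtime distinction in these quotes can be illustrated with a toy word count. The sketch below is only an illustration, not Google's indexing system or any Hadoop API; the names `batch_word_count` and `IncrementalWordCount` are hypothetical. The batch version recomputes its answer from the whole dataset each time it runs, while the incremental version folds each new document into running totals as it arrives.

```python
from collections import Counter
from itertools import chain


def batch_word_count(documents):
    """Batch style: map every document to words, then reduce over the full dataset."""
    mapped = (doc.lower().split() for doc in documents)  # "map" phase
    return Counter(chain.from_iterable(mapped))          # "reduce" phase


class IncrementalWordCount:
    """Incremental style: update running totals one document at a time."""

    def __init__(self):
        self.counts = Counter()

    def add(self, doc):
        # Only the new document is processed; earlier work is reused.
        self.counts.update(doc.lower().split())
        return self.counts


documents = ["big data", "realtime big data"]

# Batch: results are available only after the full pass completes.
batch_result = batch_word_count(documents)

# Incremental: results are current after every arriving document.
live = IncrementalWordCount()
for doc in documents:
    live.add(doc)
```

Both approaches reach the same totals; the difference is when the answer is available. The batch job must be rerun over everything to reflect new data, whereas the incremental counter is up to date after each `add` call.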

We are now at the start of a new era in the big data world. Increasingly, big data apps will need to be realtime. For example, a recent list of "Ten Hadoop-able Problems" contains the following examples of big data problems that can be tackled with MapReduce/Hadoop:

  • Risk Analysis
  • Customer Churn
  • Recommendation Engines
  • Ad Targeting
  • Sales Analysis
  • Network Analysis
  • Fraud Detection
  • Trading Surveillance
  • Search Quality
  • General Data Analytics

In each case, these are big data problems where delivering the results of the analytics in realtime would greatly increase their value.

At Cloudscale we've developed the first Realtime Data Warehouse, a new architecture aimed at delivering big data analytics in realtime, with latency in seconds instead of hours. The above ten areas are examples of the kinds of problems that can now be analyzed in realtime. There are, of course, new areas that a Realtime Data Warehouse can tackle and that offline batch processing analytics systems such as MapReduce and Hadoop cannot address at all. These include:

  • Realtime Location Analytics
  • Realtime Game Analytics
  • Realtime Algorithmic Trading
  • Realtime Government Intelligence
  • Realtime Sensor Systems and Grids

As we move beyond MapReduce and Hadoop, into this new era of "realtime big data", where analytics apps are "always-on" and run continuously, we can expect to see a major wave of software innovation, with many exciting new realtime apps from developers in areas such as marketing intelligence, social commerce, social enterprise, and the mobile web.

More Stories By Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams, in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.
