Welcome!

@CloudExpo Authors: Yeshim Deniz, William Schmarzo, Carmen Gonzalez, Elizabeth White, Liz McMillan

Related Topics: @CloudExpo, Open Source Cloud, Apache

@CloudExpo: Article

Hadoop and Realtime Cloud Computing

Architectures such as MapReduce and Hadoop are good for batch processing of big data, but bad for realtime processing

Big data is creating a massive disruption for the IT industry. Faced with exponentially growing data volumes in every area of business and the web, companies around the world are looking beyond their current databases and data warehouses for new ways to handle this data deluge.

Taking a lead from Google, a number of organizations have been exploring the potential of MapReduce, and its open source clone Hadoop, for big data processing. The MapReduce/Hadoop approach is based around the idea that what's needed is not database processing with SQL queries, but rather dataflow computing with simple parallel programming primitives such as map and reduce.

As Google and others have shown, this kind of basic dataflow programming model can be implemented as a coarse-grain set of parallel tasks that can be run across hundreds or thousands of machines, to carry out large-scale batch processing on massive data sets.

Google themselves have been using MapReduce for batch processing for over six years, and others, such as Facebook, eBay and Yahoo have been using Hadoop for the same kind of batch processing for several years now. So today, parallel dataflow is firmly established as an alternative to databases and data warehouses for offline batch processing of big data. But now the game is changing again...

In recent months, Google has realized that the web is now entering a new era, the realtime era, and that batch processing systems such as MapReduce and Hadoop cannot deliver performance anywhere near the speed required for new realtime services such as Google Instant. Google noted that

  • "MapReduce isn't suited to calculations that need to occur in near real-time"

and that

  • "You can't do anything with it that takes a relatively short amount of time, so we got rid of it"

Other industry leaders, such as Jeff Jonas, Chief Scientist for Analytics at IBM, have made similar remarks in recent weeks. In his recent video "Big Thoughts on Big Data", Jonas notes that with only batch processing tools to handle it, organizations grappling with a relentless avalanche of realtime data will get dumber over time rather than getting smarter.

  • "The idea of waiting for a batch job to run doesn't cut it. Instead, how can an organization make sense of what it knows, as a transaction is happening, so that it can do something about it right then"
  • "I'm not a big fan of batch processes... I've never seen a batch system grow up an become a realtime streaming system, but you can take a realtime streaming system and make it eat batches all day long"
  • "I like Hadoop but it's meant for batch activities. That's not the kind of back-end you would use for realtime sense-making systems"

So coarse-grain dataflow architectures such as Hadoop are good for batch, but bad for realtime.

To power realtime big data apps we need a completely new type of fine-grain dataflow architecture. An architecture that can, for example, continuously analyze a stream of events at a rate of say one million events per second per server, and deliver results with a maximum latency of five seconds between data in and analytics out. At Cloudscale we set out to crack this major technical problem, and to build the world's first "realtime data warehouse". The linearly scalable Cloudscale parallel dataflow architecture not only delivers game-changing realtime performance on commodity hardware, but also, as Jeff Jonas notes above "can eat batches all day long" like a traditional MapReduce or Hadoop architecture. There isn't really an established name yet for such a system. I guess we could call it a "Redoop" architecture (Realtime Dataflow on Ordinary Processors, or Realtime Hadoop).

More Stories By Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams, in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.

@CloudExpo Stories
SYS-CON Events announced today that SoftLayer, an IBM Company, has been named “Gold Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. SoftLayer, an IBM Company, provides cloud infrastructure as a service from a growing number of data centers and network points of presence around the world. SoftLayer’s customers range from Web startups to global enterprises.
SYS-CON Events announced today that delaPlex will exhibit at SYS-CON's @ThingsExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. delaPlex pioneered Software Development as a Service (SDaaS), which provides scalable resources to build, test, and deploy software. It’s a fast and more reliable way to develop a new product or expand your in-house team.
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend @CloudExpo | @ThingsExpo, June 6-8, 2017, at the Javits Center in New York City, NY and October 31 - November 2, 2017, Santa Clara Convention Center, CA. Learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
SYS-CON Events announced today that Tintri, Inc, a leading provider of enterprise cloud infrastructure, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Tintri offers an enterprise cloud platform built with public cloud-like web services and RESTful APIs. Organizations use Tintri all-flash storage with scale-out and automation as a foundation for their own clouds – to build agile development environments...
SYS-CON Events announced today that DivvyCloud will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. DivvyCloud software enables organizations to achieve their cloud computing goals by simplifying and automating security, compliance and cost optimization of public and private cloud infrastructure. Using DivvyCloud, customers can leverage programmatic Bots to identify and remediate common cloud problems in rea...
SYS-CON Events announced today that Carbonite will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Carbonite protects your entire IT footprint with the right level of protection for each workload, ensuring lower costs and dependable solutions with DoubleTake and Evault.
SYS-CON Events announced today that Technologic Systems Inc., an embedded systems solutions company, will exhibit at SYS-CON's @ThingsExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Technologic Systems is an embedded systems company with headquarters in Fountain Hills, Arizona. They have been in business for 32 years, helping more than 8,000 OEM customers and building over a hundred COTS products that have never been discontinued. Technologic Systems’ pr...
SYS-CON Events announced today that Tappest will exhibit MooseFS at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. MooseFS is a breakthrough concept in the storage industry. It allows you to secure stored data with either duplication or erasure coding using any server. The newest – 4.0 version of the software enables users to maintain the redundancy level with even 50% less hard drive space required. The software func...
SYS-CON Events announced today that EARP will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. "We are a software house, so we perfectly understand challenges that other software houses face in their projects. We can augment a team, that will work with the same standards and processes as our partners' internal teams. Our teams will deliver the same quality within the required time and budget just as our partn...
SYS-CON Events announced today that Cloudistics, an on-premises cloud computing company, has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Cloudistics delivers a complete public cloud experience with composable on-premises infrastructures to medium and large enterprises. Its software-defined technology natively converges network, storage, compute, virtualization, and management into a ...
Cloud promises the agility required by today’s digital businesses. As organizations adopt cloud based infrastructures and services, their IT resources become increasingly dynamic and hybrid in nature. Managing these require modern IT operations and tools. In his session at 20th Cloud Expo, Raj Sundaram, Senior Principal Product Manager at CA Technologies, will discuss how to modernize your IT operations in order to proactively manage your hybrid cloud and IT environments. He will be sharing bes...
New competitors, disruptive technologies, and growing expectations are pushing every business to both adopt and deliver new digital services. This ‘Digital Transformation’ demands rapid delivery and continuous iteration of new competitive services via multiple channels, which in turn demands new service delivery techniques – including DevOps. In this power panel at @DevOpsSummit 20th Cloud Expo, moderated by DevOps Conference Co-Chair Andi Mann, panelists will examine how DevOps helps to meet th...
DevOps at Cloud Expo – being held October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real r...
As DevOps methodologies expand their reach across the enterprise, organizations face the daunting challenge of adapting related cloud strategies to ensure optimal alignment, from managing complexity to ensuring proper governance. How can culture, automation, legacy apps and even budget be reexamined to enable this ongoing shift within the modern software factory?
Most technology leaders, contemporary and from the hardware era, are reshaping their businesses to do software in the hope of capturing value in IoT. Although IoT is relatively new in the market, it has already gone through many promotional terms such as IoE, IoX, SDX, Edge/Fog, Mist Compute, etc. Ultimately, irrespective of the name, it is about deriving value from independent software assets participating in an ecosystem as one comprehensive solution.
SYS-CON Events announced today that A&I Solutions has been named “Bronze Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 1999, A&I Solutions is a leading information technology (IT) software and services provider focusing on best-in-class enterprise solutions. By partnering with industry leaders in technology, A&I assures customers high performance levels across all IT environments including: mai...
SYS-CON Events announced today that Systena America will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Systena Group has been in business for various software development and verification in Japan, US, ASEAN, and China by utilizing the knowledge we gained from all types of device development for various industries including smartphones (Android/iOS), wireless communication, security technology and IoT serv...
Every successful software product evolves from an idea to an enterprise system. Notably, the same way is passed by the product owner's company. In his session at 20th Cloud Expo, Oleg Lola, CEO of MobiDev, will provide a generalized overview of the evolution of a software product, the product owner, the needs that arise at various stages of this process, and the value brought by a software development partner to the product owner as a response to these needs.
While some vendors scramble to create and sell you a fancy solution for monitoring your spanking new Amazon Lambdas, hear how you can do it on the cheap using just built-in Java APIs yourself. By exploiting a little-known fact that Lambdas aren’t exactly single threaded, you can effectively identify hot spots in your serverless code. In his session at 20th Cloud Expo, David Martin, Principal Product Owner at CA Technologies, will give a live demonstration and code walkthrough, showing how to ov...