Welcome!

Cloud Expo Authors: Elizabeth White, Carmen Gonzalez, JP Morgenthal, Liz McMillan, Pat Romanski

Related Topics: Cloud Expo, Open Source, Apache

Cloud Expo: Article

Hadoop and Realtime Cloud Computing

Architectures such as MapReduce and Hadoop are good for batch processing of big data, but bad for realtime processing

Big data is creating a massive disruption for the IT industry. Faced with exponentially growing data volumes in every area of business and the web, companies around the world are looking beyond their current databases and data warehouses for new ways to handle this data deluge.

Taking a lead from Google, a number of organizations have been exploring the potential of MapReduce, and its open source clone Hadoop, for big data processing. The MapReduce/Hadoop approach is based around the idea that what's needed is not database processing with SQL queries, but rather dataflow computing with simple parallel programming primitives such as map and reduce.

As Google and others have shown, this kind of basic dataflow programming model can be implemented as a coarse-grain set of parallel tasks that can be run across hundreds or thousands of machines, to carry out large-scale batch processing on massive data sets.

Google themselves have been using MapReduce for batch processing for over six years, and others, such as Facebook, eBay and Yahoo have been using Hadoop for the same kind of batch processing for several years now. So today, parallel dataflow is firmly established as an alternative to databases and data warehouses for offline batch processing of big data. But now the game is changing again...

In recent months, Google has realized that the web is now entering a new era, the realtime era, and that batch processing systems such as MapReduce and Hadoop cannot deliver performance anywhere near the speed required for new realtime services such as Google Instant. Google noted that

  • "MapReduce isn't suited to calculations that need to occur in near real-time"

and that

  • "You can't do anything with it that takes a relatively short amount of time, so we got rid of it"

Other industry leaders, such as Jeff Jonas, Chief Scientist for Analytics at IBM, have made similar remarks in recent weeks. In his recent video "Big Thoughts on Big Data", Jonas notes that with only batch processing tools to handle it, organizations grappling with a relentless avalanche of realtime data will get dumber over time rather than getting smarter.

  • "The idea of waiting for a batch job to run doesn't cut it. Instead, how can an organization make sense of what it knows, as a transaction is happening, so that it can do something about it right then"
  • "I'm not a big fan of batch processes... I've never seen a batch system grow up an become a realtime streaming system, but you can take a realtime streaming system and make it eat batches all day long"
  • "I like Hadoop but it's meant for batch activities. That's not the kind of back-end you would use for realtime sense-making systems"

So coarse-grain dataflow architectures such as Hadoop are good for batch, but bad for realtime.

To power realtime big data apps we need a completely new type of fine-grain dataflow architecture. An architecture that can, for example, continuously analyze a stream of events at a rate of say one million events per second per server, and deliver results with a maximum latency of five seconds between data in and analytics out. At Cloudscale we set out to crack this major technical problem, and to build the world's first "realtime data warehouse". The linearly scalable Cloudscale parallel dataflow architecture not only delivers game-changing realtime performance on commodity hardware, but also, as Jeff Jonas notes above "can eat batches all day long" like a traditional MapReduce or Hadoop architecture. There isn't really an established name yet for such a system. I guess we could call it a "Redoop" architecture (Realtime Dataflow on Ordinary Processors, or Realtime Hadoop).

More Stories By Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams, in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.

@CloudExpo Stories
SYS-CON Events announced today that SOA Software, an API management leader, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. SOA Software is a leading provider of API Management and SOA Governance products that equip business to deliver APIs and SOA together to drive their company to meet its business strategy quickly and effectively. SOA Software’s technology helps businesses to accel...
Almost everyone sees the potential of Internet of Things but how can businesses truly unlock that potential. The key will be in the ability to discover business insight in the midst of an ocean of Big Data generated from billions of embedded devices via Systems of Discover. Businesses will also need to ensure that they can sustain that insight by leveraging the cloud for global reach, scale and elasticity.
SYS-CON Events announced today that Utimaco will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Utimaco is a leading manufacturer of hardware based security solutions that provide the root of trust to keep cryptographic keys safe, secure critical digital infrastructures and protect high value data assets. Only Utimaco delivers a general-purpose hardware security module (HSM) as a customiz...
How do APIs and IoT relate? The answer is not as simple as merely adding an API on top of a dumb device, but rather about understanding the architectural patterns for implementing an IoT fabric. There are typically two or three trends: Exposing the device to a management framework Exposing that management framework to a business centric logic • Exposing that business layer and data to end users. This last trend is the IoT stack, which involves a new shift in the separation of what stuff hap...
SYS-CON Events announced today that ElasticBox is holding a Hackathon at DevOps Summit, November 6 from 12 pm -4 pm at the Santa Clara Convention Center in Santa Clara, CA. You can enter as an individual or team of up to 10 developers. A New Star Is Born Every Month! All completed ElasticBoxes will then be sent to a judging panel - 12 winners will be featured on the ElasticBox website in 2015. All entrants will receive five full enterprise licenses for one year + ElasticBox headphones + Elasti...
Once the decision has been made to move part or all of a workload to the cloud, a methodology for selecting that workload needs to be established. How do you move to the cloud? What does the discovery, assessment and planning look like? What workloads make sense? Which cloud model makes sense for each workload? What are the considerations for how to select the right cloud model? And how does that fit in with the overall IT tranformation? In his session at 15th Cloud Expo, John Hatem, head of V...
Cloud services are the newest tool in the arsenal of IT products in the market today. These cloud services integrate process and tools. In order to use these products effectively, organizations must have a good understanding of themselves and their business requirements. In his session at 15th Cloud Expo, Brian Lewis, Principal Architect at Verizon Cloud, will outline key areas of organizational focus, and how to formalize an actionable plan when migrating applications and internal services to...
SAP is delivering break-through innovation combined with fantastic user experience powered by the market-leading in-memory technology, SAP HANA. In his General Session at 15th Cloud Expo, Thorsten Leiduck, VP ISVs & Digital Commerce, SAP, will discuss how SAP and partners provide cloud and hybrid cloud solutions as well as real-time Big Data offerings that help companies of all sizes and industries run better. SAP launched an application challenge to award the most innovative SAP HANA and SAP ...
Ixia develops amazing products so its customers can connect the world. Ixia helps its customers provide an always-on user experience through fast, secure delivery of dynamic connected technologies and services. Through actionable insights that accelerate and secure application and service delivery, Ixia's customers benefit from faster time to market, optimized application performance and higher-quality deployments.
SYS-CON Events announced today that Calm.io has been named “Bronze Sponsor” of DevOps Summit Silicon Valley, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Calm.io is a cloud orchestration platform for AWS, vCenter, OpenStack, or bare metal, that runs your CL tools puppet, Chef, shell, git, Jenkins, nagios, and will soon support New Relic and Docker. It can run hosted, or on premise and provides VM automation / expiry, self-service portals,...
In her General Session at 15th Cloud Expo, Anne Plese, Senior Consultant, Cloud Product Marketing, at Verizon Enterprise, will focus on finding the right mix of renting vs. buying Oracle capacity to scale to meet business demands, and offer validated Oracle database TCO models for Oracle development and testing environments. Anne Plese is a marketing and technology enthusiast/realist with over 19+ years in high tech. At Verizon Enterprise, she focuses on driving growth for the Verizon Cloud pla...
SYS-CON Events announced today that Aria Systems, the recurring revenue expert, has been named "Bronze Sponsor" of SYS-CON's 15th International Cloud Expo®, which will take place on November 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Aria Systems helps leading businesses connect their customers with the products and services they love. Industry leaders like Pitney Bowes, Experian, AAA NCNU, VMware, HootSuite and many others choose Aria to power their recurring revenue bu...
The Internet of Things (IoT) is going to require a new way of thinking and of developing software for speed, security and innovation. This requires IT leaders to balance business as usual while anticipating for the next market and technology trends. Cloud provides the right IT asset portfolio to help today’s IT leaders manage the old and prepare for the new. Today the cloud conversation is evolving from private and public to hybrid. This session will provide use cases and insights to reinforce t...
As Platform as a Service (PaaS) matures as a category, developers should have the ability to use the programming language of their choice to build applications and have access to a wide array of services. Bluemix is IBM's open cloud development platform that enables users to easily build cloud-based, creative mobile and web applications without having to spend large amounts of time and resources on configuring infrastructure and multiple software licenses. In this track, you will learn about the...
Blue Box has closed a $10 million Series B financing. The round was led by a strategic investor and included participation from prior investors including Voyager Capital and Founders Collective, as well as the Blue Box executive team. This round follows a $4.3 million Series A closed in December of 2012 and led by Voyager Capital. In May of this year, the company announced general availability of its private cloud as a service offering, Blue Box Cloud. Since that release, the company has dem...
SYS-CON Events announced today that Verizon has been named "Gold Sponsor" of SYS-CON's 15th International Cloud Expo®, which will take place on November 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Verizon Enterprise Solutions creates global connections that generate growth, drive business innovation and move society forward. With industry-specific solutions and a full range of global wholesale offerings provided over the company's secure mobility, cloud, strategic network...
SimpleECM is the only platform to offer a powerful combination of enterprise content management (ECM) services, capture solutions, and third-party business services providing simplified integrations and workflow development for solution providers. SimpleECM is opening the market to businesses of all sizes by reinventing the delivery of ECM services. Our APIs make the development of ECM services simple with the use of familiar technologies for a frictionless integration directly into web applicat...
The only place to be June 9-11 is Cloud Expo & @ThingsExpo 2015 East at the Javits Center in New York City. Join us there as delegates from all over the world come to listen to and engage with speakers & sponsors from the leading Cloud Computing, IoT & Big Data companies. Cloud Expo & @ThingsExpo are the leading events covering the booming market of Cloud Computing, IoT & Big Data for the enterprise. Speakers from all over the world will be hand-picked for their ability to explore the economic...
Cloudwick, the leading big data DevOps service and solution provider to the Fortune 1000, announced Big Loop, its multi-vendor operations platform. Cloudwick Big Loop creates greater collaboration between Fortune 1000 IT staff, developers and their database management systems as well as big data vendors. This allows customers to comprehensively manage and oversee their entire infrastructure, which leads to more successful production cluster operations, and scale-out. Cloudwick Big Loop supports ...
To manage complex web services with lots of calls to the cloud, many businesses have invested in Application Performance Management (APM) and Network Performance Management (NPM) tools. Together APM and NPM tools are essential aids in improving a business’s infrastructure required to support an effective web experience… but they are missing a critical component – Internet visibility. Internet connectivity has always played a role in customer access to web presence, but in the past few years use...