
@CloudExpo: Blog Feed Post

Top Five Hosted Hadoop-Based Applications Reviewed

Amazon Elastic MapReduce; Cloudera CDH; InfoSphere BigInsights; MapR M3 and M5; and Hortonworks Data Platform

Our goal at Monitis is to make life easier for web developers and system administrators. In this post we review five leading hosted Hadoop-based applications and give a short analysis of each to help you find the solution that best suits your needs.

The article covers Amazon Elastic MapReduce, Cloudera CDH, InfoSphere BigInsights, MapR M3 and M5, and Hortonworks Data Platform.


Amazon Elastic MapReduce

Introduced by Amazon in 2009, Elastic MapReduce automates the setup of Hadoop clusters and the transfer of data between Amazon’s EC2 and S3 services. For a small fee, Amazon lets clients launch a preconfigured Hadoop cluster to run their MapReduce programs.
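
For readers new to the term, here is a minimal sketch of what a MapReduce program does, written as plain Python that runs locally rather than on a cluster. The function names (`map_words`, `reduce_counts`, `run_job`) and the sample input are illustrative assumptions; they are not part of Elastic MapReduce or Hadoop, where a real job would be packaged and submitted to the cluster instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all counts emitted for a single word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Local stand-in for the framework: map, group by key, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            grouped[word].append(count)  # the "shuffle": group values by key
    return dict(reduce_counts(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on the cluster.
    sample = ["big data big cluster", "data in the cloud"]
    print(run_job(sample))
```

On a real cluster the map and reduce steps run on many machines at once, and the framework handles the grouping between them; that distribution is the part a hosted service sets up for you.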



Pros:
  • Very easy to set up a job flow.
  • An enormous amount of documentation is available to help new users.
  • Example applications are provided, so you can test-drive the service before putting it to use.
  • The entire system can be driven from a command-line interface instead of the web-based management console.
  • Several jobs can be run simultaneously, in parallel.
  • No hardware is needed and costs can be kept low, which is great for small businesses seeking to be more cost efficient.

Cons:
  • Requires an account with Amazon Web Services (AWS).
  • Service is only available in the United States.
  • Requires the use of Amazon’s S3 service, which adds extra costs to the overall project (data transfer, security, etc.).



Cloudera CDH

Founded in March 2009, Cloudera was at one time considered the Red Hat of the Hadoop world. With a customer base of more than 400 (counting both paid and free downloads), the company offers the Cloudera Enterprise products and Training & Support Services. Formed by key executives from several technology giants (Oracle, Yahoo, Google and Facebook), Cloudera is considered the pioneer in the Hadoop community, with a head start over its competitors.



Pros:
  • Free application that can be easily downloaded.
  • Installed within the organization, which gives the company full control of all processes, jobs, etc.
  • Technical support is superior, and the knowledgebase is an essential resource for anyone starting out with Hadoop.
  • Used by a large number of companies worldwide and proven as a leading choice among Hadoop applications.
  • Includes additional resources and components (e.g. Pig, Hive, Flume, HBase, ZooKeeper, Mahout, Whirr, Hue, Sqoop and Oozie).
  • Cloudera releases quarterly updates, eliminating the need for a large-scale annual upgrade.

Cons:
  • Requires companies to obtain the necessary hardware to install the application, which adds cost.
  • Supporting and maintaining the application adds to the company’s operating costs.



IBM InfoSphere BigInsights

Introduced in May 2011, this product is geared towards handling extremely large volumes of streaming data using a Hadoop-based analytics framework. IBM states that InfoSphere BigInsights will be able to handle “tens-of-petabytes” of data while retaining a sub-millisecond response time. The company also plans to launch 20 new service offerings, including numerous analytical tools for business and IT.



Pros:
  • Superior product support and a long-standing company reputation built over many years of serving the IT community.
  • Comes standard with a number of essential components, including Pig programming, IBM DB2 and IBM BigSheets.
  • Offers two log-based replication models that work independently: queue-based and SQL-based.
  • Lots of documentation and step-by-step training are available from the IBM website.
  • Superior product for big data in motion that needs to be analyzed continuously in real time.

Cons:
  • New to the marketplace and has not been around long enough to have a solid reputation.
  • An expensive solution for small and medium-sized organizations seeking a more cost-effective application.



MapR M3 and M5

Headquartered in San Jose, CA, MapR markets proprietary applications that focus on providing a number of key features and capabilities for use with MapReduce and Hadoop.



Pros:
  • Offers superior monitoring that gives a better understanding of data distribution and processing, which is essential for achieving increased performance.
  • A free version is offered that includes everything except the management tools, which are available only in the M5 series.
  • Excellent technical support and vast quantities of documentation are available.

Cons:
  • New to the marketplace, so it has a limited reputation.
  • An expensive solution for small and medium-sized organizations.
  • 24×7 support is only available on the paid version of the application.
  • Requires an enormous amount of disk space to install (25 GB) compared to similar products.



Hortonworks Data Platform

Hortonworks was formed in June 2011 by a number of key architects and Hadoop committers formerly employed within the Yahoo Hadoop Software department. The company’s offerings include HDP (Hortonworks Data Platform) and Training Support Services. The company currently serves two customers: Yahoo and Microsoft.


Pros:
  • A Yahoo spin-off product, so it has been tested in the marketplace.
  • Lots of documentation and support are available from the knowledgebase community.
  • The company is continuously working with Yahoo to develop its future products.
  • Scalable to meet the demands of specific projects.
  • Offers variations and expanded product offerings through partnerships with a number of specialized companies.

Cons:
  • Product is similar in nature to Cloudera and provides similar features.



[Figure: Hadoop-based application website performance stats]

We hope this post has been useful to web developers and system administrators.

More information on Monitis can be found on our website:

More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.,
