|By Hovhannes Avoyan||
|April 9, 2012 07:00 AM EDT||
It is our goal at Monitis to make the lives of web developers and system administrators easy. We have reviewed the 5 leading hosted hadoop-based applications and given a short analysis of them in this post to help guide you in finding a solution that best suits your needs.
The article covers: Amazon Elastic MapReduce; Cloud Era CDH; InfoSphere BigInsights; MapR M3 and M5 and Hortonworks Data Platform.
Amazon Elastic MapReduce (http://aws.amazon.com/elasticmapreduce/)
Introduced by Amazon in 2009, Elastic MapReduce automates the process of various Hadoop cluster processes and transfers between Amazon’s EC2 and S3 products. For a minimal fee, Amazon will provide its clients with the ability to launch a preconfigured Hadoop cluster to run a client’s MapReduce Program.
- Very easy to setup a job flow
- There’s an enormous amount of documentation available to help new users
- Example applications are provided, giving an option to test drive the application before putting it to use.
- Entire application system can be powered by a command line interface, compared to a web-based management console.
- Ability to conduct several jobs simultaneously and parallel.
- No hardware is needed and costs can be very limited, which is great for small businesses seeking to be more cost efficient.
- Need an account with Amazon Web Services (AWS)
- Service is only available in the United States
- Requires the use of Amazon’s S3 service, which adds extra costs to an overall project (data transfer, security etc.)
Cloudera CDH (www.cloudera.com)
Founded in March 2009, Cloudera was previously considered to be the Red Hat of the Hadoop World. With a large customer base of over 400 (including paid and free downloads), the company’s offerings include the Cloudera Enterprise products and Training & Support Services. Formed by a number of key executives from various technology giants (Oracle, Yahoo, Google and Facebook), Cloudera is considered the pioneer in the Hadoop community, having a head start in the industry compared to its competitors.
- Free application that can be easily downloaded
- Installed internally within an organization which allows the company to have full control of all processes, jobs etc.
- Technical support is superior and the knowledgebase is an essential resource to anyone starting out with Hadoop
- Used by a large number of companies worldwide, and has been proven as a leading choice in Hadoop applications.
- Application includes additional resources and components (e.g. Pig, Hive, Flume, HBase, Zookeeper, Mahout, Whirr, Hue, Sqoop and Oozie)
- Cloudera conducts quarterly updates: eliminating the need to conduct a big scale annual upgrade.
- Requires companies to obtain the necessary hardware in order to install the application, adding additional costs.
- Additional costs are added to support and maintain the application, increasing the company’s operating costs.
IBM InfoSphere BigInsights (www.ibm.com/software/data/infosphere/biginsights)
A new product introduced in May 2011, the product is geared towards handling extremely large volumes of streaming data using a Hadoop-based analytics framework. IBM states that the IBM InfoSphere Biginsights will be able to handle “tens-of-petabytes” of data, and will retain a sub-millisecond response time. The company also plans to launch 20 new service offerings, including numerous analytical tools for business and IT.
- Superior product support and long standing company reputation established from many years of servicing the IT community.
- Comes standard with a number of essential components including; PIG programming, IBM DB2 and IBM BigSheets.
- Offers two replication models that provide log-based replication working independently (queue-based and SQL-based).
- Lots of documentation and step-by-step training is available from the IBM website.
- Superior product for analysing big data in motion that needs to be continuously analyzed in real time.
- New to the marketplace and has not been around long enough to ensure a solid reputation.
- An expensive solution for small/medium size organizations seeking to utilize a more cost effective application.
MapR M3 and M5 (www.mapr.com)
With headquarters in San Jose, CA, MapR markets its proprietary applications with a focus on providing a number of key features and capabilities for the use with MapReduce and Hadoop.
- Offers superior monitoring that can provide a better understanding of data distribution and processing – essential for achieving increased performance.
- A free version is offered, which includes everything except management tools which are only offered in its M5 series products.
- Excellent technical support and vast quantities of documentation available
- New to the marketplace so has a limited reputation
- An expensive solution for small/medium size organizations
- 24×7 support is only available on the paid version of the application
- Requires an enormous amount of disk space to install (25GB), compared to similar products.
Hortonworks Data Platform (http://hortonworks.com/)
Hortonworks was formed in June 2011 by a number of key architects and Hadoop committers formerly employed within the Yahoo Hadoop Software department. The company’s offerings include; HDP (Hadoop Data Platform) and Training Support Services. The company currently serves 2 customers – Yahoo and Microsoft.
- A spin-off Yahoo product, so it’s been tested in the marketplace.
- Lots of documentation and support available from the knowledgebase community.
- The company is continuously working with Yahoo to develop its future products
- Scalable to meet the demands of specific projects.
- Offers variations and expanded product offerings from partnerships with a number of specialized companies.
- Product is similar in nature to Cloudera, and provides similar features.
1 YEAR WEBSITE TRAFFIC COMPARISON (from Compete.com)
Hopefully our post has been of interest to web developers and system administrators.
More information on Monitis can be found on our website: www.monitis.com
"eFolder does a lot of different things but we protect data and we are focused on protecting data no matter where it resides," explained Carlo Tapia, Product Marketing Manager at eFolder, in this SYS-CON.tv interview at Cloud Expo, held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA.
Dec. 1, 2015 04:00 PM EST
The Internet of Things (IoT) is growing rapidly by extending current technologies, products and networks. By 2020, Cisco estimates there will be 50 billion connected devices. Gartner has forecast revenues of over $300 billion, just to IoT suppliers. Now is the time to figure out how you’ll make money – not just create innovative products. With hundreds of new products and companies jumping into the IoT fray every month, there’s no shortage of innovation. Despite this, McKinsey/VisionMobile data...
Dec. 1, 2015 04:00 PM EST Reads: 499
Cloud computing is unquestionably one of the driving forces of DevOps, as the automation of operations transforms enterprise software development. DevOps, however, is more than a technology trend, as it represents a move toward silo-busting, self-organizing horizontal teams that drive business velocity. At the same time, enterprise Digital Transformation represents an upheaval across the enterprise, as customer preferences and behavior drive enterprise technology decisions. This transformation ...
Dec. 1, 2015 03:45 PM EST
Most of the IoT Gateway scenarios involve collecting data from machines/processing and pushing data upstream to cloud for further analytics. The gateway hardware varies from Raspberry Pi to Industrial PCs. The document states the process of allowing deploying polyglot data pipelining software with the clear notion of supporting immutability. In his session at @ThingsExpo, Shashank Jain, a development architect for SAP Labs, discussed the objective, which is to automate the IoT deployment proces...
Dec. 1, 2015 03:00 PM EST Reads: 145
Just over a week ago I received a long and loud sustained applause for a presentation I delivered at this year’s Cloud Expo in Santa Clara. I was extremely pleased with the turnout and had some very good conversations with many of the attendees. Over the next few days I had many more meaningful conversations and was not only happy with the results but also learned a few new things. Here is everything I learned in those three days distilled into three short points.
Dec. 1, 2015 03:00 PM EST Reads: 385
DevOps is about increasing efficiency, but nothing is more inefficient than building the same application twice. However, this is a routine occurrence with enterprise applications that need both a rich desktop web interface and strong mobile support. With recent technological advances from Isomorphic Software and others, rich desktop and tuned mobile experiences can now be created with a single codebase – without compromising functionality, performance or usability. In his session at DevOps Su...
Dec. 1, 2015 02:45 PM EST Reads: 447
In demand-intensive mobile and web applications, an emerging pattern is to host the Systems of Engagement in the cloud (for maximum responsiveness) but keep the Systems of Record with the other important business systems in the company datacenter, often on a tightly secured mainframe. But what about the space in between? In this IBM Redpaper publication, we show that the IBM Bluemix cloud platform offers technologies that make it easy for cloud-based SoEs to securely connect to on-premises IBM...
Dec. 1, 2015 02:45 PM EST
As organizations realize the scope of the Internet of Things, gaining key insights from Big Data, through the use of advanced analytics, becomes crucial. However, IoT also creates the need for petabyte scale storage of data from millions of devices. A new type of Storage is required which seamlessly integrates robust data analytics with massive scale. These storage systems will act as “smart systems” provide in-place analytics that speed discovery and enable businesses to quickly derive meaningf...
Dec. 1, 2015 02:15 PM EST Reads: 451
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
Dec. 1, 2015 02:00 PM EST Reads: 548
In his General Session at 17th Cloud Expo, Bruce Swann, Senior Product Marketing Manager for Adobe Campaign, explored the key ingredients of cross-channel marketing in a digital world. Learn how the Adobe Marketing Cloud can help marketers embrace opportunities for personalized, relevant and real-time customer engagement across offline (direct mail, point of sale, call center) and digital (email, website, SMS, mobile apps, social networks, connected objects).
Dec. 1, 2015 01:45 PM EST Reads: 358
The buzz continues for cloud, data analytics and the Internet of Things (IoT) and their collective impact across all industries. But a new conversation is emerging - how do companies use industry disruption and technology enablers to lead in markets undergoing change, uncertainty and ambiguity? Organizations of all sizes need to evolve and transform, often under massive pressure, as industry lines blur and merge and traditional business models are assaulted and turned upside down. In this new da...
Dec. 1, 2015 01:15 PM EST Reads: 307
SYS-CON Events announced today that Catchpoint, a global leader in monitoring, and testing the performance of online applications, has been named "Silver Sponsor" of DevOps Summit New York, which will take place on June 7-9, 2016 at the Javits Center in New York City. Catchpoint radically transforms the way businesses manage, monitor, and test the performance of online applications. Truly understand and improve user experience with clear visibility into complex, distributed online systems.Founde...
Dec. 1, 2015 12:15 PM EST
In recent years, at least 40% of companies using cloud applications have experienced data loss. One of the best prevention against cloud data loss is backing up your cloud data. In his General Session at 17th Cloud Expo, Sam McIntyre, Partner Enablement Specialist at eFolder, presented how organizations can use eFolder Cloudfinder to automate backups of cloud application data. He also demonstrated how easy it is to search and restore cloud application data using Cloudfinder.
Dec. 1, 2015 12:00 PM EST Reads: 232
With all the incredible momentum behind the Internet of Things (IoT) industry, it is easy to forget that not a single CEO wakes up and wonders if “my IoT is broken.” What they wonder is if they are making the right decisions to do all they can to increase revenue, decrease costs, and improve customer experience – effectively the same challenges they have always had in growing their business. The exciting thing about the IoT industry is now these decisions can be better, faster, and smarter. Now ...
Dec. 1, 2015 12:00 PM EST Reads: 310
The Internet of Everything is re-shaping technology trends–moving away from “request/response” architecture to an “always-on” Streaming Web where data is in constant motion and secure, reliable communication is an absolute necessity. As more and more THINGS go online, the challenges that developers will need to address will only increase exponentially. In his session at @ThingsExpo, Todd Greene, Founder & CEO of PubNub, exploreed the current state of IoT connectivity and review key trends and t...
Dec. 1, 2015 11:45 AM EST Reads: 476
Actifio is powering new application development and testing services from Net3 Technologies (N3T), a managed cloud services provider. N3T's new Symmetry DevOps™ service builds on its existing Palmetto Virtual Data Center (PvDC) Cloud services for data backup and disaster recovery (DR) based on the Actifio Copy Data Virtualization platform. Previously, N3T's data protection and DR services were challenged by overlapping and inefficient legacy hardware and software platforms from multiple vendo...
Dec. 1, 2015 11:30 AM EST
Countless business models have spawned from the IaaS industry – resell Web hosting, blogs, public cloud, and on and on. With the overwhelming amount of tools available to us, it's sometimes easy to overlook that many of them are just new skins of resources we've had for a long time. In his general session at 17th Cloud Expo, Harold Hannon, Sr. Software Architect at SoftLayer, an IBM Company, broke down what we have to work with, discussed the benefits and pitfalls and how we can best use them ...
Dec. 1, 2015 10:45 AM EST Reads: 136
Discussions of cloud computing have evolved in recent years from a focus on specific types of cloud, to a world of hybrid cloud, and to a world dominated by the APIs that make today's multi-cloud environments and hybrid clouds possible. In this Power Panel at 17th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the importance of customers being able to use the specific technologies they need, through environments and ecosystems that expose their APIs to make true ...
Dec. 1, 2015 10:00 AM EST Reads: 579
Microservices are a very exciting architectural approach that many organizations are looking to as a way to accelerate innovation. Microservices promise to allow teams to move away from monolithic "ball of mud" systems, but the reality is that, in the vast majority of organizations, different projects and technologies will continue to be developed at different speeds. How to handle the dependencies between these disparate systems with different iteration cycles? Consider the "canoncial problem"...
Dec. 1, 2015 09:00 AM EST Reads: 485
We all know that data growth is exploding and storage budgets are shrinking. Instead of showing you charts on about how much data there is, in his General Session at 17th Cloud Expo, Scott Cleland, Senior Director of Product Marketing at HGST, showed how to capture all of your data in one place. After you have your data under control, you can then analyze it in one place, saving time and resources.
Dec. 1, 2015 08:00 AM EST Reads: 255