

Classifying Today’s “Big Data Innovators”

These 13 vendors distribute 16 unique data management products


Editor’s note: The piece below first appeared on the Hadapt blog and is republished here with permission. The framework it presents provides insight into the very dynamic market around “Big Data Innovators” and should be of use for classifying many other firms in this interesting space. -bg

Recently InformationWeek published a piece, authored by Doug Henschen, that listed 13 innovative Big Data vendors. The complete list is reproduced below:

1.  MongoDB
2.  Amazon (Redshift, EMR, DynamoDB)
3.  Cloudera (CDH, Impala)
4.  Couchbase
5.  Datameer
6.  Datastax
7.  Hadapt
8.  Hortonworks
9.  Karmasphere
10.  MapR
11.  Neo Technology
12.  Platfora
13.  Splunk

These 13 vendors distribute 16 unique data management products (since both Amazon and Cloudera offer multiple distinct data management/processing systems), all of which push the boundary on Big Data management.

In this post I will attempt to subcategorize these 16 products into a competitive grouping, where products placed inside the same group can be considered replacements for each other (and hence are competitive), and each group is complementary to every other group.

Before starting this classification, I will remove three products that, while potentially interesting from a Big Data perspective, are often used outside of what has become known as the “Big Data realm”, and whose primary competitors therefore did not make it onto the InformationWeek list. These three products are Splunk (which typically competes with companies focused on the security, compliance, and IT operations management verticals), Amazon Redshift (which typically competes with traditional MPP database vendors), and Neo Technology (which is usually classified as a “NoSQL database”, but whose focus on graph data sets it apart, in both technology and use cases, from the other NoSQL databases on this list).

The remaining 13 products can be classified into four distinct groups:
1.  Operational data stores that allow flexible schemas
2.  Hadoop distributions
3.  Real-time Hadoop-based analytical platforms
4.  Hadoop-based BI solutions

Group 1 (operational data stores that allow flexible schemas)
This group is composed of database products that can be used to manage active data for dynamic applications with hard-to-define (or hard-to-predict) schemas. The database must be optimized for inserting, retrieving, updating, or deleting individual data items in real time (latencies on the order of milliseconds), but should also support some sort of interface for analyzing the data stored within. The dynamic nature of the typical use case for databases in this group implies a NoSQL interface, and either a key-value or document-store retrieval model. From the InformationWeek list, MongoDB, DynamoDB, Couchbase, and Datastax all fit in this category. Although there are some significant technical differences between these products, they can nonetheless be roughly described as potential replacements for each other in Group 1 use cases.

Group 2 (Hadoop distributions)
The products in this group are designed for very different situations than those in Group 1. Hadoop is typically used for large-scale data analysis and batch processing. Rather than inserting, retrieving, updating, or deleting individual data items, Hadoop is optimized for scanning through large swaths of data, processing and analyzing the data as it proceeds. Hadoop has become the poster child for “Big Data” due to its proven massive scalability and its ability to handle the “variety” aspect of Big Data (since Hadoop does not require data to fit neatly into rows and columns in order to be analyzed and processed). From the InformationWeek list, Cloudera, Hortonworks, MapR, and Amazon EMR all fit in this category.
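To make the contrast between Group 1 and Group 2 workloads concrete, here is a minimal Python sketch. It uses no vendor API: the in-memory `store` dictionary, the sample `events`, and the `map_phase` / `reduce_phase` helpers are all hypothetical stand-ins chosen for illustration. The first half touches individual items by key, as an operational store would; the second half scans every record and aggregates along the way, in the spirit of a batch job.

```python
from collections import Counter

# Hypothetical event records; the field names are illustrative only.
events = [
    {"id": "e1", "user": "ann", "action": "login"},
    {"id": "e2", "user": "bob", "action": "purchase"},
    {"id": "e3", "user": "ann", "action": "purchase"},
]

# Group 1 style: operate on individual items addressed by key.
store = {e["id"]: e for e in events}
store["e4"] = {"id": "e4", "user": "cy", "action": "login"}  # insert
store["e2"]["action"] = "refund"                              # update
del store["e1"]                                               # delete
one_item = store["e3"]                                        # retrieve


# Group 2 style: scan every record and aggregate as the scan proceeds.
def map_phase(records):
    """Emit an (action, 1) pair for each record scanned."""
    for record in records:
        yield record["action"], 1


def reduce_phase(pairs):
    """Sum the emitted counts per action."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


action_counts = reduce_phase(map_phase(store.values()))
```

The point of the sketch is the access pattern, not the scale: the keyed operations never look at more than one item, while the scan necessarily visits all of them.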

Group 3 (real-time Hadoop-based analytical platforms)
Group 3 takes Hadoop to the next level, transforming it from a mere batch processing system into a full-fledged analytical platform that can answer queries in real time. Furthermore, by adding a more robust SQL interface to Hadoop (in addition to industry-standard ODBC connectors), group 3 products help to hide the complexity of Hadoop and reduce the need for Hadoop specialists, since traditional business intelligence and visualization tools can then interface directly with data stored inside Hadoop. From the InformationWeek list, Hadapt clearly fits in this category and, with certain caveats, so does Cloudera Impala. The caveats are that, as of the time of writing, (a) Impala is an extremely young codebase and is still only in beta, and (b) Impala supports only a small subset of SQL and does not support UDFs or other ways to combine structured and unstructured data in the same query, so calling it an “analytical platform” might be a bit of a stretch.

Group 4 (Hadoop-based BI solutions)
Group 4 products are often lumped together with group 3 products and mistaken for their competitors. However, just as business intelligence tools and analytical database solutions are highly complementary and were often packaged together in the pre-Hadoop world, the same is true in the Hadoop/Big Data world. Datameer, Karmasphere, and Platfora, all of which function as a business intelligence layer above Hadoop, are therefore capable of working closely with group 3 products (announcements along these lines have already begun to appear).
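The classification described in the four groups above can be written down as a small lookup table, which also makes it easy to check the arithmetic (13 classified products plus 3 excluded ones gives the 16 products on the list). This is only a sketch: the names `GROUPS`, `EXCLUDED`, `group_of`, and `are_competitors` are hypothetical, and listing Cloudera's Hadoop distribution as "Cloudera CDH" is my reading of the InformationWeek entry "Cloudera (CDH, Impala)".

```python
# Group membership as described in the post; labels are shortened for readability.
GROUPS = {
    "Operational data stores with flexible schemas": [
        "MongoDB", "Amazon DynamoDB", "Couchbase", "Datastax",
    ],
    "Hadoop distributions": [
        "Cloudera CDH", "Hortonworks", "MapR", "Amazon EMR",
    ],
    "Real-time Hadoop-based analytical platforms": [
        "Hadapt", "Cloudera Impala",
    ],
    "Hadoop-based BI solutions": [
        "Datameer", "Karmasphere", "Platfora",
    ],
}

# Products set aside before classification.
EXCLUDED = ["Splunk", "Amazon Redshift", "Neo Technology"]


def group_of(product):
    """Return the group name for a product, or None if it was not classified."""
    for name, members in GROUPS.items():
        if product in members:
            return name
    return None


def are_competitors(a, b):
    """Two distinct products compete only if they sit in the same group."""
    group_a, group_b = group_of(a), group_of(b)
    return a != b and group_a is not None and group_a == group_b


classified_count = sum(len(members) for members in GROUPS.values())
total_count = classified_count + len(EXCLUDED)
```

Under this reading, `are_competitors("MongoDB", "Couchbase")` is true because both sit in Group 1, while `are_competitors("Hadapt", "Datameer")` is false because groups 3 and 4 are complementary.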

In conclusion, although “Big Data” is an enormous and rapidly growing market, no single data management software product is going to rule it. Rather, there are four major groups of data management solutions within the Big Data space, and while there is fierce competition within each group, at the macro level these groups not only can co-exist but are highly complementary. In the long run, it is likely that two or three leaders will emerge in each group and share the Big Data pie.

More Stories By Bob Gourley

Bob Gourley, former CTO of the Defense Intelligence Agency (DIA), is Founder and CTO of Crucial Point LLC, a technology research and advisory firm providing fact based technology reviews in support of venture capital, private equity and emerging technology firms. He has extensive industry experience in intelligence and security and was awarded an intelligence community meritorious achievement award by AFCEA in 2008, and has also been recognized as an Infoworld Top 25 CTO and as one of the most fascinating communicators in Government IT by GovFresh.
