Welcome!

@CloudExpo Authors: Liz McMillan, Pat Romanski, Elizabeth White, Automic Blog, Kevin Jackson

Related Topics: @CloudExpo, Microservices Expo, Containers Expo Blog

@CloudExpo: Blog Post

Big Moves in Big Data: EMC's Hadoop Strategy

To date, Big Storage has been locked out of Big Data

To date, Big Storage has been locked out of Big Data. It’s been all about direct attached storage for several reasons. First, Advanced SQL players have typically optimized architectures from data structure (using columnar), unique compression algorithms, and liberal usage of caching to juice response over hundreds of terabytes. For the NoSQL side, it’s been about cheap, cheap, cheap along the Internet data center model: have lots of commodity stuff and scale it out. Hadoop was engineered exactly for such an architecture; rather than speed, it was optimized for sheer linear scale.

Over the past year, most of the major platform players have planted their table stakes with Hadoop. Not surprisingly, IT household names are seeking to somehow tame Hadoop and make it safe for the enterprise.

Up ' til now, anybody with armies of the best software engineers that Internet firms could buy could brute force their way to scale out humungous clusters and if necessary, invent their own technology, then share and harvest from the open source community at will. Hardly a suitable scenario for the enterprise mainstream, the common thread behind the diverse strategies of IBM, EMC, Microsoft, and Oracle toward Hadoop has been to not surprisingly make Hadoop more approachable.

Up ' til now, anybody with armies of the best software engineers that Internet firms could buy could brute force their way to scale out humungous clusters and if necessary.

What’s been conspicuously absent so far was a play from Big Optimized Storage. The conventional wisdom is that SAN or NAS are premium, architected systems whose costs might be prohibitive when you talk petabytes of data.

Similarly, so far there has been a different operating philosophy behind the first generation implementations from the NoSQL world that assumed that parts would fail, and that five nines service levels were overkill. And anyway, the design of Hadoop brute forced the solution: replicate to have three unique copies of the data distributed around the cluster, as hardware is cheap.

As Big Data gains traction in the enterprise, some of it will certainly fit this pattern of something being better than nothing, as the result is unique insights that would not otherwise be possible. For instance, if your running analysis of Facebook or Twitter goes down, it probably won’t take the business with it. But as enterprises adopt Hadoop – and as pioneers stretch Hadoop to new operational use cases such as what Facebook is doing with its messaging system – those concepts of mission-criticality are being revisited.

And so, ever since EMC announced last spring that its Greenplum unit would start supporting and bundling different versions of Hadoop, we’ve been waiting for the other shoe to drop: When would EMC infuse its Big Data play with its core DNA, storage?

Today, EMC announced that its Isilon networked storage system was adding native support for Apache Hadoop’s HDFS file system. There were some interesting nuances to the rollout.

Big vendors feeling their way

It’s interesting to see how IT household names are cautiously navigating their way into unfamiliar territory. EMC becomes the latest, after Oracle and Microsoft, to calibrate their Hadoop strategy in public.

Oracle announced its Big Data appliance last fall before it lined up its Hadoop distribution. Microsoft ditched its Dryad project built around its HPC Server. Now EMC has recalibrated its Hadoop strategy; when it first unveiled its Hadoop strategy last spring, the spotlight was on the MapR proprietary alternatives to the HDFS file system of Apache Hadoop. It’s interesting that vendor initial announcements have either been vague, or have been tweaked as they’ve waded into the market. For EMC’s shift, more about that below.


For EMC, HDFS is the mainstream

MapR’s strategy (and IBM’s along with it, regarding GPFS) has prompted debate and concern in the Hadoop community about commercial vendors forking the technology. As we’ve ranted previously, Hadoop’s growth will be tied, not only to megaplatform vendors that support it, but the third party tools and solutions ecosystem that grows around it.

For such a thing to happen, ISVs and consulting firms need to have a common target to write against, and having forked versions of Hadoop won’t exactly grow large partner communities.

Regarding EMC, the original strategy was two Greenplum Hadoop editions: a Community Edition with a free Apache distro and an Enterprise Edition that bundled MapR, both under the Greenplum HD branding umbrella. At first blush, it looked like EMC was going to earn the bulk of its money from the proprietary side of the Hadoop business.

This reflects emerging conventional wisdom that the enterprise mainstream is leery about lock-in to anything that smells proprietary for technology where they still are in the learning curve.

What’s significant is that the new announcement of Isilon support pertains on to the HDFS open source side. More to the point, EMC is rebranding and subtly repositioning its Greenplum Hadoop offerings: Greenplum HD is the Apache HDFS edition with the optional Isilon support, and Greenplum MR is the MapR version, which is niche targeted towards advanced Hadoop use cases that demand higher performance.

Coming atop recent announcements from Oracle and Microsoft that have come clearly out on the side of OEM’ing Apache rather than anything limited or proprietary, and this amounts to an unqualified endorsement of Apache Hadoop/HDFS as not only the formal, but also the de facto standard.

This reflects emerging conventional wisdom that the enterprise mainstream is leery about lock-in to anything that smells proprietary for technology where they still are in the learning curve. Other forks may emerge, but they will not be at the base file system layer. This leaves IBM and MapR pigeonholed – admittedly, there will be API compatibility, but clearly both are swimming upstream.

Central Storage is newest battleground

As noted earlier, Hadoop’s heritage has been the classic Internet data center scale-out model. The advantage is that, leveraging Hadoop’s highly linear scalability, organizations could easily expand their clusters quite easily by plucking more commodity server and disk. Pioneers or purists would scoff at the notion of an appliance approach because it was always simply scaling out inexpensive, commodity hardware, rather than paying premiums for big vendor boxes.

In blunt terms, the choice is whether you pay now or pay later. As mentioned before, do-it-yourself compute clusters require sweat equity – you need engineers who know how to design, deploy, and operate them. The flipside is that many, arguably most corporate IT organizations either lack the skills or the capital. There are various solutions to what might otherwise appear a Hobson’s Choice:

  • Go to a cloud service provider that has already created the infrastructure, such as what Microsoft is offering with its Hadoop-on-Azure services;
  • Look for a happy, simpler medium such as Amazon’s Elastic MapReduce on its DynamoDB service;
  • Subscribe to SaaS providers that offer Hadoop applications (e.g., social network analysis, smart grid as a service) as a service;

    Pioneers or purists would scoff at the notion of an appliance approach because it was always simply scaling out inexpensive, commodity hardware, rather than paying premiums for big vendor boxes.

  • Get a platform and have a systems integrator put it together for you (key to IBM’s BigInsights offering, and applicable to any SI that has a Hadoop practice)
  • Go to an appliance or engineered systems approach that puts Hadoop and/or its subsystems in a box, such as with Oracle Big Data Appliance or EMC’s Greenplum DCA. The systems engineering is mostly done for you, but the increments for growing the system can be much larger than simply adding a few x86 servers here or there (Greenplum HD DCA can scale in groups of 4 server modules). Entry or expansion costs are not necessarily cheap, but then again, you have to balance capital cost against labor.
  • Surrounding Hadoop infrastructure with solutions. This is not a mutually exclusive strategy; unless you’re Cloudera or Hortonworks, which make their business bundling and supporting the core Apache Hadoop platform, most of the household names will bundle frameworks, algorithms, and eventually solutions that in effect place Hadoop under the hood. For EMC, the strategy is their recent announcement of a Unified Analytics Platform (UAP) that provides collaborative development capabilities for Big Data applications. EMC is (or will be) hardly alone here.

With EMC’s new offering, the scale-up option tackles the next variable: storage. This is the natural progression of a market that will address many constituencies, and where there will be no single silver bullet that applies to all.

This guest post comes courtesy of Tony Baer’s OnStrategies blog. Tony is a senior analyst at Ovum.

More Stories By Tony Baer

Tony Baer is Principal Analyst with Ovum, leading Ovum’s research on the software lifecycle. Working in concert with other members of Ovum’s software group, his research covers the full lifecycle from design and development to deployment and management. Areas of focus include application lifecycle management, software development methodologies (including agile), SOA, IT service management/ITIL, and IT management/governance.

Baer has been a noted authority on software development platforms and integration architecture for nearly 20 years. Prior to joining Ovum, he was an independent analyst whose company ‘onStrategies’ delivered software development and integration tools to vendors with technology assessment and market positioning services. He also led Computerwire’s CIO Agenda and Computer Finance end-user best practices research services.

Follow him on Twitter @TonyBaer or read his blog site www.onstrategies.com/blog.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
SYS-CON Events announced today that Evatronix will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Evatronix SA offers comprehensive solutions in the design and implementation of electronic systems, in CAD / CAM deployment, and also is a designer and manufacturer of advanced 3D scanners for professional applications.
SYS-CON Events announced today that Synametrics Technologies will exhibit at SYS-CON's 22nd International Cloud Expo®, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Synametrics Technologies is a privately held company based in Plainsboro, New Jersey that has been providing solutions for the developer community since 1997. Based on the success of its initial product offerings such as WinSQL, Xeams, SynaMan and Syncrify, Synametrics continues to create and hone inn...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. In his session at @BigDataExpo, Jack Norris, Senior Vice President, Data and Applications at MapR Technologies, reviewed best practices to ...
DevOps promotes continuous improvement through a culture of collaboration. But in real terms, how do you: Integrate activities across diverse teams and services? Make objective decisions with system-wide visibility? Use feedback loops to enable learning and improvement? With technology insights and real-world examples, in his general session at @DevOpsSummit, at 21st Cloud Expo, Andi Mann, Chief Technology Advocate at Splunk, explored how leading organizations use data-driven DevOps to close th...
"Digital transformation - what we knew about it in the past has been redefined. Automation is going to play such a huge role in that because the culture, the technology, and the business operations are being shifted now," stated Brian Boeggeman, VP of Alliances & Partnerships at Ayehu, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
The past few years have brought a sea change in the way applications are architected, developed, and consumed—increasing both the complexity of testing and the business impact of software failures. How can software testing professionals keep pace with modern application delivery, given the trends that impact both architectures (cloud, microservices, and APIs) and processes (DevOps, agile, and continuous delivery)? This is where continuous testing comes in. D
"WineSOFT is a software company making proxy server software, which is widely used in the telecommunication industry or the content delivery networks or e-commerce," explained Jonathan Ahn, COO of WineSOFT, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"Evatronix provides design services to companies that need to integrate the IoT technology in their products but they don't necessarily have the expertise, knowledge and design team to do so," explained Adam Morawiec, VP of Business Development at Evatronix, in this SYS-CON.tv interview at @ThingsExpo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Smart cities have the potential to change our lives at so many levels for citizens: less pollution, reduced parking obstacles, better health, education and more energy savings. Real-time data streaming and the Internet of Things (IoT) possess the power to turn this vision into a reality. However, most organizations today are building their data infrastructure to focus solely on addressing immediate business needs vs. a platform capable of quickly adapting emerging technologies to address future ...
Mobile device usage has increased exponentially during the past several years, as consumers rely on handhelds for everything from news and weather to banking and purchases. What can we expect in the next few years? The way in which we interact with our devices will fundamentally change, as businesses leverage Artificial Intelligence. We already see this taking shape as businesses leverage AI for cost savings and customer responsiveness. This trend will continue, as AI is used for more sophistica...
There is a huge demand for responsive, real-time mobile and web experiences, but current architectural patterns do not easily accommodate applications that respond to events in real time. Common solutions using message queues or HTTP long-polling quickly lead to resiliency, scalability and development velocity challenges. In his session at 21st Cloud Expo, Ryland Degnan, a Senior Software Engineer on the Netflix Edge Platform team, will discuss how by leveraging a reactive stream-based protocol,...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, led attendees through the exciting evolution of the cloud. He looked at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering m...
Sanjeev Sharma Joins June 5-7, 2018 @DevOpsSummit at @Cloud Expo New York Faculty. Sanjeev Sharma is an internationally known DevOps and Cloud Transformation thought leader, technology executive, and author. Sanjeev's industry experience includes tenures as CTO, Technical Sales leader, and Cloud Architect leader. As an IBM Distinguished Engineer, Sanjeev is recognized at the highest levels of IBM's core of technical leaders.
The 22nd International Cloud Expo | 1st DXWorld Expo has announced that its Call for Papers is open. Cloud Expo | DXWorld Expo, to be held June 5-7, 2018, at the Javits Center in New York, NY, brings together Cloud Computing, Digital Transformation, Big Data, Internet of Things, DevOps, Machine Learning and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding busin...
Digital transformation is about embracing digital technologies into a company's culture to better connect with its customers, automate processes, create better tools, enter new markets, etc. Such a transformation requires continuous orchestration across teams and an environment based on open collaboration and daily experiments. In his session at 21st Cloud Expo, Alex Casalboni, Technical (Cloud) Evangelist at Cloud Academy, explored and discussed the most urgent unsolved challenges to achieve f...
Digital Transformation (DX) is not a "one-size-fits all" strategy. Each organization needs to develop its own unique, long-term DX plan. It must do so by realizing that we now live in a data-driven age, and that technologies such as Cloud Computing, Big Data, the IoT, Cognitive Computing, and Blockchain are only tools. In her general session at 21st Cloud Expo, Rebecca Wanta explained how the strategy must focus on DX and include a commitment from top management to create great IT jobs, monitor ...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
Cloud Expo | DXWorld Expo have announced the conference tracks for Cloud Expo 2018. Cloud Expo will be held June 5-7, 2018, at the Javits Center in New York City, and November 6-8, 2018, at the Santa Clara Convention Center, Santa Clara, CA. Digital Transformation (DX) is a major focus with the introduction of DX Expo within the program. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive ov...
A strange thing is happening along the way to the Internet of Things, namely far too many devices to work with and manage. It has become clear that we'll need much higher efficiency user experiences that can allow us to more easily and scalably work with the thousands of devices that will soon be in each of our lives. Enter the conversational interface revolution, combining bots we can literally talk with, gesture to, and even direct with our thoughts, with embedded artificial intelligence, whic...
You know you need the cloud, but you're hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You're looking at private cloud solutions based on hyperconverged infrastructure, but you're concerned with the limits inherent in those technologies. What do you do?