Welcome!

@CloudExpo Authors: Ed Featherston, Rostyslav Demush, Jamie Madison, Jason Bloomberg, Greg Pierce

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Containers Expo Blog

@CloudExpo: Article

Making Hadoop Safe for 'Clusterophobics'

Big things in store for Hadoop

Hadoop remains a difficult platform for most enterprises to master. For now skills are still hard to come by – both for data architect or engineer, and especially for data scientists. It still takes too much skill, tape, and baling wire to get a Hadoop cluster together. Not every enterprise is Google or Facebook, with armies of software engineers that they can throw at a problem. With some exceptions, most enterprises don’t deal with data on the scale of Google or Facebook either – but the bar is rising.

If 2011 was the year that the big IT data warehouse and analytic platform brand names discovered Hadoop, 2012 becomes the year where a tooling ecosystem starts emerging to make Hadoop more consumable for the enterprise. Let’s amend that – along with tools, Hadoop must also become a first-class citizen with enterprise IT infrastructure. Hadoop won’t cross over to the enterprise if it has to be treated as some special island. That means meshing with the practices and technology approaches that enterprises are using to manage their data centers or cloud deployments. Like SQL, data integration, virtualization, storage strategy, and so on.

Admittedly, much of this cuts against the grain of early Hadoop deployment that stressed open source and commodity infrastructure. Early adopters did so out of necessity as commercial software ran out of gas for Facebook when its data warehouse daily refreshes were breaking terabyte range, not to mention that the cost of commercial licenses for such scaled out analytic platforms wouldn’t have been trivial. Anyway, Hadoop’s linearity leverages scale out of commodity blades and direct attached disk as far as the eye can see, enabling such an almost pure noncommercial approach. At the time, Google’s, Yahoo’s, and Facebook’s issues were considered rather unique – most enterprise don’t run global search engines – not to mention that their business was built on armies of software engineers.

Something's got to give

As we’ve previously noted, something’s got to give on the skills front. Hadoop in the enterprise faces limits – the data problems are getting bigger and more complex for sure, but resources and skills are far more finite. So we envision tools and solutions addressing two areas:

  1. Products that address “clusterophobia” – organizations that seeks the scalable analytics of Hadoop but lack the appetite to erect infinite data centers out in the fields or hire the necessary skillsets. Obviously, using the cloud is one option – but the questions there revolve around whether corporate policies allow maintenance of data off premises, and also, as data store size grows, whether the cloud is still economical.
  2. The other side of the coin is consummability – tools that simplify access to and manipulation of the data.

In the run-up to this year’s Hadoop Summit, a number of tooling announcements addressing clusterophobia and consumption are pouring out.

The workloads are going to get more equitably distributed, and in the long run, we wouldn’t be surprised to see more Hadoop-only appliances.



On the fear of clusters side, players like Oracle, EMC Greenplum, and Teradata Aster are already offering appliances that simplify deployment of Hadoop, typically in conjunction with an Advanced SQL analytic platform. While most vendors position this as a way for Hadoop to “extend’ your data warehouse so you perform exploration in Hadoop, but the serious analytics in SQL, we view appliances as more than transitional strategy. The workloads are going to get more equitably distributed, and in the long run, we wouldn’t be surprised to see more Hadoop-only appliances, sort of like Oracle’s (for the record, they also bundle another NoSQL database).

Also addressing the same constituency are storage and virtualization – facts of life in the data center. For Hadoop to cross over to the enterprise, it, too, must get virtualization-friendly. Storage is an open question. The need for virtualization becomes even more apparent because (1) the exploratory nature of Hadoop analytics demands the ability to try out queries offline without having to disrupt or physically build a new cluster; and (2) the variable nature of Hadoop processing suggests that workloads are likely to be elastic. So we’ve been waiting for VMware to make their move. VMware – also part of EMC – has announced a pair of initiatives. First, they are working with the Apache Hadoop project to make the core pieces (HDFS and MapReduce) virtualization-aware, and separately, they are hosting their own open source project (Serengeti) for virtualizing Hadoop clusters. While Project Serengeti is not VM-specific, there’s little doubt that this will be a VMware project (we’d be shocked if the Xen folks were to buy in).

Storage follows


W
here there’s virtualized servers, storage often closely follows. A few months back, EMC dropped the other shoe, finally unveiling a strategy for leveraging Isilon with the Greenplum HD platform, the closest thing in NAS that replicates the scale-out model storage model popularized with Hadoop. This opens an argument of whether the scales of data in Hadoop make premium products such as Isilon unaffordable. The flip side however is the “open source tax,” where you hire the skills in your IT organization to manage and deploy scale-out storage, or pay consultants to do it for you.

In the spirit of making Hadoop more consummable, we expect a lot of vibes from new players that are simplifying navigation of Hadoop and building SQL bridges. Datameer is bringing down the pricing of its uber Hadoop spreadsheet to personal and workgroup levels courtesy of entry level pricing from $299 to $2999. Teradata Aster, which already offers a patented framework that translates SQL to MapReduce (there are also others out there) is now taking an early bet on the incubating Apache HCatalog metadata spec so that you could write SQL statements that go up against Hadoop. It joins approaches such as those from Hadapt, which hangs SQL tables from HDFS file nodes, and mainstream BI players such as Jaspersoft, that already provide translators that can grab reports directly from Hadoop.

In the spirit of making Hadoop more consummable, we expect a lot of vibes from new players that are simplifying navigation of Hadoop and building SQL bridges.



This doesn’t take away from the evolution of the Hadoop platform itself. Cloudera and Hortonworks are among those releasing new distributions that bundle their own mix of recent and current Apache Hadoop modules. While the Apache project has addressed the NameNode HA issue, it is still early in the game with bringing enterprise-grade manageability to MapReduce. That’s largely an academic issue as the bulk of enterprises have yet to implement Hadoop. By the time enterprises are ready, many of the core issues should resolve — although there will always be questions about the uptake of peripheral Hadoop projects.

What’s more important – and where the action will be – is in tools that allow enterprises to run and, more importantly, consume Hadoop. A chicken and egg situation, enterprises won’t implement before tools are available and vice versa.

You may also be interested in:

More Stories By Tony Baer

Tony Baer is Principal Analyst with Ovum, leading Ovum’s research on the software lifecycle. Working in concert with other members of Ovum’s software group, his research covers the full lifecycle from design and development to deployment and management. Areas of focus include application lifecycle management, software development methodologies (including agile), SOA, IT service management/ITIL, and IT management/governance.

Baer has been a noted authority on software development platforms and integration architecture for nearly 20 years. Prior to joining Ovum, he was an independent analyst whose company ‘onStrategies’ delivered software development and integration tools to vendors with technology assessment and market positioning services. He also led Computerwire’s CIO Agenda and Computer Finance end-user best practices research services.

Follow him on Twitter @TonyBaer or read his blog site www.onstrategies.com/blog.

@CloudExpo Stories
"ZeroStack is a startup in Silicon Valley. We're solving a very interesting problem around bringing public cloud convenience with private cloud control for enterprises and mid-size companies," explained Kamesh Pemmaraju, VP of Product Management at ZeroStack, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
In his session at 21st Cloud Expo, Carl J. Levine, Senior Technical Evangelist for NS1, will objectively discuss how DNS is used to solve Digital Transformation challenges in large SaaS applications, CDNs, AdTech platforms, and other demanding use cases. Carl J. Levine is the Senior Technical Evangelist for NS1. A veteran of the Internet Infrastructure space, he has over a decade of experience with startups, networking protocols and Internet infrastructure, combined with the unique ability to it...
"Codigm is based on the cloud and we are here to explore marketing opportunities in America. Our mission is to make an ecosystem of the SW environment that anyone can understand, learn, teach, and develop the SW on the cloud," explained Sung Tae Ryu, CEO of Codigm, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
High-velocity engineering teams are applying not only continuous delivery processes, but also lessons in experimentation from established leaders like Amazon, Netflix, and Facebook. These companies have made experimentation a foundation for their release processes, allowing them to try out major feature releases and redesigns within smaller groups before making them broadly available. In his session at 21st Cloud Expo, Brian Lucas, Senior Staff Engineer at Optimizely, discussed how by using ne...
"There's plenty of bandwidth out there but it's never in the right place. So what Cedexis does is uses data to work out the best pathways to get data from the origin to the person who wants to get it," explained Simon Jones, Evangelist and Head of Marketing at Cedexis, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Large industrial manufacturing organizations are adopting the agile principles of cloud software companies. The industrial manufacturing development process has not scaled over time. Now that design CAD teams are geographically distributed, centralizing their work is key. With large multi-gigabyte projects, outdated tools have stifled industrial team agility, time-to-market milestones, and impacted P&L stakeholders.
"Cloud Academy is an enterprise training platform for the cloud, specifically public clouds. We offer guided learning experiences on AWS, Azure, Google Cloud and all the surrounding methodologies and technologies that you need to know and your teams need to know in order to leverage the full benefits of the cloud," explained Alex Brower, VP of Marketing at Cloud Academy, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clar...
Gemini is Yahoo’s native and search advertising platform. To ensure the quality of a complex distributed system that spans multiple products and components and across various desktop websites and mobile app and web experiences – both Yahoo owned and operated and third-party syndication (supply), with complex interaction with more than a billion users and numerous advertisers globally (demand) – it becomes imperative to automate a set of end-to-end tests 24x7 to detect bugs and regression. In th...
Enterprises are moving to the cloud faster than most of us in security expected. CIOs are going from 0 to 100 in cloud adoption and leaving security teams in the dust. Once cloud is part of an enterprise stack, it’s unclear who has responsibility for the protection of applications, services, and data. When cloud breaches occur, whether active compromise or a publicly accessible database, the blame must fall on both service providers and users. In his session at 21st Cloud Expo, Ben Johnson, C...
"Infoblox does DNS, DHCP and IP address management for not only enterprise networks but cloud networks as well. Customers are looking for a single platform that can extend not only in their private enterprise environment but private cloud, public cloud, tracking all the IP space and everything that is going on in that environment," explained Steve Salo, Principal Systems Engineer at Infoblox, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventio...
Data scientists must access high-performance computing resources across a wide-area network. To achieve cloud-based HPC visualization, researchers must transfer datasets and visualization results efficiently. HPC clusters now compute GPU-accelerated visualization in the cloud cluster. To efficiently display results remotely, a high-performance, low-latency protocol transfers the display from the cluster to a remote desktop. Further, tools to easily mount remote datasets and efficiently transfer...
"MobiDev is a software development company and we do complex, custom software development for everybody from entrepreneurs to large enterprises," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
"We're developing a software that is based on the cloud environment and we are providing those services to corporations and the general public," explained Seungmin Kim, CEO/CTO of SM Systems Inc., in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA.
Agile has finally jumped the technology shark, expanding outside the software world. Enterprises are now increasingly adopting Agile practices across their organizations in order to successfully navigate the disruptive waters that threaten to drown them. In our quest for establishing change as a core competency in our organizations, this business-centric notion of Agile is an essential component of Agile Digital Transformation. In the years since the publication of the Agile Manifesto, the conn...
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
The question before companies today is not whether to become intelligent, it’s a question of how and how fast. The key is to adopt and deploy an intelligent application strategy while simultaneously preparing to scale that intelligence. In her session at 21st Cloud Expo, Sangeeta Chakraborty, Chief Customer Officer at Ayasdi, provided a tactical framework to become a truly intelligent enterprise, including how to identify the right applications for AI, how to build a Center of Excellence to oper...
"IBM is really all in on blockchain. We take a look at sort of the history of blockchain ledger technologies. It started out with bitcoin, Ethereum, and IBM evaluated these particular blockchain technologies and found they were anonymous and permissionless and that many companies were looking for permissioned blockchain," stated René Bostic, Technical VP of the IBM Cloud Unit in North America, in this SYS-CON.tv interview at 21st Cloud Expo, held Oct 31 – Nov 2, 2017, at the Santa Clara Conventi...
SYS-CON Events announced today that Telecom Reseller has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5-7, 2018, at the Javits Center in New York, NY. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
In his session at 21st Cloud Expo, James Henry, Co-CEO/CTO of Calgary Scientific Inc., introduced you to the challenges, solutions and benefits of training AI systems to solve visual problems with an emphasis on improving AIs with continuous training in the field. He explored applications in several industries and discussed technologies that allow the deployment of advanced visualization solutions to the cloud.
While some developers care passionately about how data centers and clouds are architected, for most, it is only the end result that matters. To the majority of companies, technology exists to solve a business problem, and only delivers value when it is solving that problem. 2017 brings the mainstream adoption of containers for production workloads. In his session at 21st Cloud Expo, Ben McCormack, VP of Operations at Evernote, discussed how data centers of the future will be managed, how the p...