Welcome!

@CloudExpo Authors: Elizabeth White, Liz McMillan, Pat Romanski, William Schmarzo, Jason Bloomberg

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Containers Expo Blog

@CloudExpo: Article

Making Hadoop Safe for 'Clusterophobics'

Big things in store for Hadoop

Hadoop remains a difficult platform for most enterprises to master. For now skills are still hard to come by – both for data architect or engineer, and especially for data scientists. It still takes too much skill, tape, and baling wire to get a Hadoop cluster together. Not every enterprise is Google or Facebook, with armies of software engineers that they can throw at a problem. With some exceptions, most enterprises don’t deal with data on the scale of Google or Facebook either – but the bar is rising.

If 2011 was the year that the big IT data warehouse and analytic platform brand names discovered Hadoop, 2012 becomes the year where a tooling ecosystem starts emerging to make Hadoop more consumable for the enterprise. Let’s amend that – along with tools, Hadoop must also become a first-class citizen with enterprise IT infrastructure. Hadoop won’t cross over to the enterprise if it has to be treated as some special island. That means meshing with the practices and technology approaches that enterprises are using to manage their data centers or cloud deployments. Like SQL, data integration, virtualization, storage strategy, and so on.

Admittedly, much of this cuts against the grain of early Hadoop deployment that stressed open source and commodity infrastructure. Early adopters did so out of necessity as commercial software ran out of gas for Facebook when its data warehouse daily refreshes were breaking terabyte range, not to mention that the cost of commercial licenses for such scaled out analytic platforms wouldn’t have been trivial. Anyway, Hadoop’s linearity leverages scale out of commodity blades and direct attached disk as far as the eye can see, enabling such an almost pure noncommercial approach. At the time, Google’s, Yahoo’s, and Facebook’s issues were considered rather unique – most enterprise don’t run global search engines – not to mention that their business was built on armies of software engineers.

Something's got to give

As we’ve previously noted, something’s got to give on the skills front. Hadoop in the enterprise faces limits – the data problems are getting bigger and more complex for sure, but resources and skills are far more finite. So we envision tools and solutions addressing two areas:

  1. Products that address “clusterophobia” – organizations that seeks the scalable analytics of Hadoop but lack the appetite to erect infinite data centers out in the fields or hire the necessary skillsets. Obviously, using the cloud is one option – but the questions there revolve around whether corporate policies allow maintenance of data off premises, and also, as data store size grows, whether the cloud is still economical.
  2. The other side of the coin is consummability – tools that simplify access to and manipulation of the data.

In the run-up to this year’s Hadoop Summit, a number of tooling announcements addressing clusterophobia and consumption are pouring out.

The workloads are going to get more equitably distributed, and in the long run, we wouldn’t be surprised to see more Hadoop-only appliances.



On the fear of clusters side, players like Oracle, EMC Greenplum, and Teradata Aster are already offering appliances that simplify deployment of Hadoop, typically in conjunction with an Advanced SQL analytic platform. While most vendors position this as a way for Hadoop to “extend’ your data warehouse so you perform exploration in Hadoop, but the serious analytics in SQL, we view appliances as more than transitional strategy. The workloads are going to get more equitably distributed, and in the long run, we wouldn’t be surprised to see more Hadoop-only appliances, sort of like Oracle’s (for the record, they also bundle another NoSQL database).

Also addressing the same constituency are storage and virtualization – facts of life in the data center. For Hadoop to cross over to the enterprise, it, too, must get virtualization-friendly. Storage is an open question. The need for virtualization becomes even more apparent because (1) the exploratory nature of Hadoop analytics demands the ability to try out queries offline without having to disrupt or physically build a new cluster; and (2) the variable nature of Hadoop processing suggests that workloads are likely to be elastic. So we’ve been waiting for VMware to make their move. VMware – also part of EMC – has announced a pair of initiatives. First, they are working with the Apache Hadoop project to make the core pieces (HDFS and MapReduce) virtualization-aware, and separately, they are hosting their own open source project (Serengeti) for virtualizing Hadoop clusters. While Project Serengeti is not VM-specific, there’s little doubt that this will be a VMware project (we’d be shocked if the Xen folks were to buy in).

Storage follows


W
here there’s virtualized servers, storage often closely follows. A few months back, EMC dropped the other shoe, finally unveiling a strategy for leveraging Isilon with the Greenplum HD platform, the closest thing in NAS that replicates the scale-out model storage model popularized with Hadoop. This opens an argument of whether the scales of data in Hadoop make premium products such as Isilon unaffordable. The flip side however is the “open source tax,” where you hire the skills in your IT organization to manage and deploy scale-out storage, or pay consultants to do it for you.

In the spirit of making Hadoop more consummable, we expect a lot of vibes from new players that are simplifying navigation of Hadoop and building SQL bridges. Datameer is bringing down the pricing of its uber Hadoop spreadsheet to personal and workgroup levels courtesy of entry level pricing from $299 to $2999. Teradata Aster, which already offers a patented framework that translates SQL to MapReduce (there are also others out there) is now taking an early bet on the incubating Apache HCatalog metadata spec so that you could write SQL statements that go up against Hadoop. It joins approaches such as those from Hadapt, which hangs SQL tables from HDFS file nodes, and mainstream BI players such as Jaspersoft, that already provide translators that can grab reports directly from Hadoop.

In the spirit of making Hadoop more consummable, we expect a lot of vibes from new players that are simplifying navigation of Hadoop and building SQL bridges.



This doesn’t take away from the evolution of the Hadoop platform itself. Cloudera and Hortonworks are among those releasing new distributions that bundle their own mix of recent and current Apache Hadoop modules. While the Apache project has addressed the NameNode HA issue, it is still early in the game with bringing enterprise-grade manageability to MapReduce. That’s largely an academic issue as the bulk of enterprises have yet to implement Hadoop. By the time enterprises are ready, many of the core issues should resolve — although there will always be questions about the uptake of peripheral Hadoop projects.

What’s more important – and where the action will be – is in tools that allow enterprises to run and, more importantly, consume Hadoop. A chicken and egg situation, enterprises won’t implement before tools are available and vice versa.

You may also be interested in:

More Stories By Tony Baer

Tony Baer is Principal Analyst with Ovum, leading Ovum’s research on the software lifecycle. Working in concert with other members of Ovum’s software group, his research covers the full lifecycle from design and development to deployment and management. Areas of focus include application lifecycle management, software development methodologies (including agile), SOA, IT service management/ITIL, and IT management/governance.

Baer has been a noted authority on software development platforms and integration architecture for nearly 20 years. Prior to joining Ovum, he was an independent analyst whose company ‘onStrategies’ delivered software development and integration tools to vendors with technology assessment and market positioning services. He also led Computerwire’s CIO Agenda and Computer Finance end-user best practices research services.

Follow him on Twitter @TonyBaer or read his blog site www.onstrategies.com/blog.

@CloudExpo Stories
Today companies are looking to achieve cloud-first digital agility to reduce time-to-market, optimize utilization of resources, and rapidly deliver disruptive business solutions. However, leveraging the benefits of cloud deployments can be complicated for companies with extensive legacy computing environments. In his session at 21st Cloud Expo, Craig Sproule, founder and CEO of Metavine, will outline the challenges enterprises face in migrating legacy solutions to the cloud. He will also prese...
SYS-CON Events announced today that Ryobi Systems will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Ryobi Systems Co., Ltd., as an information service company, specialized in business support for local governments and medical industry. We are challenging to achive the precision farming with AI. For more information, visit http:...
SYS-CON Events announced today that Enroute Lab will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enroute Lab is an industrial design, research and development company of unmanned robotic vehicle system. For more information, please visit http://elab.co.jp/.
Elon Musk is among the notable industry figures who worries about the power of AI to destroy rather than help society. Mark Zuckerberg, on the other hand, embraces all that is going on. AI is most powerful when deployed across the vast networks being built for Internets of Things in the manufacturing, transportation and logistics, retail, healthcare, government and other sectors. Is AI transforming IoT for the good or the bad? Do we need to worry about its potential destructive power? Or will we...
SYS-CON Events announced today that Nihon Micron will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Nihon Micron Co., Ltd. strives for technological innovation to establish high-density, high-precision processing technology for providing printed circuit board and metal mount RFID tags used for communication devices. For more inf...
SYS-CON Events announced today that mruby Forum will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. mruby is the lightweight implementation of the Ruby language. We introduce mruby and the mruby IoT framework that enhances development productivity. For more information, visit http://forum.mruby.org/.
In his session at @ThingsExpo, Greg Gorman is the Director, IoT Developer Ecosystem, Watson IoT, will provide a short tutorial on Node-RED, a Node.js-based programming tool for wiring together hardware devices, APIs and online services in new and interesting ways. It provides a browser-based editor that makes it easy to wire together flows using a wide range of nodes in the palette that can be deployed to its runtime in a single-click. There is a large library of contributed nodes that help so...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
DevOps at Cloud Expo – being held October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real r...
Your clients expect transactions to never fail, cloud access to be fast and always on, and their data to be protected - no exceptions. Hear about how Secure Service Container (SSC), an IBM-exclusive open technology, enables secure building and hosting of next-generation applications, both cloud and on-premises. SSC protects the full stack from external and insider threats, allows automatic encryption of data in-flight and at-rest, and is tamper-resistant during installation and runtime – with no...
While some developers care passionately about how data centers and clouds are architected, for most, it is only the end result that matters. To the majority of companies, technology exists to solve a business problem, and only delivers value when it is solving that problem. 2017 brings the mainstream adoption of containers for production workloads. In his session at 21st Cloud Expo, Ben McCormack, VP of Operations at Evernote, will discuss how data centers of the future will be managed, how th...
SYS-CON Events announced today that Massive Networks, that helps your business operate seamlessly with fast, reliable, and secure internet and network solutions, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. As a premier telecommunications provider, Massive Networks is headquartered out of Louisville, Colorado. With years of experience under their belt, their team of...
SYS-CON Events announced today that Mobile Create USA will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Mobile Create USA Inc. is an MVNO-based business model that uses portable communication devices and cellular-based infrastructure in the development, sales, operation and mobile communications systems incorporating GPS capabi...
SYS-CON Events announced today that MIRAI Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MIRAI Inc. are IT consultants from the public sector whose mission is to solve social issues by technology and innovation and to create a meaningful future for people.
SYS-CON Events announced today that Keisoku Research Consultant Co. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Keisoku Research Consultant, Co. offers research and consulting in a wide range of civil engineering-related fields from information construction to preservation of cultural properties. For more information, vi...
SYS-CON Events announced today that Interface Corporation will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Interface Corporation is a company developing, manufacturing and marketing high quality and wide variety of industrial computers and interface modules such as PCIs and PCI express. For more information, visit http://www.i...
SYS-CON Events announced today that Fusic will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Fusic Co. provides mocks as virtual IoT devices. You can customize mocks, and get any amount of data at any time in your test. For more information, visit https://fusic.co.jp/english/.
SYS-CON Events announced today that TMC has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo and Big Data at Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Global buyers rely on TMC’s content-driven marketplaces to make purchase decisions and navigate markets. Learn how we can help you reach your marketing goals.
Cloud-based disaster recovery is critical to any production environment and is a high priority for many enterprise organizations today. Nearly 40% of organizations have had to execute their BCDR plan due to a service disruption in the past two years. Zerto on IBM Cloud offer VMware and Microsoft customers simple, automated recovery of on-premise VMware and Microsoft workloads to IBM Cloud data centers.
Why Federal cloud? What is in Federal Clouds and integrations? This session will identify the process and the FedRAMP initiative. But is it sufficient? What is the remedy for keeping abreast of cutting-edge technology? In his session at 21st Cloud Expo, Rasananda Behera will examine the proposed solutions: Private or public or hybrid cloud Responsible governing bodies How can we accomplish?