
@CloudExpo: Article

Hadoop Moving More Toward Real-Time

Interview with Continuent CEO Robert Hodges

No account of the Red Hat Summit 2014 would be complete without some discussion of Apache Hadoop. The happy elephant has now been pushing data for close to a decade, and its distributed file system (HDFS) has set the tone for supporting modern, highly distributed, and very large databases in the cloud.

So I was pleased to have Robert Hodges, CEO of Hadoop-focused Continuent Tungsten, answer a few questions about his company's world.

Roger: What's the scope of the challenge you face in addressing big Hadoop deployments?

Robert: Hadoop is very powerful as a way to concentrate and analyze information, so the key issue is how to add information from existing transactional data stores to Hadoop without imposing additional load, application changes, or repetitive dump processes.

From our existing customer deployments, we know that the biggest challenge is getting the information into Hadoop as quickly as possible from many different hosts simultaneously. Our customers often have many more transactional hosts running MySQL than they have Hadoop hosts, simply because their transactional needs demand so much scale-out and sharding.

Roger: What are the key pain points?

Robert: The key pain points are therefore extracting data from the transactional stores without adding load to the servers that run the live, customer-facing website, while simultaneously loading large quantities of data that need to be merged and analyzed on the Hadoop side.

The replication solution based on Tungsten Replicator handles this simply: extracting the data places very little load on the source servers, and the changes are streamed continually into Hadoop. Because this can be done on a per-server or per-cluster basis, it is easy to scale up replication into Hadoop by adding more replication streams.
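To make the idea of continually streaming changes concrete, here is a minimal sketch in Python of applying change events from several transactional hosts to one analytical store. It is not Tungsten Replicator's API or data format: the `ChangeEvent` fields, the `apply_changes` function, and the host names `mysql-a` and `mysql-b` are all hypothetical, chosen only to illustrate the pattern of one stream per source with a recorded position for each.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One row change read from a source host's change log (hypothetical format)."""
    source: str            # name of the transactional host (assumed)
    seqno: int             # position of the change in that host's log, starting at 1
    op: str                # "insert", "update" or "delete"
    key: int               # primary key of the affected row
    row: Optional[dict]    # new row image; None for deletes


def apply_changes(target: Dict[Tuple[str, int], dict],
                  positions: Dict[str, int],
                  events: Iterable[ChangeEvent]) -> int:
    """Apply events to the target store and return how many were applied.

    Events at or below the recorded position for their source are skipped,
    so replaying a stream after a restart does not apply a change twice.
    """
    applied = 0
    for ev in events:
        if ev.seqno <= positions.get(ev.source, 0):
            continue  # already applied earlier
        ident = (ev.source, ev.key)  # rows from different hosts never collide
        if ev.op == "delete":
            target.pop(ident, None)
        elif ev.op in ("insert", "update"):
            target[ident] = dict(ev.row or {})
        else:
            raise ValueError(f"unknown operation: {ev.op}")
        positions[ev.source] = ev.seqno
        applied += 1
    return applied


def demo():
    """Feed two independent streams into one target, then replay the first."""
    target: Dict[Tuple[str, int], dict] = {}
    positions: Dict[str, int] = {}
    stream_a = [
        ChangeEvent("mysql-a", 1, "insert", 10, {"name": "ann", "total": 5}),
        ChangeEvent("mysql-a", 2, "update", 10, {"name": "ann", "total": 8}),
    ]
    stream_b = [
        ChangeEvent("mysql-b", 1, "insert", 10, {"name": "bob", "total": 3}),
        ChangeEvent("mysql-b", 2, "delete", 10, None),
    ]
    first = apply_changes(target, positions, stream_a)
    first += apply_changes(target, positions, stream_b)
    replayed = apply_changes(target, positions, stream_a)
    return target, positions, first, replayed


if __name__ == "__main__":
    print(demo())
```

In this sketch, adding another transactional host means adding another stream with its own position counter, which loosely mirrors the point above about scaling by adding more streams of replication data.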

Roger: How critical is the real-time aspect of modern IT? How quickly is it growing?

Robert: It's growing very quickly, and in some cases more quickly than company IT departments and the technology they support can cope with. Replication has long been the solution for this scale-out process, but the flows of replication data are changing.

One of the key drivers behind the adoption of Hadoop, Cassandra, and similar databases is the ability to process data in parallel to get numbers in real time. You can see this in a wide range of markets, from banking to social networking and online stores.

As we get access to more information, the services that supply it need to keep up at an ever faster rate. We all want the lowest price on our plane tickets while receiving the absolute best benefits and service, and all those elements rely on real-time analysis.

Roger: What does IT think of this?

Robert: Of course, this also presents a completely different problem for IT departments. They must work out how to get the data into a system where it can be analyzed quickly. The active transactional dataset does not live in the same place as the analysis tools, and the two may deal with completely different quantities of raw data.

Transactional databases might be conveniently sharded into 50 or 100 different RDBMS instances of 100GB each, but analysis needs to process all 5,000GB to 10,000GB of data collectively to get meaningful information. That means that the IT infrastructure needs an effective way to combine and transfer this active data.
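As a quick check of those figures, 50 shards of 100GB hold 5,000GB in total and 100 shards hold 10,000GB. The short Python sketch below works through that arithmetic and shows one reason the data has to be combined: an overall average cannot be obtained by averaging per-shard averages. The function names and the sample numbers are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple


def total_size_gb(shard_count: int, shard_size_gb: int) -> int:
    """Total data volume when every shard holds the same amount."""
    return shard_count * shard_size_gb


def global_average(partials: List[Tuple[float, int]]) -> float:
    """Combine per-shard (sum, row_count) pairs into one overall average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows in any shard")
    return total / count


if __name__ == "__main__":
    print(total_size_gb(50, 100))    # 5000
    print(total_size_gb(100, 100))   # 10000

    # Two made-up shards: 10 rows summing to 100, and 30 rows summing to 900.
    partials = [(100.0, 10), (900.0, 30)]
    naive = sum(s / c for s, c in partials) / len(partials)
    print(naive)                     # 20.0, the average of the shard averages
    print(global_average(partials))  # 25.0, the true average over all rows
```

The gap between 20.0 and 25.0 comes from the shards holding different numbers of rows, which is why the analysis side needs the combined data, or at least combinable partial results, rather than each shard's answer in isolation.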

It's also clear from recent advancements in querying and processing techniques built on top of Hadoop that Hadoop itself is becoming a more real-time tool. Spark, Storm, and other query engines provide very fast query and analysis on very large datasets, taking advantage of the distributed nature of Hadoop and of the increasing RAM and CPU power of each new hardware generation. Compatibility with Spark and similar live query mechanisms in Hadoop will form a key part of the next evolution of all Hadoop deployments.

Roger: How key is the role of Big Data in developing your solutions? How important is the term Big Data to you?

Robert: Big Data has been a significant requirement for our customers for some time, but we have definitely seen a shift recently from the scale-out, sharded nature of the typical RDBMS toward concentrating that information for analysis in Big Data stores. As that movement of data becomes real-time, it will be critical for the tools we develop to make the transfer and management of replicated data as easy as possible for our customers.

As the provider of the tools that enable our customers to easily share and transfer data, we consider Big Data as important to us as it is to them. Of course, transactional databases are not going away, and we certainly don't expect that to change, but Hadoop and other Big Data solutions are being brought in to work alongside these active data stores. Continuent will certainly be looking to expand our solutions and techniques to bridge the gap between RDBMS and Big Data.


More Stories By Roger Strukhoff

Roger Strukhoff (@IoT2040) is Executive Director of the Tau Institute for Global ICT Research, with offices in Illinois and Manila. He is Conference Chair of @CloudExpo & @ThingsExpo, and Editor of SYS-CON Media's CloudComputing BigData & IoT Journals. He holds a BA from Knox College & conducted MBA studies at CSU-East Bay.
