25 Years of Big Data: From SQL to the Cloud

Three generations of tools: SQL, MapReduce, Cloudcel

Back in 1985, the world was pre-web, data volumes were small, and no one was grappling with information overload. Relational databases and the shiny new SQL query language were just about perfect for that era. At work, 100% of the data required by employees was internal business data; it was highly structured and organized in simple tables. Users would pull data from the database when they realized they needed it.
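To make the pull model concrete, here's a minimal sketch using Python's built-in sqlite3 module (the table and figures are invented for illustration): highly structured rows in a simple table, queried with SQL only at the moment the user asks.

```python
import sqlite3

# A tiny in-memory relational database: one structured table,
# queried on demand with SQL -- the classic 1985-era "pull" model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Globex", 75.5), (3, "Acme", 42.0)],
)

# The user pulls exactly the data they need, when they decide they need it.
for customer, total in conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
):
    print(customer, total)  # Acme 162.0, Globex 75.5
```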

Fast forward to 2010. Today, everyone is grappling constantly with information overload, both in their work and in their social life. Most data today is unstructured, and most of it is in files, streams or feeds rather than in structured tables. Many of the data streams are realtime and constantly changing. At work, most of the data required by employees is now external data - from the web, from analytics tools, and from monitoring systems of all kinds - data about customers, partners, employees, competitors, marketing, advertising, pricing, infrastructure, and operations. Today what's needed are smart IT systems that can automatically analyze, filter and push exactly the right data to users in realtime, just when they need it. Oh, and since no one wants to own data processing hardware and software any more, those IT systems should be in the cloud.
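As a sketch of what "push" means here (all names invented; this is not any particular product's API), the system holds standing subscriptions, evaluates each one against every arriving event, and delivers matches immediately, instead of waiting for a user to run a query:

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Subscription = Tuple[Callable[[Event], bool], Callable[[Event], None]]

class PushFilter:
    """Toy push pipeline: filter each incoming event, deliver matches."""

    def __init__(self) -> None:
        self.subscriptions: List[Subscription] = []

    def subscribe(self, predicate, deliver) -> None:
        # A standing query: "push me events matching this predicate."
        self.subscriptions.append((predicate, deliver))

    def ingest(self, event: Event) -> None:
        # Evaluate every subscription the moment the event arrives.
        for predicate, deliver in self.subscriptions:
            if predicate(event):
                deliver(event)

pipeline = PushFilter()
pipeline.subscribe(lambda e: e["source"] == "pricing" and e["change"] > 0.05,
                   lambda e: print("alert:", e))
pipeline.ingest({"source": "pricing", "change": 0.08, "sku": "A-17"})
```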

So how has the IT industry responded to the dramatic changes brought about first by the web, and more recently by the realtime social web and the cloud? And what tools are now available to users in this new era of Big Data, where data volumes are growing exponentially?

From 1985 to 2004, SQL was essentially the only game in town. Around 2004, a number of companies, led by Google, and including eBay, Yahoo and later Facebook, realized that they required levels of scalability, parallelism, performance and data flexibility that went way beyond what relational databases and SQL could provide. Their solution was to adopt a simple parallel programming framework, MapReduce, in place of SQL. MapReduce and its open source implementation, Hadoop, are now widely used to analyze very large data sets.
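To give a feel for the model, here's the canonical word-count example sketched as single-process Python. In a real Hadoop job the same two functions would be written as Java Mapper and Reducer classes, and the framework would run the map, shuffle and reduce phases in parallel across the cluster; this toy version just shows the programming model.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    # Map: emit one (word, 1) pair per occurrence.
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key
    # (in Hadoop, the framework does this across machines).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a final result.
    return key, sum(values)

documents = ["big data big ideas", "data beats opinion"]
pairs = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'beats': 1, 'opinion': 1}
```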

So what's next? If SQL was the first generation Big Data tool, and MapReduce/Hadoop the second, what might a third generation tool look like? To answer this, we need to look at the areas in which MapReduce/Hadoop are weak: (a) realtime, and (b) ease-of-use. The MapReduce model is optimized for large-scale batch processing. As such, it is not a good fit for the growing number of applications requiring realtime stream processing. The model is also designed for use by experienced programmers - in the case of Hadoop, experienced Java programmers. Unfortunately, the vast majority of those grappling with Big Data challenges today are "non-programmers": individuals and business users who rely on tools like Excel spreadsheets for processing their data. And there are a lot of them - several hundred million Excel users alone.
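To contrast with the batch model, here's a minimal sketch (plain Python, not any particular streaming product) of the kind of incremental computation a realtime stream processor performs: a sliding-window aggregate that is updated as each event arrives, rather than recomputed over a stored data set.

```python
from collections import deque
import time

WINDOW_SECONDS = 60.0   # size of the sliding window (an arbitrary choice)

window = deque()        # (timestamp, value) pairs currently in the window
running_total = 0.0

def on_event(timestamp: float, value: float) -> float:
    """Ingest one event and return the up-to-date windowed sum."""
    global running_total
    window.append((timestamp, value))
    running_total += value
    # Evict events that have slid out of the window.
    while window and window[0][0] < timestamp - WINDOW_SECONDS:
        _, old_value = window.popleft()
        running_total -= old_value
    return running_total

now = time.time()
for i in range(5):
    # Each arriving event immediately yields a fresh result.
    print(on_event(now + i, float(i)))   # 0.0, 1.0, 3.0, 6.0, 10.0
```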

The third generation of tools for Big Data will therefore need to offer the scalability, parallelism, performance and data flexibility of tools like Hadoop, but also be able to continuously process realtime data streams, and be as easy to use as a spreadsheet. At Cloudscale we've been tackling this challenge. Our Cloudcel service provides the first example of such a third generation Big Data tool.

SQL remains a great tool for handling structured, tabular data, and for transactional applications. MapReduce and Hadoop are great tools if you are a programmer and your task is to process two petabytes of historical data across three thousand servers in less than 24 hours. We now also have a third type of Big Data tool aimed at the much larger number of people who need a simple and easy-to-use, but powerful and scalable cloud-based service for analyzing the huge volumes of data that are now continuously bombarding them in their life and their work.

About Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.
