Welcome!

@CloudExpo Authors: Ashish Nanjiani, Liz McMillan, Yeshim Deniz, Elizabeth White, Pat Romanski

Related Topics: @CloudExpo

@CloudExpo: Article

Is Big Data a Solution in Search of a Problem?

The trigger point of Big Data was when Google published its paper on the “Map-Reduce” algorithm

If you look at the predictions made for 2012, you will find a new entry which was not there last year. Be it Gartner, Forrester or McKenzie – “Big Data” finds a place in the prediction.

So, what is big data? Is it the next path breaking technology which will change everything or is it just a hype which will die down after sometime?

Let us take a realistic look at what the term big data mean and what problem it can solve.

What is "Big Data"?

(The Wikipedia page on Big Data is not that good. The clearest explanation I have found is from O’Reilly Radar – here is the link)

Here is a short explanation.

Big Data is the name given to the classes of technologies that needs to be used when your data volume becomes so much that the RDBMS technologies can no longer handle it.

Big data spans three dimensions (taken from this article of IBM):

  • Variety – Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.
  • Velocity – Often time-sensitive, big data must be used as it is streaming in to the enterprise in order to maximize its value to the business.
  • Volume – Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.

In short, if your data volume can be handled efficiently by RDBMS you NEED NOT worry about Big Data.

How did it all start?

With the advent of cloud computing which provided easy access to massive amount distributed computing power there was a realization RDBMS cannot be effectively parallelized. In fact CAP theorem states that Consistency, Availability & Partition Tolerance cannot simultaneously be guaranteed. This led to a No-SQL movement and multiple non-relational databases sprang up.

Trigger Point of Big Data happened when Google published the paper on the “Map-Reduce” algorithm. It involves processing of highly distributable problems across huge datasets using a large number of computers. Map-Reduce is at the heart of Google’s search engine.

Takeoff happened when Apache open source “Hadoop” project which created its own implementation of Map-Reduce. The largest Hadoop implementation is probably at Facebook.

In short: Big Data requires large DISTRIBUTED processing power.

Why would you want to process so much data?

There are 3 basic assumptions which are driving the big data movement:

  1. Faster analysis of larger operational data will help you make better decision
  2. More in-depth analysis of customer data will guide you to better customer segmentation
  3. Insight into larger data set will help you come up with innovative product design

Companies that have successfully leveraged this are Google, Facebook, Amazon, Walmart, Yahoo etc.

In short – the ASSUMPTION is that more data and faster analytics will lead to more innovation and better decision making.

Three Prerequisites for leveraging Big Data

Let us assume that your data volume is large enough and you have access to enough distributed processing power. Will that be sufficient for you to venture into big data?

No … you need three more things.

  1. Business problem which you think that the data at your disposal can help to resolve
  2. Set of questions to be answered through data analysis
  3. Algorithm to analyze the data – this is the domain of the new field Data Science

Big Data will be useful only if you are equipped with all these.

Therefore, for most of us, Big Data is a solution which is in search of a problem.

Related Articles

More Stories By Udayan Banerjee

Udayan Banerjee is CTO at NIIT Technologies Ltd, an IT industry veteran with more than 30 years' experience. He blogs at http://setandbma.wordpress.com.
The blog focuses on emerging technologies like cloud computing, mobile computing, social media aka web 2.0 etc. It also contains stuff about agile methodology and trends in architecture. It is a world view seen through the lens of a software service provider based out of Bangalore and serving clients across the world. The focus is mostly on...

  • Keep the hype out and project a realistic picture
  • Uncover trends not very apparent
  • Draw conclusion from real life experience
  • Point out fallacy & discrepancy when I see them
  • Talk about trends which I find interesting
Google

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
SYS-CON Events announced today that SourceForge has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. SourceForge is the largest, most trusted destination for Open Source Software development, collaboration, discovery and download on the web serving over 32 million viewers, 150 million downloads and over 460,000 active development projects each and every month.
We build IoT infrastructure products - when you have to integrate different devices, different systems and cloud you have to build an application to do that but we eliminate the need to build an application. Our products can integrate any device, any system, any cloud regardless of protocol," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA
SYS-CON Events announced today that CHEETAH Training & Innovation will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CHEETAH Training & Innovation is a cloud consulting and IT training firm specializing in improving clients cloud strategies and infrastructures for medium to large companies.
"Tintri focuses on the Ops side of the DevOps, which basically is pushing more and more of the accessibility of the infrastructure to the developers and trying to get behind the scenes," explained Dhiraj Sehgal of Tintri in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Cloud applications are seeing a deluge of requests to support the exploding advanced analytics market. “Open analytics” is the emerging strategy to deliver that data through an open data access layer, in the cloud, to be directly consumed by external analytics tools and popular programming languages. An increasing number of data engineers and data scientists use a variety of platforms and advanced analytics languages such as SAS, R, Python and Java, as well as frameworks such as Hadoop and Spark...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.
SYS-CON Events announced today that TMC has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo and Big Data at Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Global buyers rely on TMC’s content-driven marketplaces to make purchase decisions and navigate markets. Learn how we can help you reach your marketing goals.
Both SaaS vendors and SaaS buyers are going “all-in” to hyperscale IaaS platforms such as AWS, which is disrupting the SaaS value proposition. Why should the enterprise SaaS consumer pay for the SaaS service if their data is resident in adjacent AWS S3 buckets? If both SaaS sellers and buyers are using the same cloud tools, automation and pay-per-transaction model offered by IaaS platforms, then why not host the “shrink-wrapped” software in the customers’ cloud? Further, serverless computing, cl...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.
SYS-CON Events announced today that Conference Guru has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. A valuable conference experience generates new contacts, sales leads, potential strategic partners and potential investors; helps gather competitive intelligence and even provides inspiration for new products and services. Conference Guru works with conference organi...
"MobiDev is a Ukraine-based software development company. We do mobile development, and we're specialists in that. But we do full stack software development for entrepreneurs, for emerging companies, and for enterprise ventures," explained Alan Winters, U.S. Head of Business Development at MobiDev, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
"We focus on composable infrastructure. Composable infrastructure has been named by companies like Gartner as the evolution of the IT infrastructure where everything is now driven by software," explained Bruno Andrade, CEO and Founder of HTBase, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
It is ironic, but perhaps not unexpected, that many organizations who want the benefits of using an Agile approach to deliver software use a waterfall approach to adopting Agile practices: they form plans, they set milestones, and they measure progress by how many teams they have engaged. Old habits die hard, but like most waterfall software projects, most waterfall-style Agile adoption efforts fail to produce the results desired. The problem is that to get the results they want, they have to ch...
Wooed by the promise of faster innovation, lower TCO, and greater agility, businesses of every shape and size have embraced the cloud at every layer of the IT stack – from apps to file sharing to infrastructure. The typical organization currently uses more than a dozen sanctioned cloud apps and will shift more than half of all workloads to the cloud by 2018. Such cloud investments have delivered measurable benefits. But they’ve also resulted in some unintended side-effects: complexity and risk. ...
Cloud applications are seeing a deluge of requests to support the exploding advanced analytics market. “Open analytics” is the emerging strategy to deliver that data through an open data access layer, in the cloud, to be directly consumed by external analytics tools and popular programming languages. An increasing number of data engineers and data scientists use a variety of platforms and advanced analytics languages such as SAS, R, Python and Java, as well as frameworks such as Hadoop and Spark...
In 2014, Amazon announced a new form of compute called Lambda. We didn't know it at the time, but this represented a fundamental shift in what we expect from cloud computing. Now, all of the major cloud computing vendors want to take part in this disruptive technology. In his session at 20th Cloud Expo, Doug Vanderweide, an instructor at Linux Academy, discussed why major players like AWS, Microsoft Azure, IBM Bluemix, and Google Cloud Platform are all trying to sidestep VMs and containers wit...
No hype cycles or predictions of zillions of things here. IoT is big. You get it. You know your business and have great ideas for a business transformation strategy. What comes next? Time to make it happen. In his session at @ThingsExpo, Jay Mason, Associate Partner at M&S Consulting, presented a step-by-step plan to develop your technology implementation strategy. He discussed the evaluation of communication standards and IoT messaging protocols, data analytics considerations, edge-to-cloud tec...
When growing capacity and power in the data center, the architectural trade-offs between server scale-up vs. scale-out continue to be debated. Both approaches are valid: scale-out adds multiple, smaller servers running in a distributed computing model, while scale-up adds fewer, more powerful servers that are capable of running larger workloads. It’s worth noting that there are additional, unique advantages that scale-up architectures offer. One big advantage is large memory and compute capacity...