Welcome!

@CloudExpo Authors: Liz McMillan, Elizabeth White, XebiaLabs Blog, Roger Strukhoff, Pat Romanski

Related Topics: @CloudExpo

@CloudExpo: Blog Post

Introducing Big Data

When we say Big Data, we enter the petabyte space (1000 terabytes)

The phrase “Big Data” is thrown around a lot these days. What exactly is referred to by this phrase? When I was part of IBM’s DB2 development team, the largest size limit of a DB2 Table was 64 Gigabytes (GB) and I thought who on earth can use this size of a database. Thirty years later, that number looks so small. Now you can buy a 1 Terabyte external drive for less than $100.

Let us start with a level set on the unit of storage. In multiples of 1000, we go from Byte – Kilobyte (KB) – Megabyte (MB) – Gigabyte (GB) – Terabyte (TB) – Petabyte (PB) – Exabyte (EB) – Zettabyte (ZB) – Yottabyte (YB). The last one YB is 10 to the power of 24. A typed page is 2KB. The entire book collection at the US Library of Congress is 15TB. The amount of data processed in one hour at Google is 1PB. The total amount of information in existence is around 1.27ZB. Now you get some context to these numbers.

When we say Big Data, we enter the petabyte space (1000 Terabytes). There is talk of “personal petabyte” to store all your audio, video, and pictures. The cost has come down from $2M in 2002 to $2K in 2012 – real Moore’s law in disk storage technology here. This is not the stuff for current commercial database products such as DB2 or Oracle or SQLServer. Such RDBMS’s handle maximum of 10 to 100 Terabyte sizes. Anything bigger would cause serious performance nightmares. These large databases are mostly in the decision support and data warehousing applications. Walmart is known to have its main retail transaction data warehouse at 100 plus terabytes in a Teradata DBMS system.

Most of the growth in data is in “files”, not in DBMS. Now we see huge volumes of data in social networking sites like Facebook. At the beginning of 2010, Facebook was handling more than 4TB per day (compressed). Now that it has gone to 750M users, that number is at least 50% more. The new Zuck’s (Zuckerberg) law is , “Shared contents double every 24 months”. The question is how to deal with such volumes.

Google pioneered the algorithm called MapReduce to process massive amounts of data via parallel processing through hundreds of thousands of commodity servers. A simple Google query you type, probably touches 700 to 1000 servers to yield that half-second response time. MapReduce was made an open source under the Apache umbrella and was released as Hadoop (by Doug Cutting, former Xerox Parc, Apple, now at Cloudera). Hadoop has a file store called HDFS besides the MapReduce computational process. Hadoop therefore is a “flexible and available architecture for large scale computation and data processing on a network of commodity servers”. What is Redhat to Linux is Cloudera (new VC funded company) to Hadoop.

While Hadoop is becoming a defacto standard for big data, it’s pedigree is batch. For near-real-time analytics, better answers are needed. Yahoo, for example, has a real time analytics project called S4. Several other innovations are happening in this area of realtime or near realtime analytics. Visualization is another hot area for big data.

Big Data offers many opportunities for innovation in next few years.

Read the original blog entry...

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
SYS-CON Events announced today that Pulzze Systems will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Pulzze Systems, Inc. provides infrastructure products for the Internet of Things to enable any connected device and system to carry out matched operations without programming. For more information, visit http://www.pulzzesystems.com.
I'm a lonely sensor. I spend all day telling the world how I'm feeling, but none of the other sensors seem to care. I want to be connected. I want to build relationships with other sensors to be more useful for my human. I want my human to understand that when my friends next door are too hot for a while, I'll soon be flaming. And when all my friends go outside without me, I may be left behind. Don't just log my data; use the relationship graph. In his session at @ThingsExpo, Ryan Boyd, Engi...
The Transparent Cloud-computing Consortium (abbreviation: T-Cloud Consortium) will conduct research activities into changes in the computing model as a result of collaboration between "device" and "cloud" and the creation of new value and markets through organic data processing High speed and high quality networks, and dramatic improvements in computer processing capabilities, have greatly changed the nature of applications and made the storing and processing of data on the network commonplace.
From wearable activity trackers to fantasy e-sports, data and technology are transforming the way athletes train for the game and fans engage with their teams. In his session at @ThingsExpo, will present key data findings from leading sports organizations San Francisco 49ers, Orlando Magic NBA team. By utilizing data analytics these sports orgs have recognized new revenue streams, doubled its fan base and streamlined costs at its stadiums. John Paul is the CEO and Founder of VenueNext. Prior ...
In his general session at 18th Cloud Expo, Lee Atchison, Principal Cloud Architect and Advocate at New Relic, discussed cloud as a ‘better data center’ and how it adds new capacity (faster) and improves application availability (redundancy). The cloud is a ‘Dynamic Tool for Dynamic Apps’ and resource allocation is an integral part of your application architecture, so use only the resources you need and allocate /de-allocate resources on the fly.
Using new techniques of information modeling, indexing, and processing, new cloud-based systems can support cloud-based workloads previously not possible for high-throughput insurance, banking, and case-based applications. In his session at 18th Cloud Expo, John Newton, CTO, Founder and Chairman of Alfresco, described how to scale cloud-based content management repositories to store, manage, and retrieve billions of documents and related information with fast and linear scalability. He addres...
SYS-CON Events announced today that Numerex Corp, a leading provider of managed enterprise solutions enabling the Internet of Things (IoT), will exhibit at the 19th International Cloud Expo | @ThingsExpo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Numerex Corp. (NASDAQ:NMRX) is a leading provider of managed enterprise solutions enabling the Internet of Things (IoT). The Company's solutions produce new revenue streams or create operating...
WebRTC adoption has generated a wave of creative uses of communications and collaboration through websites, sales apps, customer care and business applications. As WebRTC has become more mainstream it has evolved to use cases beyond the original peer-to-peer case, which has led to a repeating requirement for interoperability with existing infrastructures. In his session at @ThingsExpo, Graham Holt, Executive Vice President of Daitan Group, will cover implementation examples that have enabled ea...
SYS-CON Events announced today that Tintri Inc., a leading producer of VM-aware storage (VAS) for virtualization and cloud environments, will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Tintri VM-aware storage is the simplest for virtualized applications and cloud. Organizations including GE, Toyota, United Healthcare, NASA and 6 of the Fortune 15 have said “No to LUNs.” With Tintri they mana...
IoT offers a value of almost $4 trillion to the manufacturing industry through platforms that can improve margins, optimize operations & drive high performance work teams. By using IoT technologies as a foundation, manufacturing customers are integrating worker safety with manufacturing systems, driving deep collaboration and utilizing analytics to exponentially increased per-unit margins. However, as Benoit Lheureux, the VP for Research at Gartner points out, “IoT project implementers often ...
The Jevons Paradox suggests that when technological advances increase efficiency of a resource, it results in an overall increase in consumption. Writing on the increased use of coal as a result of technological improvements, 19th-century economist William Stanley Jevons found that these improvements led to the development of new ways to utilize coal. In his session at 19th Cloud Expo, Mark Thiele, Chief Strategy Officer for Apcera, will compare the Jevons Paradox to modern-day enterprise IT, e...
SYS-CON Events announced today the Enterprise IoT Bootcamp, being held November 1-2, 2016, in conjunction with 19th Cloud Expo | @ThingsExpo at the Santa Clara Convention Center in Santa Clara, CA. Combined with real-world scenarios and use cases, the Enterprise IoT Bootcamp is not just based on presentations but with hands-on demos and detailed walkthroughs. We will introduce you to a variety of real world use cases prototyped using Arduino, Raspberry Pi, BeagleBone, Spark, and Intel Edison. Y...
Complete Internet of Things (IoT) embedded device security is not just about the device but involves the entire product’s identity, data and control integrity, and services traversing the cloud. A device can no longer be looked at as an island; it is a part of a system. In fact, given the cross-domain interactions enabled by IoT it could be a part of many systems. Also, depending on where the device is deployed, for example, in the office building versus a factory floor or oil field, security ha...
Is your aging software platform suffering from technical debt while the market changes and demands new solutions at a faster clip? It’s a bold move, but you might consider walking away from your core platform and starting fresh. ReadyTalk did exactly that. In his General Session at 19th Cloud Expo, Michael Chambliss, Head of Engineering at ReadyTalk, will discuss why and how ReadyTalk diverted from healthy revenue and over a decade of audio conferencing product development to start an innovati...
Fifty billion connected devices and still no winning protocols standards. HTTP, WebSockets, MQTT, and CoAP seem to be leading in the IoT protocol race at the moment but many more protocols are getting introduced on a regular basis. Each protocol has its pros and cons depending on the nature of the communications. Does there really need to be only one protocol to rule them all? Of course not. In his session at @ThingsExpo, Chris Matthieu, co-founder and CTO of Octoblu, walk you through how Oct...
SYS-CON Events announced today that Bsquare has been named “Silver Sponsor” of SYS-CON's @ThingsExpo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. For more than two decades, Bsquare has helped its customers extract business value from a broad array of physical assets by making them intelligent, connecting them, and using the data they generate to optimize business processes.
Whether they’re located in a public, private, or hybrid cloud environment, cloud technologies are constantly evolving. While the innovation is exciting, the end mission of delivering business value and rapidly producing incremental product features is paramount. In his session at @DevOpsSummit at 19th Cloud Expo, Kiran Chitturi, CTO Architect at Sungard AS, will discuss DevOps culture, its evolution of frameworks and technologies, and how it is achieving maturity. He will also cover various st...
Identity is in everything and customers are looking to their providers to ensure the security of their identities, transactions and data. With the increased reliance on cloud-based services, service providers must build security and trust into their offerings, adding value to customers and improving the user experience. Making identity, security and privacy easy for customers provides a unique advantage over the competition.
There are several IoTs: the Industrial Internet, Consumer Wearables, Wearables and Healthcare, Supply Chains, and the movement toward Smart Grids, Cities, Regions, and Nations. There are competing communications standards every step of the way, a bewildering array of sensors and devices, and an entire world of competing data analytics platforms. To some this appears to be chaos. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, Bradley Holt, Developer Advocate a...
SYS-CON Events announced today that Niagara Networks will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Niagara Networks offers the highest port-density systems, and the most complete Next-Generation Network Visibility systems including Network Packet Brokers, Bypass Switches, and Network TAPs.