Data Analytics in the Cloud: Two Cool NoSQL ‘Big Data’ Options for the SMB

Some estimates suggest that by 2015 the digital universe will grow to 8 zettabytes of data (1 Zettabyte = 1,000,000,000,000,000,000,000 bytes).
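
To put that figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration that assumes decimal (SI) units, where a terabyte is 10^12 bytes; the function names are made up for this example.

```python
# Decimal (SI) units: 1 zettabyte = 10**21 bytes, as stated in the article.
BYTES_PER_ZETTABYTE = 10**21
# Assumption for this sketch: 1 terabyte = 10**12 bytes (decimal, not binary).
BYTES_PER_TERABYTE = 10**12


def zettabytes_to_bytes(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to bytes."""
    return zettabytes * BYTES_PER_ZETTABYTE


def zettabytes_to_terabytes(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to decimal terabytes."""
    return zettabytes_to_bytes(zettabytes) // BYTES_PER_TERABYTE


if __name__ == "__main__":
    # The 8-zettabyte estimate quoted above, in bytes and in terabytes.
    print(f"{zettabytes_to_bytes(8):,} bytes")
    print(f"{zettabytes_to_terabytes(8):,} terabytes")
```

In other words, 8 zettabytes is roughly 8 billion terabytes under these assumptions.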

Much has been written in recent years about “Big Data” and its implications for information management and data analytics. Simply put, Big Data is data that is too large to process using traditional methods. By ‘traditional methods’ we mean relational database management systems (RDBMS), where data is organized into a set of formally described tables and often accessed using the Structured Query Language (SQL). These systems were designed decades ago, when data was much more structured and less accessible.

With the development of web technologies and open source architectures, database management systems have also evolved. The most notable expression of this is MySQL, which is open source, easily accessible to the beginner, and often bundled into software packages in some variation of the LAMP environment. MySQL, however, is still a relational system built for structured data. By contrast, more than half of the digital data today is unstructured data from social networks, mobile devices, web applications and other similar sources.

While Big Data has become a “big” buzzword in the IT industry today – similar to and, in many ways, a consequence of the Cloud computing phenomenon – and has spun off many kinds of definitions, the essence of the phenomenon can be summed up in the following O’Reilly definition: “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

The need to understand and manage Big Data has become the bread and butter of IT and engineering teams at major tech companies like Google, Amazon, Facebook, and Twitter, as well as other entities that serve millions of users. But what solutions are available to the SMB, the average-sized business? A Techaisle survey of over 800 SMBs, released in April 2012, revealed that 34 percent of US mid-market businesses currently using business intelligence are also interested in big data analytics.

In its recent “Hype Cycle for Big Data 2012” emerging technologies report, the major research firm Gartner states that Column-Store DBMS, Cloud Computing, and In-Memory Database Management Systems will be the three most transformational technologies in the next five years. The same report lists Complex Event Processing, Content Analytics, Context-Enriched Services, Hybrid Cloud Computing, Information Capabilities Framework and Telematics among the emerging technologies that Gartner also considers to be transformational. The Hype Cycle for Big Data is shown below:

The time has arrived for SMBs to seriously start thinking about Big Data solutions. As one source has well stated, “It may take a while but eventually any good technology embraced by large enterprises trickles its way down to small and mid-sized businesses in some appropriately modified and re-priced form. It will be no different for modern business analytics tools. The time could be ripe for mid-range customers to start thinking about either modernising their data warehouses or data marts if they are lucky enough to have any, or come up with a plan to install a business analytics platforms if they don’t.”

With this in mind, here are two important “Big Data” solutions for the SMB to keep an eye on.

Google BigQuery

BigQuery was introduced in limited preview in November 2011 and made publicly available May 1, 2012, fulfilling Google’s desire to “bring Big Data analytics to all businesses via the cloud.” With BigQuery, Google has developed a data analytics solution that offers an easy-to-use and quickly scalable framework for looking at massive amounts of data in the cloud within a traditional SQL framework. As its tagline suggests, BigQuery allows one to “analyze terabytes of data with just a click of a button.”

The setup process for BigQuery takes less than 5 minutes. Simply log in to the Google APIs Console and then create a new Google APIs Console project or use an existing project. Navigate to the API Services table by clicking Services in the left-hand sidebar, and then enable BigQuery.

Once BigQuery is enabled, click on the “BigQuery” link and choose to manage data through the “web interface” tool.

You’ll then be presented with a screen that resembles the basic contours of a traditional MySQL environment, but is much simpler. Google has provided a set of publicdata:samples. Click the drop-down and you’ll be presented with a list of these samples. Click on “natality” and then “details”. This brings up the Centers for Disease Control and Prevention (CDC) Birth Vital Statistics for all birth data available in the United States from the 50 States, the District of Columbia, and New York City from 1969 to 2008. In the data set below there are over 137M rows of data!

In order to run a sample query, go back to the homepage for the “BigQuery Browser Tool Tutorial” and select “Run a Query”. You’ll now be presented with a series of sample SQL queries. Choose the one that will select the 10 heaviest children by birth weight that were born in the United States between 1969 and 2008:

SELECT weight_pounds, state, year, gestation_weeks FROM publicdata:samples.natality
ORDER BY weight_pounds DESC LIMIT 10;

Copy and paste the query into your Compose Query text box and select “Run Query”. Within seconds, the query extracts the 10 largest birth weights from 137M records spanning four decades of data!
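
For readers who want to see the shape of this query without a BigQuery project, here is a minimal local sketch that uses Python’s built-in sqlite3 module. It is not BigQuery itself: the in-memory table, the helper name top_birth_weights, and the handful of rows are all hypothetical stand-ins invented for illustration, and none of the values come from the CDC data.

```python
import sqlite3

# Hypothetical toy rows: (weight_pounds, state, year, gestation_weeks).
# These values are invented for illustration and are NOT real CDC records.
TOY_ROWS = [
    (7.5, "TX", 1975, 39),
    (9.1, "WA", 1988, 41),
    (6.2, "AL", 1999, 37),
    (11.3, "ND", 2004, 42),
    (8.4, "SC", 1969, 40),
]


def top_birth_weights(rows, limit):
    """Return the `limit` heaviest rows, mirroring the ORDER BY ... DESC LIMIT query."""
    conn = sqlite3.connect(":memory:")
    try:
        # A local stand-in for the natality sample table (same column names).
        conn.execute(
            "CREATE TABLE natality ("
            "weight_pounds REAL, state TEXT, year INTEGER, gestation_weeks INTEGER)"
        )
        conn.executemany("INSERT INTO natality VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT weight_pounds, state, year, gestation_weeks "
            "FROM natality ORDER BY weight_pounds DESC LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_birth_weights(TOY_ROWS, 3):
        print(row)
```

The SELECT, ORDER BY, and LIMIT clauses are the same as in the sample query; only the table reference differs, since the local table stands in for publicdata:samples.natality.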

What is amazing about the BigQuery interface is the scale of data that it can present to the user in almost no time. Users can of course create their own tables by importing data from their local environment or from Google Cloud Storage. With BigQuery, the opportunities for slicing and dicing large data sets are almost limitless.

BIME

BIME (pronounced “beam”) is a French startup that has partnered with Google to create a front-end application for BigQuery that can be used as a business analytics tool. The application runs on Amazon’s Web Services compute cloud and can import data from BigQuery or a variety of cloud and non-cloud sources. With the clever tagline “Mine Your Own Business,” BIME, in its own words, “is a revolutionary approach to data analysis and dashboarding. It allows you to analyze your data through interactive data visualizations and create stunning dashboards from the Web.”

The relationship between Google’s BigQuery and BIME is best captured in the screenshot below, which shows how BIME can be used to import and slice and dice the CDC Birth statistics discussed above.

BIME offers a free 10-day trial with no obligation, and signing up is very easy. Once you sign up for a free account, go to “Create a Connection”.

You’ll then need to define the data source from which you wish to import your data set. For very large data sets, you will need to select BimeDB, which requires credit card information and is charged at either $0.50 or $1.00/hour, depending on the size of the data sets required.

For more conventional data sets, you can import your data sets directly from the desktop. BIME offers an Excel-like environment in which data sets of any size can be sliced and diced and pivoted to derive the desired analytics.

In the case below, we queried the sample CDC birth statistics table in Google’s BigQuery to extract the top 500 birth weights from 1969-2008, and then derived the average birth weight for a sampling of five states: Alabama, North Dakota, South Carolina, Texas, and Washington.
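
The two-step analysis described here (take the top N birth weights, then average them by state) can be sketched outside BIME in plain Python. This is only an illustration of the logic, not how BIME works internally; the function name average_weight_by_state and the sample rows are hypothetical and the weights are invented, not drawn from the CDC table.

```python
from collections import defaultdict
from heapq import nlargest
from statistics import mean

# Hypothetical (state, weight_pounds) pairs, invented for illustration only.
TOY_BIRTHS = [
    ("AL", 10.0),
    ("AL", 9.0),
    ("TX", 11.0),
    ("TX", 8.0),
    ("WA", 9.5),
    ("ND", 7.0),
]


def average_weight_by_state(births, top_n):
    """Keep the top_n heaviest records, then average the weights per state."""
    # Step 1: the equivalent of ORDER BY weight_pounds DESC LIMIT top_n.
    heaviest = nlargest(top_n, births, key=lambda record: record[1])

    # Step 2: group the surviving records by state.
    weights_by_state = defaultdict(list)
    for state, weight in heaviest:
        weights_by_state[state].append(weight)

    # Step 3: average each group, with states in alphabetical order.
    return {state: mean(weights) for state, weights in sorted(weights_by_state.items())}


if __name__ == "__main__":
    print(average_weight_by_state(TOY_BIRTHS, top_n=4))
```

Note that a state only appears in the result if at least one of its records makes the top N, which is why the averages describe the heaviest births rather than all births in that state.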

Following the 10-day free trial period, BIME users can upgrade to a scaled price plan depending on the data analysis needs of their business.

In conclusion, it bears mentioning that “Big Data” is big business not only for large corporations but for SMBs as well. The discussion above has outlined two major data analytics solutions that are easily accessible and scalable for the everyday small-to-medium business. Within the emerging technology spectrum, Big Data is critically important, and companies that can easily and efficiently slice and dice this data to identify consumer trends, produce market forecasts, and offer stakeholders the most up-to-date analysis and metrics will immediately set themselves apart from other players in the industry. Consider BigQuery and BIME today for your SMB data analytics solutions!

More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.,
