Welcome!

@CloudExpo Authors: Elizabeth White, Yeshim Deniz, Harry Trott, Pat Romanski, William Schmarzo

Related Topics: @CloudExpo

@CloudExpo: Blog Feed Post

Data Analytics in the Cloud: Two Cool NoSQL ‘Big Data’ Options for the SMB

Some estimates suggest that by 2015 the digital universe will grow to 8 zettabytes of data

Some estimates suggest that by 2015 the digital universe will grow to 8 zettabytes of data (1 Zettabyte = 1,000,000,000,000,000,000,000 bytes).

Much has been written in recent years about “Big Data” and the implications for Information management and data analytics. Simply put, Big data is data that is too large to process using traditional methods. By ‘traditional methods’ we refer to the relational database environments (RDBMS) where data is organized into a set of formally described tables and often accessed using the structured query language (SQL). These systems were designed decades ago when data was much more structured and less accessible.

With the development of web technologies and open source architectures, database management systems have also evolved. The most notable expression of this is MySQL, which is open-source and easily accessible to the beginner, and often bundled into software packages in some variation of the LAMP environment. By contrast, more than half of the digital data today is the unstructured data from social networks, mobile devices, web applications and other similar sources.

While Big Data has become a “big” buzzword in the IT industry today – similar to and, in many ways, a consequence of the Cloud computing phenomenon – and has spun off many kinds of definitions, the essence of the phenomenon can be summed up in the following O’Reilly definition: “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

The need to understand and manage Big Data has become the bread and butter of IT and engineering teams at major tech companies like Google, Amazon, Facebook, Twitter, as well as other entities that traffic millions of users. But what solutions are available to the SMB, to the average sized business? According to a study released in April 2012 by Techaisle, a survey of over 800 SMBS revealed that 34 percent of US mid-market businesses that are currently using business intelligence are also interested in big data analytics.

In its recent “Hype Cycle for Big Data 2012” Emerging technologies report, the major research firm Gartner states that Column-Store DBMS, Cloud Computing, In-Memory Database Management Systems will be the three most transformational technologies in the next five years.  This same report predicts that Complex Event Processing, Content Analytics, Context-Enriched Services, Hybrid Cloud Computing, Information Capabilities Framework and Telematics are part of the emerging technologies that Gartner also considers to be transformational.  The Hype Cycle for Big Data is shown below:

The time has arrived for SMBs to seriously start thinking about Big Data solutions. As one source has well stated, “It may take a while but eventually any good technology embraced by large enterprises trickles its way down to small and mid-sized businesses in some appropriately modified and re-priced form. It will be no different for modern business analytics tools. The time could be ripe for mid-range customers to start thinking about either modernising their data warehouses or data marts if they are lucky enough to have any, or come up with a plan to install a business analytics platforms if they don’t.”

With this in mind, here are two Important “Big Data” Solutions for the SMB to Keep an Eye on . . .

Google Big Query

BigQuery was introduced in limited preview in November 2011 and made publicly available May 1, 2012, fulfilling Google’s desire to “bring Big Data analytics to all businesses via the cloud.” With Big Query, Google has developed a data analytics solution that offers an easy to use and quickly scalable framework for looking at massive amounts of data in the cloud within a traditional SQL framework. As its tagline suggests, BigQuery allows one to “analyze terabytes of data with just a click of a button.”

The setup process for BigQuery takes less than 5 minutes. Simply Log in to the Google APIs Console and then create a new Google APIs Console project or use an existing project. Navigate to the API Services table and Click on Services on the left-hand sidebar and then Enable BigQuery.

Once BigQuery is enabled, click on the “BigQuery” link choose to manage data through the “web interface” tool

You’ll then be presented with a screen that resembles the basic contours of a traditional MySQL environment, but which is much more simplified. Google has provided a set of publicdata:samples. Click the drop-down and you’ll be presented with a list of these samples. Click on “natality” and then “details”. This brings up the Center for Disease Control (CDC) Birth Vital Statistics for all birth data available in the United States from the 50 States, the District of Columbia, and New York City from 1969 to 2008. In the data set below there are over 137M rows of data!

In order to run a sample query, go back to the homepage for the “BigQuery Browser Tool Tutorial” and select “Run a Query”. You’ll now be presented with a series of sample SQL queries. Choose the one that will select the 10 heaviest children by birth weight that were born in the United States between 1969 and 2008:

SELECT weight_pounds, state, year, gestation_weeks FROM publicdata:samples.natality
ORDER BY weight_pounds DESC LIMIT 10;

Copy and paste the query back into your Compose Query textbox and select “Run Query”. Within seconds, the query extracts the 10 largest birth weights from 137M records from 30 years of data!

What is amazing about the BigQuery interface is the scale of data that is easily presentable to the user in no time. Users can of course create their own tables by importing data from one’s local environment or from Google Cloud Storage. The opportunities for slicing and dicing large data sets are now almost limitless with Google’s BigQuery solution to data analytics.

Bime

BIME (pronounced “beam”) is a French startup that has partnered with Google to create a front-end application for BigQuery that can be used as a business analytics tool. The application runs on Amazon’s Web Services compute cloud and can import data from BigQuery or any variety of cloud and non-cloud sources. With the clever tagline of “Mine Your Own Business.” BIME in its own words “is a revolutionary approach to data analysis and dashboarding. It allows you to analyze your data through interactive data visualizations and create stunning dashboards from the Web.”

The relationship between Google’s BigQuery and BIME is best captured in the screenshot below, which shows how BIME can be used to import and slice and dice the CDC Birth statistics discussed above.

BIME offers a very easy to sign up free 10 day trial with no obligation. Once you sign up for a free account, go to “Create a Connection”

You’ll then need to define a data source from where you wish to import your data set. For very large data sets, you will need to select BimeDB, which requires credit card information to charge either $0.50 or $1.00/hour depending on the size of data sets required

For more conventional data sets, you can import your data sets directly from the desktop. BIME offers an Excel-like environment in which data sets of any size can be sliced and diced and pivoted to derive the desired analytics.

In the case below, we ran a sample Google’s BigQuery CDC Birth statistics table in order to extract the top 500 birth weights from 1969-2008, and then in turn derive the average birth weight for a sampling of five states: Alabama, North Dakota, South Carolina, Texas, and Washington.

Following the 10 day free trial period, BIME users can upgrade to a scaled price plan depending on the data analysis needs of their business.

In conclusion, it bears important mentioning that “Big Data” is Big Business not only for large corporations but for SMBs as well. The discussion above has outlined two major data analytics solutions that are easily accessible and scalable for the everyday small-medium business. Within the emerging technology spectrum, Big Data is critically important and those companies able to easily and efficiently slice and dice this data to provide accurate consumer trends, market forecasts, and offer stakeholders the most up-to-date analysis and metrics, immediately will set themselves apart from other players in the industry. Consider BigQuery and BIME today for your SMB data analytics solutions!

Read the original blog entry...

More Stories By Hovhannes Avoyan

Hovhannes Avoyan is the CEO of PicsArt, Inc.,

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
Everything run by electricity will eventually be connected to the Internet. Get ahead of the Internet of Things revolution and join Akvelon expert and IoT industry leader, Sergey Grebnov, in his session at @ThingsExpo, for an educational dive into the world of managing your home, workplace and all the devices they contain with the power of machine-based AI and intelligent Bot services for a completely streamlined experience.
Because IoT devices are deployed in mission-critical environments more than ever before, it’s increasingly imperative they be truly smart. IoT sensors simply stockpiling data isn’t useful. IoT must be artificially and naturally intelligent in order to provide more value In his session at @ThingsExpo, John Crupi, Vice President and Engineering System Architect at Greenwave Systems, will discuss how IoT artificial intelligence (AI) can be carried out via edge analytics and machine learning techn...
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big...
FinTechs use the cloud to operate at the speed and scale of digital financial activity, but are often hindered by the complexity of managing security and compliance in the cloud. In his session at 20th Cloud Expo, Sesh Murthy, co-founder and CTO of Cloud Raxak, showed how proactive and automated cloud security enables FinTechs to leverage the cloud to achieve their business goals. Through business-driven cloud security, FinTechs can speed time-to-market, diminish risk and costs, maintain continu...
Existing Big Data solutions are mainly focused on the discovery and analysis of data. The solutions are scalable and highly available but tedious when swapping in and swapping out occurs in disarray and thrashing takes place. The resolution for thrashing through machine learning algorithms and support nomenclature is through simple techniques. Organizations that have been collecting large customer data are increasingly seeing the need to use the data for swapping in and out and thrashing occurs ...
As many know, the first generation of Cloud Management Platform (CMP) solutions were designed for managing virtual infrastructure (IaaS) and traditional applications. But that’s no longer enough to satisfy evolving and complex business requirements. In his session at 21st Cloud Expo, Scott Davis, Embotics CTO, will explore how next-generation CMPs ensure organizations can manage cloud-native and microservice-based application architectures, while also facilitating agile DevOps methodology. He wi...
SYS-CON Events announced today that CA Technologies has been named "Platinum Sponsor" of SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business - from apparel to energy - is being rewritten by software. From planning to development to management to security, CA creates software that fuels transformation for companies in the applic...
From 2013, NTT Communications has been providing cPaaS service, SkyWay. Its customer’s expectations for leveraging WebRTC technology are not only typical real-time communication use cases such as Web conference, remote education, but also IoT use cases such as remote camera monitoring, smart-glass, and robotic. Because of this, NTT Communications has numerous IoT business use-cases that its customers are developing on top of PaaS. WebRTC will lead IoT businesses to be more innovative and address...
An increasing number of companies are creating products that combine data with analytical capabilities. Running interactive queries on Big Data requires complex architectures to store and query data effectively, typically involving data streams, an choosing efficient file format/database and multiple independent systems that are tied together through custom-engineered pipelines. In his session at @BigDataExpo at @ThingsExpo, Tomer Levi, a senior software engineer at Intel’s Advanced Analytics ...
Blockchain is a shared, secure record of exchange that establishes trust, accountability and transparency across business networks. Supported by the Linux Foundation's open source, open-standards based Hyperledger Project, Blockchain has the potential to improve regulatory compliance, reduce cost as well as advance trade. Are you curious about how Blockchain is built for business? In her session at 21st Cloud Expo, René Bostic, Technical VP of the IBM Cloud Unit in North America, will discuss th...
yperConvergence came to market with the objective of being simple, flexible and to help drive down operating expenses. It reduced the footprint by bundling the compute/storage/network into one box. This brought a new set of challenges as the HyperConverged vendors are very focused on their own proprietary building blocks. If you want to scale in a certain way, let’s say you identified a need for more storage and want to add a device that is not sold by the HyperConverged vendor, forget about it....
SYS-CON Events announced today that App2Cloud will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. App2Cloud is an online Platform, specializing in migrating legacy applications to any Cloud Providers (AWS, Azure, Google Cloud).
While some vendors scramble to create and sell you a fancy solution for monitoring your spanking new Amazon Lambdas, hear how you can do it on the cheap using just built-in Java APIs yourself. By exploiting a little-known fact that Lambdas aren’t exactly single-threaded, you can effectively identify hot spots in your serverless code. In his session at @DevOpsSummit at 21st Cloud Expo, Dave Martin, Product owner at CA Technologies, will give a live demonstration and code walkthrough, showing how ...
SYS-CON Events announced today that MobiDev, a client-oriented software development company, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex software systems for startups and enterprises. Since 2009 it has grown from a small group of passionate engineers and business...
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
SYS-CON Events announced today that Dasher Technologies will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Dasher Technologies, Inc. ® is a premier IT solution provider that delivers expert technical resources along with trusted account executives to architect and deliver complete IT solutions and services to help our clients execute their goals, plans and objectives. Since 1999, we'v...
SYS-CON Events announced today that Ayehu will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Ayehu provides IT Process Automation & Orchestration solutions for IT and Security professionals to identify and resolve critical incidents and enable rapid containment, eradication, and recovery from cyber security breaches. Ayehu provides customers greater control over IT infrastructure throu...
Cloud adoption is often driven by a desire to increase efficiency, boost agility and save money. All too often, however, the reality involves unpredictable cost spikes and lack of oversight due to resource limitations. In his session at 20th Cloud Expo, Joe Kinsella, CTO and Founder of CloudHealth Technologies, tackled the question: “How do you build a fully optimized cloud?” He will examine: Why TCO is critical to achieving cloud success – and why attendees should be thinking holistically ab...
SYS-CON Events announced today that Elastifile will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Elastifile Cloud File System (ECFS) is software-defined data infrastructure designed for seamless and efficient management of dynamic workloads across heterogeneous environments. Elastifile provides the architecture needed to optimize your hybrid cloud environment, by facilitating efficient...
SYS-CON Events announced today that Grape Up will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct. 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company specializing in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market across the U.S. and Europe, Grape Up works with a variety of customers from emergi...