@CloudExpo: Blog Feed Post

Data Analytics in the Cloud: Two Cool NoSQL ‘Big Data’ Options for the SMB

Some estimates suggest that by 2015 the digital universe will grow to 8 zettabytes of data (1 zettabyte = 1,000,000,000,000,000,000,000 bytes).

Much has been written in recent years about “Big Data” and its implications for information management and data analytics. Simply put, Big Data is data that is too large to process using traditional methods. By ‘traditional methods’ we mean relational database management systems (RDBMS), in which data is organized into a set of formally described tables and often accessed using the Structured Query Language (SQL). These systems were designed decades ago, when data was much more structured and less accessible.

With the development of web technologies and open-source architectures, database management systems have also evolved. The most notable example is MySQL, which is open source, easily accessible to beginners, and often bundled into software packages as part of some variation of the LAMP stack. Yet MySQL is still a relational system, while more than half of today’s digital data is unstructured data from social networks, mobile devices, web applications, and similar sources.

Big Data has become a “big” buzzword in the IT industry (similar to, and in many ways a consequence of, the Cloud computing phenomenon) and has spawned many definitions. Its essence, however, is well captured by the following O’Reilly definition: “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

Understanding and managing Big Data has become the bread and butter of IT and engineering teams at major tech companies such as Google, Amazon, Facebook, and Twitter, as well as other organizations that serve millions of users. But what solutions are available to the SMB, the average-sized business? A Techaisle survey of more than 800 SMBs, released in April 2012, found that 34 percent of US mid-market businesses currently using business intelligence are also interested in Big Data analytics.

In its recent “Hype Cycle for Big Data 2012” emerging technologies report, the research firm Gartner names Column-Store DBMS, Cloud Computing, and In-Memory Database Management Systems as the three most transformational technologies over the next five years. The same report also identifies Complex Event Processing, Content Analytics, Context-Enriched Services, Hybrid Cloud Computing, the Information Capabilities Framework, and Telematics as emerging technologies that Gartner considers transformational. The Hype Cycle for Big Data is shown below:

The time has arrived for SMBs to start thinking seriously about Big Data solutions. As one source put it, “It may take a while but eventually any good technology embraced by large enterprises trickles its way down to small and mid-sized businesses in some appropriately modified and re-priced form. It will be no different for modern business analytics tools. The time could be ripe for mid-range customers to start thinking about either modernising their data warehouses or data marts if they are lucky enough to have any, or come up with a plan to install a business analytics platforms if they don’t.”

With this in mind, here are two important “Big Data” solutions for the SMB to keep an eye on.

Google BigQuery

BigQuery was introduced as a limited preview in November 2011 and made publicly available on May 1, 2012, fulfilling Google’s stated goal to “bring Big Data analytics to all businesses via the cloud.” With BigQuery, Google has built a data analytics service that offers an easy-to-use, quickly scalable framework for querying massive amounts of data in the cloud with familiar SQL. As its tagline suggests, BigQuery lets you “analyze terabytes of data with just a click of a button.”

The setup process for BigQuery takes less than five minutes. Simply log in to the Google APIs Console and either create a new project or select an existing one. Then click Services in the left-hand sidebar, locate BigQuery in the API services table, and enable it.

Once BigQuery is enabled, click the “BigQuery” link and choose to manage your data through the “web interface” tool.

You’ll then be presented with a screen that resembles the basic contours of a traditional MySQL environment, but is much simpler. Google provides a set of sample tables under publicdata:samples; click the drop-down to see the list. Click “natality” and then “details”. This brings up the Centers for Disease Control and Prevention (CDC) birth vital statistics: all birth data available in the United States for the 50 states, the District of Columbia, and New York City from 1969 to 2008. The data set shown below contains more than 137M rows!

To run a sample query, go back to the home page of the “BigQuery Browser Tool Tutorial” and select “Run a Query”. You’ll see a series of sample SQL queries. Choose the one that selects the 10 heaviest babies, by birth weight, born in the United States between 1969 and 2008:

SELECT weight_pounds, state, year, gestation_weeks FROM publicdata:samples.natality
ORDER BY weight_pounds DESC LIMIT 10;

Copy and paste the query back into the Compose Query text box and select “Run Query”. Within seconds, the query extracts the 10 largest birth weights from more than 137M records covering 40 years of data!
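The sample query is a top-k selection: sort every row by weight_pounds in descending order and keep the first 10. As a minimal local sketch of that logic (not BigQuery itself), the snippet below runs the same SELECT ... ORDER BY ... DESC LIMIT query through Python’s built-in sqlite3 module against a tiny in-memory table. The table name natality, the helper functions, and the sample rows are hypothetical placeholders rather than real CDC records, and SQLite does not accept BigQuery’s publicdata:samples table syntax.

```python
import sqlite3

# Hypothetical placeholder rows, NOT real CDC data:
# (weight_pounds, state, year, gestation_weeks)
SAMPLE_ROWS = [
    (7.5, "TX", 1985, 39),
    (9.9, "AL", 1972, 41),
    (6.2, "WA", 2001, 38),
    (11.3, "ND", 1990, 42),
    (8.8, "SC", 1999, 40),
]


def build_demo_db(rows):
    """Create an in-memory table with the columns the sample query selects."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE natality ("
        "weight_pounds REAL, state TEXT, year INTEGER, gestation_weeks INTEGER)"
    )
    conn.executemany("INSERT INTO natality VALUES (?, ?, ?, ?)", rows)
    return conn


def heaviest_births(conn, limit=10):
    """Return the `limit` heaviest births, heaviest first."""
    # Same shape as the BigQuery sample query; LIMIT is bound as a parameter.
    query = (
        "SELECT weight_pounds, state, year, gestation_weeks FROM natality "
        "ORDER BY weight_pounds DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db(SAMPLE_ROWS)
    for row in heaviest_births(conn, limit=3):
        print(row)
```

On the real table, BigQuery performs this sort over all of the 137M-plus rows; the local sketch only shows what ORDER BY ... DESC LIMIT returns.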

What is striking about the BigQuery interface is how quickly it presents results drawn from data at this scale. Users can, of course, create their own tables by importing data from their local environment or from Google Cloud Storage. The opportunities for slicing and dicing large data sets with Google’s BigQuery are almost limitless.

BIME

BIME (pronounced “beam”) is a French startup that has partnered with Google to create a front-end application for BigQuery that can be used as a business analytics tool. The application runs on Amazon Web Services and can import data from BigQuery or from a variety of other cloud and non-cloud sources. With the clever tagline “Mine Your Own Business,” BIME, in its own words, “is a revolutionary approach to data analysis and dashboarding. It allows you to analyze your data through interactive data visualizations and create stunning dashboards from the Web.”

The relationship between Google’s BigQuery and BIME is best captured in the screenshot below, which shows how BIME can be used to import and slice and dice the CDC Birth statistics discussed above.

BIME offers a free, no-obligation 10-day trial that is easy to sign up for. Once you have a free account, go to “Create a Connection”.

You’ll then need to define the data source from which you want to import your data set. For very large data sets, you will need to select BimeDB, which requires a credit card and is billed at either $0.50 or $1.00 per hour, depending on the size of the data set.
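Because BimeDB is billed by the hour, it can help to turn the quoted rates into a monthly figure before committing. The sketch below assumes, purely for illustration, that an instance is billed for every hour it runs at the $0.50 or $1.00 rate; the function name and the usage patterns (24 hours a day for 30 days, or 8 hours a day for 22 working days) are hypothetical, and actual BIME billing terms may differ.

```python
# Hourly BimeDB rates quoted in the article (USD per hour).
BIMEDB_RATES = (0.50, 1.00)


def estimated_monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Hypothetical estimate: hourly rate x billed hours per day x billed days."""
    return hourly_rate * hours_per_day * days


if __name__ == "__main__":
    for rate in BIMEDB_RATES:
        always_on = estimated_monthly_cost(rate)
        office_hours = estimated_monthly_cost(rate, hours_per_day=8, days=22)
        print(f"${rate:.2f}/hour: always on ${always_on:.2f}, "
              f"office hours ${office_hours:.2f}")
```

Under these assumptions, an instance left running around the clock for a 30-day month costs $360 or $720, versus $88 or $176 if it runs only 8 hours a day on 22 working days.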

For more conventional data sets, you can import data directly from your desktop. BIME offers an Excel-like environment in which data sets of any size can be sliced, diced, and pivoted to derive the desired analytics.

In the example below, we queried the sample BigQuery CDC birth statistics table to extract the top 500 birth weights from 1969 to 2008, and then derived the average birth weight for a sampling of five states: Alabama, North Dakota, South Carolina, Texas, and Washington.
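One reading of this two-step analysis is: take the 500 heaviest births overall, then average the weights of those that fall in the five chosen states. The sketch below expresses that reading as a single SQL query with a subquery and runs it through Python’s built-in sqlite3 module on a small in-memory table. The table layout, the assumption that states are stored as two-letter codes, the function names, and the rows are hypothetical placeholders (not real CDC data), and this is not necessarily what BIME generates internally.

```python
import sqlite3

# The five states from the example, assumed to be stored as two-letter codes.
STATES = ("AL", "ND", "SC", "TX", "WA")

# Hypothetical placeholder rows, NOT real CDC data: (state, weight_pounds)
STATE_WEIGHT_ROWS = [
    ("AL", 10.0), ("AL", 9.0), ("TX", 12.0), ("TX", 8.0),
    ("WA", 7.0), ("CA", 11.0), ("ND", 6.0),
]


def build_state_weight_db(rows):
    """Load (state, weight_pounds) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE natality (state TEXT, weight_pounds REAL)")
    conn.executemany("INSERT INTO natality VALUES (?, ?)", rows)
    return conn


def average_weight_in_top_n(conn, top_n, states=STATES):
    """Average weight per state among the `top_n` heaviest births overall."""
    # One "?" per state; values are bound as parameters, never pasted into the SQL.
    placeholders = ", ".join("?" for _ in states)
    query = f"""
        SELECT state, AVG(weight_pounds) AS avg_weight, COUNT(*) AS births
        FROM (
            -- Step 1: keep only the top_n heaviest births across all states.
            SELECT state, weight_pounds
            FROM natality
            ORDER BY weight_pounds DESC
            LIMIT ?
        )
        -- Step 2: restrict to the chosen states and average per state.
        WHERE state IN ({placeholders})
        GROUP BY state
        ORDER BY state
    """
    return conn.execute(query, (top_n, *states)).fetchall()


if __name__ == "__main__":
    conn = build_state_weight_db(STATE_WEIGHT_ROWS)
    # The article used the top 500; this toy table has only 7 rows, so take 5.
    print(average_weight_in_top_n(conn, top_n=5))
```

A state with no births in the top N simply does not appear in the result, so a five-state comparison built this way can return fewer than five rows; with the toy rows above and top_n=5, only AL and TX survive the cutoff.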

Following the 10-day free trial, BIME users can upgrade to a paid plan scaled to the data analysis needs of their business.

In conclusion, it bears mentioning that “Big Data” is big business, not only for large corporations but for SMBs as well. The discussion above has outlined two major data analytics solutions that are easily accessible and scalable for the everyday small or medium-sized business. Within the emerging technology spectrum, Big Data is critically important, and companies that can quickly and efficiently slice and dice this data to surface accurate consumer trends and market forecasts, and to give stakeholders the most up-to-date analysis and metrics, will immediately set themselves apart from other players in their industry. Consider BigQuery and BIME for your SMB data analytics needs today!

Hovhannes Avoyan is the CEO of PicsArt, Inc.
