Welcome!

Cloud Expo Authors: Liz McMillan, Elizabeth White, Yeshim Deniz, Hovhannes Avoyan, Roger Strukhoff

Related Topics: SYS-CON ITALIA, Virtualization

SYS-CON ITALIA: Article

Cloud Computing: Making Analytics in the Cloud a Reality

There will soon be a myriad of announcements of DBMS offerings in the cloud

There will soon be a myriad of announcements of DBMS offerings in the cloud. Many of these will NOT be marriages made in heaven. However, the most innovative new DBMS software combined with new cloud computing services are here today and truly take advantage of the cloud architecture in order to change the economics and the responsiveness of business analytics.

My belief is that cloud computing will change the economics of business intelligence (BI) and enable a variety of new analytic data management projects and business possibilities. It does so by making the hardware, networking, security, and software needed to create data marts and data warehouses available on demand with a pay-as-you-go approach to usage and licensing.

A computing cloud, such as the Amazon Elastic Compute Cloud, is composed of thousands of commodity servers running multiple virtual machine instances (VMs) of the applications hosted in the cloud. As customer demand for those applications changes, new servers are added to the cloud or idled and new VMs are instantiated or terminated.

Cloud computing infrastructure differs dramatically from the infrastructure underlying most in-house data warehouses and data marts. There are no high-end servers with dozens of CPU cores, SANs, replicated systems, or proprietary data warehousing appliances available in the cloud. Therefore, a new DBMS software architecture is required to enable large volumes of data to be analyzed quickly and reliably on the cloud's commodity hardware. Recent DBMS innovations make this a reality today, and the best cloud DBMS architectures will include:

  1. Shared-nothing, massively parallel processing (MPP) architecture. In order to drive down the cost of creating a utility computing environment, the best cloud service providers use huge grids of identical (or similar) computing elements. Each node in the grid is typically a compute engine with its own attached storage. For a cloud database to successfully "scale out" in such an environment, it is essential that the database have a shared-nothing architecture utilizing the resources (CPU, memory, and disk) found in server nodes added to the cluster. Most databases popularly used in BI today have shared-everything or shared-storage architectures, which will limit their ability to scale in the cloud.

  2. Automatic high availability. Within a cloud-based analytic database cluster, node failures, node changes, and connection disruptions can occur. Given the vast number of processing elements within a cloud, these failures can be made transparent to the end user if the database has the proper built-in failover capabilities. The best cloud databases will replicate data automatically across the nodes in the cloud cluster, be able to continue running in the event of 1 or more node failures ("k-safety"), and be capable of restoring data on recovered nodes automatically -- without DBA assistance. Ideally, the replicated data will be made "active" in different sort orders for querying to increase performance.

  3. Ultra-high performance. One of the game-changing advantages of the cloud is the ability to get an analytic application up quickly (without waiting for hardware procurement). However, there can be some performance penalty due to Internet connectivity speeds and the virtualized cloud environment. If the analytic performance is disappointing, the advantage is lost. Fortunately, the latest shared-nothing columnar databases are designed specifically for analytic workloads, and they have demonstrated dramatic performance improvements over traditional, row-oriented databases (as verified by industry experts, such as Gartner and Forrester, and by customer benchmarks). This software performance improvement, coupled with the hardware economies of scale provided by the cloud environment, results in a new economic model and competitive advantage for cloud analytics.

  4. Aggressive compression. Since cloud costs are typically driven by charges for processor and disk storage utilization, aggressive data compression will result in very large cost savings. Row-oriented databases can achieve compression factors of about 30% to 50%; however, the addition of necessary indexes and materialized views often swells databases to 2 to 5 times the size of the source data. But since the data in a column tends to be more similar and repetitive than attributes within rows, column databases often achieve much higher levels of compression. They also don't require indexes. The result is normally a 4x to 20x reduction in the amount of storage needed by columnar databases and a commensurate reduction in storage costs.

  5. Standards-based connectivity. While there are a number of special-purpose file systems that have been developed for the cloud environment that can provide high performance, they lack the standard connectivity needed to support general-purpose business analytics. The broad base of analytic users will use existing commercial ETL and reporting software that depend on SQL, JDBC, ODBC, and other DBMS connectivity standards to load and query cloud databases. Therefore, it's imperative for cloud databases to support these connection standards to enable widespread use of analytic applications.
In summary, cloud databases with the architectural characteristics described above will be able to not just run in the cloud, but thrive there by:

  • "Scaling out," as the cloud itself does
  • Running fast without high-end or custom hardware
  • Providing high availability in a fluid computing environment
  • Minimizing data storage, transfer, and CPU utilization (to keep cloud computing fees low)

More Stories By Jerry Held

Jerry Held is Executive Chairman of Vertica and CEO of the Held Consulting Group, a firm that provides strategic consulting to CEOs and senior executives of technology firms ranging from startups to very large organizations and private equity firms. Prior to his current position, Held was a senior executive at both Oracle Corp. and Tandem Computers.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Cloud Expo Latest Stories
14th International Cloud Expo, held on June 10–12, 2014 at the Javits Center in New York City, featured three content-packed days with a rich array of sessions about the business and technical value of cloud computing, Internet of Things, Big Data, and DevOps led by exceptional speakers from every sector of the IT ecosystem. The Cloud Expo series is the fastest-growing Enterprise IT event in the past 10 years, devoted to every aspect of delivering massively scalable enterprise IT as a service.
As more applications and services move "to the cloud" (public or on-premise) cloud environments are increasingly adopting and building out traditional enterprise features. This in turn is enabling and encouraging cloud adoption from enterprise users. In many ways the definition is blurring as features like continuous operation, geo-distribution or on-demand capacity become the norm. NuoDB is involved in both building enterprise software and using enterprise cloud capabilities. In his session at 15th Cloud Expo, Seth Proctor, CTO at NuoDB, Inc., will discuss the experiences from building, deploying and using enterprise services and suggest some ways to approach moving enterprise applications into a cloud model.
Until recently, many organizations required specialized departments to perform mapping and geospatial analysis, and they used Esri on-premise solutions for that work. In his session at 15th Cloud Expo, Dave Peters, author of the Esri Press book Building a GIS, System Architecture Design Strategies for Managers, will discuss how Esri has successfully included the cloud as a fully integrated SaaS expansion of the ArcGIS mapping platform. Organizations that have incorporated Esri cloud-based applications and content within their business models are reaping huge benefits by directly leveraging cloud-based mapping and analysis capabilities within their existing enterprise investments. The ArcGIS mapping platform includes cloud-based content management and information resources to more widely, efficiently, and affordably deliver real-time actionable information and analysis capabilities to your organization.
In his session at 15th Cloud Expo, Mark Hinkle, Senior Director, Open Source Solutions at Citrix Systems Inc., will provide overview of the open source software that can be used to deploy and manage a cloud computing environment. He will include information on storage, networking(e.g., OpenDaylight) and compute virtualization (Xen, KVM, LXC) and the orchestration(Apache CloudStack, OpenStack) of the three to build their own cloud services. Speaker Bio: Mark Hinkle is the Senior Director, Open Source Solutions, at Citrix Systems Inc. He joined Citrix as a result of their July 2011 acquisition of Cloud.com where he was their Vice President of Community. He is currently responsible for Citrix open source efforts around the open source cloud computing platform, Apache CloudStack and the Xen Hypervisor. Previously he was the VP of Community at Zenoss Inc., a producer of the open source application, server, and network management software, where he grew the Zenoss Core project to over 10...
Almost everyone sees the potential of Internet of Things but how can businesses truly unlock that potential. The key will be in the ability to discover business insight in the midst of an ocean of Big Data generated from billions of embedded devices via Systems of Discover. Businesses will also need to ensure that they can sustain that insight by leveraging the cloud for global reach, scale and elasticity. In his session at Internet of @ThingsExpo, Mac Devine, Distinguished Engineer at IBM, will discuss bringing these three elements together via Systems of Discover.
Cloud and Big Data present unique dilemmas: embracing the benefits of these new technologies while maintaining the security of your organization’s assets. When an outside party owns, controls and manages your infrastructure and computational resources, how can you be assured that sensitive data remains private and secure? How do you best protect data in mixed use cloud and big data infrastructure sets? Can you still satisfy the full range of reporting, compliance and regulatory requirements? In his session at 15th Cloud Expo, Derek Tumulak, Vice President of Product Management at Vormetric, will discuss how to address data security in cloud and Big Data environments so that your organization isn’t next week’s data breach headline.
The cloud is everywhere and growing, and with it SaaS has become an accepted means for software delivery. SaaS is more than just a technology, it is a thriving business model estimated to be worth around $53 billion dollars by 2015, according to IDC. The question is – how do you build and scale a profitable SaaS business model? In his session at 15th Cloud Expo, Jason Cumberland, Vice President, SaaS Solutions at Dimension Data, will give the audience an understanding of common mistakes businesses make when transitioning to SaaS; how to avoid them; and how to build a profitable and scalable SaaS business.
SYS-CON Events announced today that Gridstore™, the leader in software-defined storage (SDS) purpose-built for Windows Servers and Hyper-V, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Gridstore™ is the leader in software-defined storage purpose built for virtualization that is designed to accelerate applications in virtualized environments. Using its patented Server-Side Virtual Controller™ Technology (SVCT) to eliminate the I/O blender effect and accelerate applications Gridstore delivers vmOptimized™ Storage that self-optimizes to each application or VM across both virtual and physical environments. Leveraging a grid architecture, Gridstore delivers the first end-to-end storage QoS to ensure the most important App or VM performance is never compromised. The storage grid, that uses Gridstore’s performance optimized nodes or capacity optimized nodes, starts with as few a...
SYS-CON Events announced today that Solgenia, the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Solgenia is the global market leader in Cloud Collaboration and Cloud Infrastructure software solutions. Designed to “Bridge the Gap” between personal and professional social, mobile and cloud user experiences, our solutions help large and medium-sized organizations dramatically improve productivity, reduce collaboration costs, and increase the overall enterprise value by bringing collaboration and infrastructure solutions to the cloud.
Cloud computing started a technology revolution; now DevOps is driving that revolution forward. By enabling new approaches to service delivery, cloud and DevOps together are delivering even greater speed, agility, and efficiency. No wonder leading innovators are adopting DevOps and cloud together! In his session at DevOps Summit, Andi Mann, Vice President of Strategic Solutions at CA Technologies, will explore the synergies in these two approaches, with practical tips, techniques, research data, war stories, case studies, and recommendations.
Enterprises require the performance, agility and on-demand access of the public cloud, and the management, security and compatibility of the private cloud. The solution? In his session at 15th Cloud Expo, Simone Brunozzi, VP and Chief Technologist(global role) for VMware, will explore how to unlock the power of the hybrid cloud and the steps to get there. He'll discuss the challenges that conventional approaches to both public and private cloud computing, and outline the tough decisions that must be made to accelerate the journey to the hybrid cloud. As part of the transition, an Infrastructure-as-a-Service model will enable enterprise IT to build services beyond their data center while owning what gets moved, when to move it, and for how long. IT can then move forward on what matters most to the organization that it supports – availability, agility and efficiency.
Every healthy ecosystem is diverse. This is especially true in cloud ecosystems, where portability and interoperability are more important than old enterprise models of proprietary ownership. In his session at 15th Cloud Expo, Mark Baker, Server Product Manager at Canonical/Ubuntu, will discuss how single vendors used to take the lead in creating and delivering technology, but in a cloud economy, where users want tools of their preference, when and where they need them, it makes no sense.
The 15th International Cloud Expo has just expanded its conference program, to bring together Cloud Computing, APM, APIs, Security, Big Data, Internet of Things, DevOps and WebRTC at one location. Cloud Expo is the single show where delegates and technology vendors can meet to experience and discuss the entire world of the cloud. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to learn about the latest technology developments and solutions.
SYS-CON Events announced today that Bsquare Corporation, a leading enabler of smart connected systems, has been named “Bronze Sponsor” of SYS-CON's Internet of @ThingsExpo, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Bsquare is a global leader of embedded software solutions. We enable smart connected systems at the device level and beyond that millions use every day and provide actionable data solutions for the growing Internet of Things (IoT) market. We empower our world-class customers with our products, services and solutions to achieve innovation and success.
SYS-CON Events announced today that NuoDB, Inc., the leader in webscale distributed database technology, has been named “Bronze Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. NuoDB was launched in 2010 by industry-renowned database architect Jim Starkey and accomplished software CEO Barry Morris to deliver a webscale distributed database management system that is specifically designed for the cloud and the modern datacenter.