Cloud Analytics Checklist

What are enterprise users looking for from a cloud analytics solution?


In the previous article, we looked at how realtime cloud analytics is set to disrupt the $25B SQL/OLAP sector of the IT industry. What are users looking for from a next-generation, post-SQL/OLAP enterprise analytics solution? Let's look at the requirements:

  • Realtime + Historical Data. In addition to analyzing (historical) data held in databases (Oracle, SQLServer, DB2, MySQL) or datastores (Hadoop, Amazon Elastic MapReduce), a next-gen analytics solution needs to be able to analyze, filter, and transform live data streams in realtime, with low latency, and to "push" just the right data, at the right time, to users throughout the enterprise. With SQL/OLAP or Hadoop/MapReduce, users "pull" historical data via queries or programs to find what they need. For many analytics scenarios today, however, what's needed instead, to handle information overload, is a continuous "realtime push" model where "the data finds the user".

  • External + Internal Data. In the past it was simple: an enterprise had only to deploy a few large specialized systems (ERP, CRM, Supply Chain, Web Analytics) to handle the internal data flowing through the organization. Today, in order to operate at peak efficiency, a large enterprise needs a detailed, realtime, integrated awareness of all the kinds of data sources that could impact the business, for example, information on: customers, partners, employees, competitors, marketing, advertising, pricing, web, news, markets, locations, gov data, communications, email, collaboration, social, IT, datacenters, networks, sensors.
  • Unstructured + Structured Data. SQL/OLAP analytics was built on the idea that data would be held in relational databases, and that the data would be highly structured. Today, this no longer applies. Much of the most valuable data to an enterprise today is either semi-structured or unstructured.
  • Easy-To-Use. SQL/OLAP has proved to be too complex for most enterprise users who need access to analytics for their work. Excel with its simple charting, visualization, sharing and collaboration features provides a much more attractive interface for most users. Other products and services such as Qlikview and GoodData also provide ease-of-use, but none of them (Excel included) offers the kind of realtime analytics, scalability and parallel processing required in analytics today. Despite its complexity and lack of mainstream adoption within the enterprise, a few companies have taken SQL/OLAP and made it even more complex by adding in features to support realtime stream processing. None of these StreamSQL solutions seem to have achieved any widespread adoption to date.
  • Cloud-Based, Pay-Per-Use. Every company looking to compete in the next-generation analytics market will have to have at least a public cloud offering, and most will also have virtual private cloud and private cloud offerings. Since enterprise data will often be held on more than one cloud, it will be increasingly important to have an "intercloud" capability, where analytics apps can be run simultaneously across multiple (public and/or private) clouds, e.g. across Amazon AWS and Windows Azure.
  • Elastic Scalability, Parallel Processing, MapReduce. With exponentially growing data volumes, it will be essential to offer the elastic scalability and parallel processing required to handle anything from one-off personal data analysis tasks up to the most demanding large-scale analytics apps required by the world's leading organizations in business, web, finance and government.
  • Seamless Integration With Standard Tools (Excel). With 40 Million analytics power users using Excel, this is a must for any analytics solution looking to achieve significant market adoption.
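The "realtime push" requirement above can be made concrete with a small sketch. This is purely illustrative (the names `Event` and `push_filter` are hypothetical, not Cloudscale's API): rather than a user pulling historical data with a query, a standing filter watches a live event stream and pushes only matching records to the subscriber.

```python
# Hypothetical sketch of the "realtime push" model: a continuous filter
# watches a live event stream and yields only the records a subscriber
# cares about -- "the data finds the user", instead of the user pulling
# data with ad hoc queries. Names here are illustrative only.

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Event:
    source: str   # e.g. "web", "sensors", "markets"
    value: float


def push_filter(stream: Iterable[Event],
                predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Continuously yield only the events matching a subscriber's filter."""
    for event in stream:
        if predicate(event):
            yield event   # pushed to the subscriber as it arrives


# Usage: a subscriber registers interest in web events above a threshold.
live_stream = [Event("web", 3.0), Event("sensors", 9.5), Event("web", 7.2)]
alerts = list(push_filter(live_stream,
                          lambda e: e.source == "web" and e.value > 5.0))
```

In a real system the stream would be an unbounded feed and the filter would run in parallel across many nodes; the point is only the inversion of control from "user pulls" to "data pushes".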

At Cloudscale, we've compiled a Cloud Analytics Checklist, showing how various analytics products/services measure up against this set of requirements. If you're thinking about cloud analytics and would like a copy of the Checklist then send a request with your email address via the Cloudscale website (no signup required) or by email to [email protected], with the word Checklist in the Subject line.

More Stories By Bill McColl

Bill McColl left Oxford University to found Cloudscale. At Oxford he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. Along with Les Valiant of Harvard, he developed the BSP approach to parallel programming. He has led research, product, and business teams, in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization, realtime stream processing, big data analytics, and cloud computing. He lives in Palo Alto, CA.
