Cloud Expo: Article

A Cloud Computing Business Intelligence Organization

Moving data warehouses to the cloud

Data Warehousing As A Cloud Candidate
Over the past year, major vendors have shown greater support for Cloud, and Cloud is here to stay. The bigger impact is that the path is now clearly drawn for enterprises to adopt Cloud. With this in mind, it is time to identify which existing data center applications could be migrated to Cloud.

Most major IT vendors predict that HYBRID delivery will be the future: enterprises will need a delivery model in which certain workloads run on Clouds while others remain in data centers, along with a way to integrate the two.

Before going further into a blueprint of how data warehouses fit within a HYBRID Cloud environment, we will look at the salient features of data warehouses and how the tenets of Cloud make them a very viable workload to move to Cloud.

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

Data Warehousing Usage and Cloud Tenet Value Proposition

Each data warehousing usage below is followed by the Cloud tenet that addresses it.

The ETL (Extract, Clean, Transform, Load) process is subject to variable load patterns. Typically, large files arrive over the weekend or at night to be processed and loaded.

It is better to use COMPUTE resources on demand, as the ETL process requires them, than to maintain a fixed capacity.

OLAP (Online Analytical Processing) and the related processing for MOLAP (Multidimensional OLAP) and/or ROLAP (Relational OLAP) are highly compute intensive and require substantial processing power.

High Performance Computing and the ability to scale up on demand, both tenets of Cloud, are closely aligned with this need.

Physical architecture needs are complex in a data warehousing environment.

  • MPP Servers (Massively Parallel Processing)
  • Shared Nothing Data Architecture
  • Mirrored Copies of Disk Space
  • High Availability Clustering

Most IaaS and PaaS offerings, such as the Azure platform and Amazon EC2, have built-in provisions for a highly available architecture, with most of the day-to-day administration abstracted away from the enterprise.

The following are some of the advantages of the SQL Azure platform:

  • No physical administration required: software installation and patching are included, as this is a platform as a service (PaaS)
  • High availability and fault tolerance are built in

Multiple software and platform needs:

  • Database Design Tools (STAR Schema Modeling)
  • ETL Tools
  • Data Cleansing Tools
  • OLAP Tools
  • Spatial Tools
  • Data Mining Tools
  • BI Reporting Tools

The product stack of a data warehousing environment is huge, and most organizations find it difficult to arrive at an ideal list of software, platforms and tools for their BI platform. SaaS for applications such as data cleansing or address validation, and PaaS for reporting such as Microsoft SQL Azure Reporting, are well suited to solving the tools and platform maze.

 

The following are the ideal steps for migrating an on-premise data warehouse system to a cloud platform. For the sake of the case study, the Microsoft Windows Azure platform is chosen as the target platform.

1. Create Initial Database / Allocate Storage / Migrate Data
The STAR schema design of the existing data warehousing system can be migrated to the Cloud platform as it is, and migrating to a relational database platform like SQL Azure should be straightforward. To migrate the data, calculate the storage currently allocated to the existing database in the data center and allocate the same amount of storage resources on the Cloud.

You can store any amount of data, from kilobytes to terabytes, in SQL Azure. However, individual databases are limited to 10 GB in size. To create solutions that store more than 10 GB of data, you must partition large data sets across multiple databases and use parallel queries to access the data.
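The partitioning approach described above can be sketched roughly as follows. This is only an illustration: the 80% fill headroom, the CRC32-based routing, the function names, and the in-memory lists standing in for separate databases are all assumptions made for the example, not SQL Azure features.

```python
import math
import zlib
from concurrent.futures import ThreadPoolExecutor

MAX_DB_SIZE_GB = 10  # per-database limit quoted in the article


def databases_needed(total_gb, max_db_gb=MAX_DB_SIZE_GB, headroom=0.8):
    """How many databases are needed if each is filled to `headroom` of its limit.

    The headroom value is an assumption; it leaves room for growth and indexes.
    """
    usable_gb = max_db_gb * headroom
    return max(1, math.ceil(total_gb / usable_gb))


def shard_for_key(key, shard_count):
    """Route a partitioning key to a shard index using a stable hash."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def partition_rows(rows, key_column, shard_count):
    """Split rows into per-shard lists (each list stands in for one database)."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for_key(row[key_column], shard_count)].append(row)
    return shards


def parallel_total(shards, column):
    """Query every shard in parallel and combine the partial results."""
    def shard_total(rows):
        return sum(row[column] for row in rows)

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(shard_total, shards))
```

With these assumptions, a 95 GB warehouse would need `databases_needed(95)` databases, and any query that spans the whole data set has to fan out to every shard and merge the partial results, as `parallel_total` does.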

Once a highly scalable database infrastructure is set up on the SQL Azure platform, the following are some of the methods by which data from the existing on-premise data warehouse can be moved to SQL Azure.

Traditional BCP Tool: BCP is a command-line utility that ships with Microsoft SQL Server. It bulk copies data between SQL Azure (or SQL Server) and a data file in a user-specified format. The bcp utility that ships with SQL Server 2008 R2 is fully supported by SQL Azure. You can use BCP to back up and restore your data on SQL Azure. You can import large numbers of new rows into SQL Azure tables or export data out of tables into data files by using the bcp utility.
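As a rough illustration of the bcp usage described above, the sketch below only assembles a bcp command line and does not run it. The table, file, server and login names are hypothetical placeholders. The options used are the standard bcp switches for character mode (`-c`), server (`-S`), login (`-U`), password (`-P`) and batch size (`-b`).

```python
def build_bcp_command(table, direction, data_file, server, user, password,
                      batch_size=None):
    """Assemble a bcp argument list for a bulk import ("in") or export ("out").

    All argument values are supplied by the caller; nothing here contacts a
    server. Passing a password on the command line is shown for brevity only.
    """
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    command = [
        "bcp", table, direction, data_file,
        "-c",            # character (text) data format
        "-S", server,    # target server
        "-U", user,      # login name
        "-P", password,  # login password
    ]
    if batch_size is not None and direction == "in":
        command += ["-b", str(batch_size)]  # rows per committed batch on import
    return command


# Hypothetical example values, for illustration only.
example = build_bcp_command(
    table="dw.dbo.fact_sales",
    direction="in",
    data_file="fact_sales.dat",
    server="tcp:myserver.database.windows.net",
    user="loader@myserver",
    password="secret",
    batch_size=10000,
)
```

The resulting list could be handed to a process launcher such as `subprocess.run` on a machine where bcp is installed. Loading in batches keeps each transaction small, which helps when importing large fact tables.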

The following tools are also useful if your existing data warehouse is in SQL Server within the data center.

You can transfer data to SQL Azure by using SQL Server 2008 Integration Services (SSIS). SQL Server 2008 R2 or later supports the Import and Export Data Wizard and bulk copy for the transfer of data between an instance of Microsoft SQL Server and SQL Azure.

SQL Server Migration Assistant (SSMA for Access v4.2) supports migrating your schema and data from Microsoft Access to SQL Azure.

2. Set Up ETL & Integration With Existing  On Premise Data Sources
After the initial load of the data warehouse on Cloud, it needs to be continuously refreshed with operational data. This process extracts data from different data sources (such as flat files, legacy databases, RDBMS, ERP, CRM and SCM application packages).

This process also carries out necessary transformations such as joining tables, sorting and applying various filters.
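The transformation step mentioned above (join, filter, sort) can be sketched in plain Python. The record layouts, column names and the minimum-amount filter are hypothetical and chosen only to show the three operations; a real ETL tool would express the same logic in its own terms.

```python
from operator import itemgetter


def transform(orders, customers, min_amount):
    """Join orders to customers, drop small orders, and sort the result.

    `orders` and `customers` are lists of dicts extracted from source systems.
    """
    customers_by_id = {c["customer_id"]: c for c in customers}
    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            continue  # inner join: skip orders without a matching customer
        if order["amount"] < min_amount:
            continue  # filter: keep only orders at or above the threshold
        joined.append({
            "order_id": order["order_id"],
            "region": customer["region"],
            "amount": order["amount"],
        })
    # Sort by region, then by order id, ready for loading.
    joined.sort(key=itemgetter("region", "order_id"))
    return joined
```

Building the lookup dictionary once keeps the join linear in the number of orders, which matters when the nightly or weekend files are large.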

The following are typical options available on the SQL Azure platform for building an ETL platform between the on-premise data sources and the data warehouse hosted on the cloud. The tools mentioned above for the initial data load also hold good for ETL; they are not repeated here to avoid duplication.

SQL Azure Data Sync:

  • Cloud to cloud synchronization
  • Enterprise (on-premise) to cloud
  • Cloud to on-premise
  • Bi-directional or sync-to-hub or sync-from-hub synchronization

[Diagram, courtesy of the vendor: overview of how SQL Azure Data Sync can be used for ETL purposes.]

Windows Azure AppFabric Integration: Integration provides common BizTalk Server integration capabilities (e.g. pipeline, transforms, adapters) on Windows Azure, using out-of-box integration patterns to accelerate and simplify development. It also delivers higher-level business user enablement capabilities such as Business Activity Monitoring and Rules, as well as a self-service trading partner community portal and provisioning of business-to-business pipelines.

[Diagram, courtesy of the vendor: how Windows Azure AppFabric Integration can be used as an ETL platform.]

3. Create CUBES & Other Analytics  Structures
The multidimensional nature of OLAP requires an analytical engine to process the underlying data and create a multidimensional view. The success of OLAP has resulted in a large number of vendors offering OLAP servers using different architectures.

MOLAP: A proprietary multidimensional database aimed at performance.

ROLAP: Relational OLAP is a technology that provides sophisticated multidimensional analysis performed on open relational databases. ROLAP can scale to large data sets in the terabyte range.

HOLAP: Hybrid OLAP is an attempt to combine some of the features of MOLAP and ROLAP technology.
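To make the ROLAP idea concrete, the sketch below runs a multidimensional aggregation as ordinary SQL over a small STAR schema. SQLite (via Python's built-in `sqlite3` module) is used purely as a local stand-in for a relational engine, and the table names, columns and sample rows are hypothetical.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny STAR schema: one fact table and two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount INTEGER
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "Bikes"), (2, "Helmets")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2010, 1), (2, 2010, 2)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 100), (1, 2, 150), (2, 1, 20), (2, 2, 30), (1, 1, 50)])


def sales_by_category_and_quarter(conn):
    """Aggregate the fact table along the product and date dimensions."""
    return conn.execute("""
        SELECT p.category, d.year, d.quarter, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year, d.quarter
        ORDER BY p.category, d.year, d.quarter
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
report = sales_by_category_and_quarter(conn)
```

In this style the "cube" is never materialized; each analytical view is a join of the fact table to its dimensions followed by a GROUP BY, which is why ROLAP depends on the scalability of the underlying relational database.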

SQL Azure Database does not support all of the features and data types found in SQL Server. Analysis Services, Replication, and Service Broker are not currently provided as services on the Windows Azure platform.

At this time there is no direct support for OLAP and CUBE processing on SQL Azure. However, with the HPC (High Performance Computing) capabilities of multiple Worker roles, manual aggregation of the data can be achieved.
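One way to picture this manual aggregation is a split, aggregate and merge pattern. In the sketch below, threads from Python's `concurrent.futures` stand in for Worker roles, and the two-dimension (region, product) layout, the "ALL" roll-up marker and the function names are assumptions made for illustration. It does not use any Windows Azure API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def aggregate_chunk(rows):
    """Work done by one worker: partial totals for every roll-up level.

    Each row is a (region, product, amount) tuple. "ALL" marks a rolled-up
    dimension, so real dimension values must not be named "ALL".
    """
    partial = Counter()
    for region, product, amount in rows:
        partial[(region, product)] += amount  # finest grain
        partial[(region, "ALL")] += amount    # total per region
        partial[("ALL", product)] += amount   # total per product
        partial[("ALL", "ALL")] += amount     # grand total
    return partial


def build_cube(rows, workers=4):
    """Split the rows across workers, then merge their partial aggregates."""
    chunks = [rows[i::workers] for i in range(workers)]
    cube = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(aggregate_chunk, chunks):
            cube.update(partial)  # Counter.update adds the partial totals
    return dict(cube)
```

Because sums can be merged in any order, each worker can process its share of the fact data independently, and only the small partial results need to be combined at the end.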

4. Generate Reports
Reporting consists of analyzing the data stored in the data warehouse across multiple dimensions and generating both standard reports for business intelligence and ad-hoc reports. These reports present data in graphical or tabular form and also provide statistical analysis features. They should be renderable as Excel, PDF and other formats.

It is better to utilize the SaaS based or PaaS based reporting infrastructure rather than custom coding all the reports.

SQL Azure Reporting enables developers to enhance their applications by embedding cloud based reports on information stored in a SQL Azure database.  Developers can author reports using familiar SQL Server Reporting Services tools and then use these reports in their applications which may be on-premises or in the cloud.

SQL Azure Reporting can currently connect only to SQL Azure databases.

Summary
The above steps provide a path for migrating on-premise data warehousing applications to Cloud. Because this requires a lot of vendor support across IaaS, PaaS and SaaS, the Microsoft Azure platform was chosen to support the case study. With several features integrated as part of it, the Microsoft Cloud platform is positioned to be one of the leading platforms for BI on Cloud.

[Diagram: blueprint of a typical Cloud BI organization on the Microsoft Azure platform.]

More Stories By Srinivasan Sundara Rajan

Srinivasan is passionate about ownership and driving things on his own. With his breadth and depth in enterprise technology, he could run any aspect of the IT industry and make it a success.

He is a seasoned enterprise IT expert, mainly in the areas of solution, integration and architecture, across structured and unstructured data sources, especially in the manufacturing domain.

He currently works as Technology Head for GAVS Technologies.
