A Cloud Computing Business Intelligence Organization

Moving data warehouses to the cloud

Data Warehousing As A Cloud Candidate
Over the past year, major vendors have shown greater support for Cloud, and Cloud is here to stay. More importantly, the path for enterprises to adopt Cloud is now clearly drawn. With this in mind, it is time to identify which existing data center applications can be migrated to Cloud.

Most major IT vendors predict that HYBRID delivery will be the future: enterprises will need a delivery model in which some workloads run on Clouds while others remain in data centers, together with a way to integrate the two.

Before going further into a blueprint of how data warehouses fit within a HYBRID Cloud environment, we will look at the salient features of data warehouses and how the tenets of Cloud make them a very viable workload to move to Cloud.

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

Data Warehousing Usage and Cloud Tenet Value Proposition

Usage: ETL (Extract, Clean, Transform, Load) processing is subject to variable load patterns. Typically, large files arrive over the weekend or at night to be processed and loaded.

Cloud value: It is better to use COMPUTE resources on demand, as the ETL requires them, rather than maintaining a fixed capacity.

Usage: OLAP (Online Analytical Processing) and the related processing for MOLAP (Multidimensional OLAP) and/or ROLAP (Relational OLAP) are highly compute intensive and require strong processing power.

Cloud value: High Performance Computing and the ability to scale up on demand, both tenets of Cloud, are closely aligned with this need.

Usage: Physical architecture needs are complex in a data warehousing environment.

  • MPP Servers (Massively Parallel Processing)
  • Shared Nothing Data Architecture
  • Mirrored Copies of Disk Space
  • High Availability Clustering

Cloud value: Most IaaS and PaaS offerings, such as the Azure platform and Amazon EC2, have built-in provisions for a highly available architecture, with most of the day-to-day administration abstracted away from the enterprise.

The following are some of the advantages of the SQL Azure platform:

  • No physical administration required - software installation and patching are included, as this is a platform as a service (PaaS)
  • High availability and fault tolerance are built in

Usage: Multiple software and platform needs:

  • Database Design Tools (STAR Schema Modeling)
  • ETL Tools
  • Data Cleansing Tools
  • OLAP Tools
  • Spatial Tools
  • Data Mining Tools
  • BI Reporting Tools

Cloud value: The product stack of a data warehousing environment is very large, and most organizations find it difficult to arrive at an ideal list of software, platforms and tools for their BI platform. SaaS for applications such as data cleansing or address validation, and PaaS for reporting such as Microsoft SQL Azure Reporting, are ideal for solving this tools-and-platform maze.
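The case for on-demand compute in the ETL workload described above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a purely hypothetical daily load profile (the node counts are illustrative, not measurements): fixed capacity must be sized for the peak and kept running, while on-demand capacity follows the load hour by hour.

```python
def node_hours(hourly_nodes_needed):
    """Compare fixed provisioning (sized for the peak) with on-demand provisioning."""
    hours = len(hourly_nodes_needed)
    fixed = max(hourly_nodes_needed) * hours   # peak capacity kept running all the time
    on_demand = sum(hourly_nodes_needed)       # only the nodes actually needed each hour
    return fixed, on_demand


# Hypothetical day: 1 node for 20 hours, 8 nodes during a 4-hour nightly load window.
profile = [1] * 20 + [8] * 4
fixed, on_demand = node_hours(profile)
print(f"fixed: {fixed} node-hours, on-demand: {on_demand} node-hours")
```

The wider the gap between the peak and the typical load, the larger the difference between the two figures.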

 

The following are the ideal steps for migrating an on-premise data warehouse system to a cloud platform. For the sake of the case study, the Microsoft Windows Azure platform is chosen as the target platform.

1. Create Initial Database / Allocate Storage / Migrate Data
The STAR schema design of the existing data warehousing system can be migrated to the Cloud platform as is, and migrating to a relational database platform like SQL Azure should be straightforward. To migrate the data, the storage allocated to the existing database in the data center needs to be calculated, and the same amount of storage resources allocated on the Cloud.

You can store any amount of data, from kilobytes to terabytes, in SQL Azure. However, individual databases are limited to 10 GB in size. To create solutions that store more than 10 GB of data, you must partition large data sets across multiple databases and use parallel queries to access the data.
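The partitioning approach described above can be sketched as follows. This is an illustration under stated assumptions, not SQL Azure code: in-memory SQLite databases stand in for the separate SQL Azure databases, the `sales` table and the hash-routing helper `shard_for` are hypothetical, and threads stand in for parallel queries.

```python
import math
import sqlite3
import zlib
from concurrent.futures import ThreadPoolExecutor

MAX_DB_GB = 10  # per-database size limit quoted in the text above


def databases_needed(total_gb):
    """Smallest number of databases that can hold total_gb under the limit."""
    return max(1, math.ceil(total_gb / MAX_DB_GB))


def shard_for(key, shard_count):
    """Stable hash routing: the same key always lands in the same database."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def make_shards(shard_count):
    """In-memory SQLite databases stand in for the separate SQL Azure databases."""
    shards = []
    for _ in range(shard_count):
        conn = sqlite3.connect(":memory:", check_same_thread=False)
        conn.execute("CREATE TABLE sales (customer_id INTEGER, amount REAL)")
        shards.append(conn)
    return shards


def insert_sale(shards, customer_id, amount):
    conn = shards[shard_for(customer_id, len(shards))]
    conn.execute("INSERT INTO sales VALUES (?, ?)", (customer_id, amount))


def total_sales(shards):
    """Send the same query to every shard in parallel, then combine the partial sums."""
    def query(conn):
        return conn.execute("SELECT COALESCE(SUM(amount), 0) FROM sales").fetchone()[0]

    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(query, shards))


shards = make_shards(databases_needed(25))  # 25 GB of data needs 3 databases
for customer_id in range(100):
    insert_sale(shards, customer_id, 10.0)
print(len(shards), total_sales(shards))
```

Routing on a key such as the customer ID keeps each customer's rows in one database, so only queries that span customers need the fan-out step.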

Once a highly scalable database infrastructure is set up on the SQL Azure platform, the following are some of the methods by which data from existing on-premise data warehouses can be moved to SQL Azure.

Traditional BCP Tool: BCP is a command-line utility that ships with Microsoft SQL Server. It bulk copies data between SQL Azure (or SQL Server) and a data file in a user-specified format. The bcp utility that ships with SQL Server 2008 R2 is fully supported by SQL Azure. You can use BCP to back up and restore your data on SQL Azure: you can import large numbers of new rows into SQL Azure tables, or export data out of tables into data files.
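As a rough sketch of what a bcp invocation against SQL Azure looks like, the helper below only assembles the argument list and does not run anything. The server, login, table and file names are hypothetical placeholders. The `tcp:<server>.database.windows.net` endpoint and `user@server` login forms are assumptions based on common SQL Azure usage, so check them against your own environment.

```python
def bcp_command(table, direction, data_file, server, user, password):
    """Build a bcp argument list; direction is "in" (import) or "out" (export)."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return [
        "bcp", table, direction, data_file,
        "-c",                                        # character (text) data format
        "-S", f"tcp:{server}.database.windows.net",  # assumed SQL Azure endpoint form
        "-U", f"{user}@{server}",                    # assumed user@server login form
        "-P", password,
    ]


# Hypothetical names throughout: export a fact table to a local data file.
cmd = bcp_command("dw.dbo.FactSales", "out", "factsales.dat",
                  server="myserver", user="loader", password="secret")
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where bcp is installed. Passing the password on the command line is shown only for brevity.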

The following tools are also useful if your existing data warehouse is in SQL Server within the data center.

You can transfer data to SQL Azure by using SQL Server 2008 Integration Services (SSIS). SQL Server 2008 R2 or later supports the Import and Export Data Wizard and bulk copy for the transfer of data between an instance of Microsoft SQL Server and SQL Azure.

SQL Server Migration Assistant (SSMA for Access v4.2) supports migrating your schema and data from Microsoft Access to SQL Azure.

2. Set Up ETL & Integration With Existing On Premise Data Sources
After the initial load of the data warehouse on Cloud, it needs to be continuously refreshed with operational data. This process extracts data from different data sources (such as flat files, legacy databases, RDBMS, ERP, CRM and SCM application packages).

This process also carries out the necessary transformations, such as joining tables, sorting, and applying various filters.
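The three transformations named above (join, filter, sort) can be sketched on in-memory rows. The data and field names below are hypothetical; a real ETL tool would perform the same steps against the extracted source data.

```python
from operator import itemgetter

# Hypothetical extracted rows; in practice these come from flat files, an RDBMS, ERP, etc.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 250.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 10, "amount": 975.0},
    {"order_id": 4, "customer_id": 99, "amount": 120.0},  # no matching customer
]
customers = [
    {"customer_id": 10, "region": "East"},
    {"customer_id": 11, "region": "West"},
]


def transform(orders, customers, min_amount):
    """Join orders to customers, drop small orders, and sort by amount descending."""
    region_by_customer = {c["customer_id"]: c["region"] for c in customers}
    joined = [
        {**o, "region": region_by_customer[o["customer_id"]]}
        for o in orders
        if o["customer_id"] in region_by_customer  # inner join: unmatched rows are dropped
    ]
    filtered = [row for row in joined if row["amount"] >= min_amount]
    return sorted(filtered, key=itemgetter("amount"), reverse=True)


for row in transform(orders, customers, min_amount=100.0):
    print(row)
```

Here order 4 is dropped by the join and order 2 by the filter, leaving orders 3 and 1 in that order.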

The following are typical options available on the SQL Azure platform for building an ETL platform between on-premise sources and a data warehouse hosted on the cloud. The tools mentioned above for the initial data load also serve as ETL tools; they are not repeated here.

SQL Azure Data Sync:

  • Cloud to cloud synchronization
  • Enterprise (on-premise) to cloud
  • Cloud to on-premise
  • Bi-directional or sync-to-hub or sync-from-hub synchronization

The following diagram, courtesy of the vendor, gives an overview of how SQL Azure Data Sync can be used for ETL purposes.

Windows Azure AppFabric Integration provides common BizTalk Server integration capabilities (e.g. pipeline, transforms, adapters) on Windows Azure, using out-of-box integration patterns to accelerate and simplify development. It also delivers higher-level business user enablement capabilities such as Business Activity Monitoring and Rules, as well as a self-service trading partner community portal and provisioning of business-to-business pipelines. The following diagram, courtesy of the vendor, shows how Windows Azure AppFabric Integration can be used as an ETL platform.

3. Create CUBES & Other Analytics  Structures
The multidimensional nature of OLAP requires an analytical engine to process the underlying data and create a multidimensional view. The success of OLAP has resulted in a large number of vendors offering OLAP servers using different architectures.

MOLAP: A proprietary multidimensional database designed with performance as its aim.

ROLAP: Relational OLAP is a technology that provides sophisticated multidimensional analysis performed on open relational databases. ROLAP can scale to large data sets in the terabyte range.

HOLAP: Hybrid OLAP is an attempt to combine some of the features of MOLAP and ROLAP technology.
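The difference between the first two architectures can be illustrated with a deliberately simplified sketch (real OLAP servers are far more elaborate). The ROLAP-style function aggregates the relational rows at query time, while the MOLAP-style function precomputes totals for every subset of dimensions, which for n dimensions means 2**n group-bys. The fact rows and dimension names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: two dimensions (region, year) and one measure (sales).
FACTS = [
    {"region": "East", "year": 2009, "sales": 100},
    {"region": "East", "year": 2010, "sales": 150},
    {"region": "West", "year": 2010, "sales": 200},
]
DIMENSIONS = ("region", "year")


def group_by(facts, dims, measure="sales"):
    """ROLAP style: aggregate the relational rows when the query arrives."""
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def build_cube(facts, dimensions):
    """MOLAP style: precompute totals for every subset of the dimensions."""
    return {
        dims: group_by(facts, dims)
        for size in range(len(dimensions) + 1)
        for dims in combinations(dimensions, size)
    }


cube = build_cube(FACTS, DIMENSIONS)
print(cube[("region",)])            # answered by lookup in the precomputed cube
print(group_by(FACTS, ("year",)))   # answered by scanning FACTS on demand
```

The precomputation step is what makes cube processing compute intensive, and it grows quickly as dimensions are added.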

SQL Azure Database does not support all of the features and data types found in SQL Server. Analysis Services, Replication, and Service Broker are not currently provided as services on the Windows Azure platform.

At this time there is no direct support for OLAP and CUBE processing on SQL Azure. However, with the HPC (High Performance Computing) attributes of multiple Worker roles, manual aggregation of the data can be achieved.
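One way to picture manual aggregation across several workers is a partition, aggregate and merge pattern. The sketch below is an assumption about how such a job might be structured, not Windows Azure code: threads stand in for Worker roles, and the fact rows are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_totals(rows):
    """Work done by one worker: total the measure per product for its partition."""
    totals = Counter()
    for product, amount in rows:
        totals[product] += amount
    return totals


def aggregate(rows, workers):
    """Split the rows into partitions, aggregate each in parallel, then merge the partials."""
    partitions = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_totals, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds to existing totals
    return dict(merged)


# Hypothetical fact rows: (product, sales amount).
rows = [("widget", 5), ("gadget", 7), ("widget", 3), ("gadget", 1), ("widget", 2)]
print(aggregate(rows, workers=3))
```

Because sums can be merged in any order, the result does not depend on how the rows are partitioned, which is what allows the work to be spread over more workers as the data grows.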

4. Generate Reports
Reporting consists of analyzing the data stored in the data warehouse in multiple dimensions and generating both standard business intelligence reports and ad-hoc reports. These reports present data in graphical or tabular form and also provide statistical analysis features. They should be renderable in Excel, PDF and other formats.
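To make "analyzing in multiple dimensions" concrete, here is a minimal sketch of a two-dimensional tabular report rendered as CSV text, a format Excel can open. It only illustrates the shape of such a report: the field names and data are hypothetical, it is not the SQL Azure Reporting API, and PDF rendering is not shown.

```python
import csv
import io
from collections import defaultdict


def crosstab_csv(rows, row_dim, col_dim, measure):
    """Pivot rows into a row_dim x col_dim table of summed measures, rendered as CSV text."""
    cells = defaultdict(float)
    for r in rows:
        cells[(r[row_dim], r[col_dim])] += r[measure]
    row_keys = sorted({k[0] for k in cells})
    col_keys = sorted({k[1] for k in cells})

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([row_dim] + col_keys)
    for rk in row_keys:
        writer.writerow([rk] + [cells.get((rk, ck), 0.0) for ck in col_keys])
    return out.getvalue()


# Hypothetical warehouse rows.
sales = [
    {"region": "East", "year": 2009, "sales": 100},
    {"region": "East", "year": 2010, "sales": 150},
    {"region": "West", "year": 2010, "sales": 200},
]
print(crosstab_csv(sales, "region", "year", "sales"))
```

Swapping the `row_dim` and `col_dim` arguments gives the same data viewed along the other dimension.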

It is better to use SaaS-based or PaaS-based reporting infrastructure than to custom code all the reports.

SQL Azure Reporting enables developers to enhance their applications by embedding cloud based reports on information stored in a SQL Azure database.  Developers can author reports using familiar SQL Server Reporting Services tools and then use these reports in their applications which may be on-premises or in the cloud.

Currently, SQL Azure Reporting can connect only to SQL Azure databases.

Summary
The above steps provide a path for migrating on-premise data warehousing applications to Cloud. Because the migration needs considerable vendor support across IaaS, PaaS and SaaS, the Microsoft Azure platform was chosen for the case study. With several features integrated as part of it, the Microsoft Cloud platform is positioned to be one of the leading platforms for BI on Cloud.

The following diagram shows a blueprint of a typical Cloud BI organization on the Microsoft Azure platform.

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
