|By Robert Eve||
|December 6, 2011 09:30 AM EST||
Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility is the first book published on the topic of data virtualization. Along with an overview of data virtualization and its advantages, it presents ten case studies of organizations that have adopted data virtualization to significantly improve business decision making, decrease time-to-solution and reduce costs. This article describes data virtualization adoption at one of the enterprises profiled, Pfizer Inc.
Pfizer Inc. is a biopharmaceutical company that develops, manufactures and markets medicines for both humans and animals. As the world's largest drug manufacturer, Pfizer operates globally with 111,500 employees and a presence in over 100 countries.
Worldwide Pharmaceutical Sciences (PharmSci) is a group of scientists responsible for enabling what drugs Pfizer will bring to market. This group designs, synthesizes and manufactures all drugs that are part of clinical trials and toxicology testing within Pfizer.
For this case study, we interviewed Dr. Michael C. Linhares, Ph.D and Research Fellow. Linhares heads up the Business Information Systems (BIS) team within PharmSci.
BIS is responsible for portfolio and resource management across all of PharmSci's projects. This involves designing, building and supporting systems that deliver data to executive teams and staff to help them make decisions regarding how to allocate available resources - both people and dollars - across the overall portfolio of over 100 projects annually.
The Business Problem
A major challenge for PharmSci is the fact that it has a complex portfolio of projects that is constantly changing.
According to Linhares, "Every week, something new comes up and we need to ensure that the right information is communicated to the right people. The people making decisions about resource allocation need easy and simple methods for obtaining that information. One aspect of this is that some people learn the information first and they need to communicate it to others who are responsible for making decisions based on the information. This creates an information-sharing challenge."
Linhares estimates that there are 80 to 100 information producers within PharmSci and over 1,000 information consumers, including the executives who seek a full picture of the project portfolio - financial data, project data, people data and data about the pharmaceutical compounds themselves.
The Technical Problem
The data required is created in and managed by different applications, each developed by a different team, stored in multiple sources managed by different technologies, and the applications don't talk to each other.
This makes it very difficult to access summary information across all projects. Examples would be identifying how much money is being spent on all projects in the project management system, what the next milestones are and when each will be met, and who is working on each project. "We needed a solution that would allow us to pull all this information together in an agile way."
When Linhares joined PharmSci, there was very little in the way of effective information integration. Most integration was done manually by exporting data from various systems into Excel spreadsheets and then either combining spreadsheets or taking the spreadsheet data and moving it into Access or SQL Server databases. With no real security controls, this approach also lacked scalability and opportunities for reuse, generated multiple copies of the spreadsheets (with various changes), and it often took weeks to build a spreadsheet with only a 50% chance that it would include all of the data required.
To be successful, the solution to these data integration and reporting problems had to provide the following:
- A single, integrated view of all data sources with a common set of naming conventions
- A flexible middle layer that would be independent of both the data sources on the back end and the reporting tools on the front end to facilitate easy change management
- Shared metadata and business rule functionality so there would be a single point for managing and monitoring the solution
- A development platform that supported fast, iterative development and, therefore, continuous process improvement
Three Options Considered
BIS considered three solution architectures to meet their business and technical challenges.
- Traditional Information Factory: The first option was a traditional approach of an integrated, scalable information factory. Pfizer had already implemented information factories in the division using a combination of Informatica ETL tools, Oracle databases and custom-built reporting applications. However, according to Linhares, an information factory "seemed like overkill. We didn't have high volumes of data, nor did we need the inherent complexity of using ETL tools to transform and move data while making sure we included all the detailed data we might possibly ever need over time." Furthermore, because of the way the information factories were managed within Pfizer, change management entailed significant overhead. However, the architectural concepts of an information factory were not going to be ignored in the final solution.
- Single Vendor Stack: A second possible approach was to implement the solution in a single integrated technology (SQL Server with integration services). Major disadvantages were the lack of access to multiple data source types, the need to move data multiple times and the lack of an integrated metadata repository for understanding and organizing the data model.
- Data Virtualization: The third option was to create a federated data virtualization layer that integrated and accessed the underlying data sources through virtual views of the data. By leaving the source data in place, this approach would eliminate the issues inherent in copying and moving all the data (which Linhares described as unnecessary, "non-value added" activities). With the right technology and mix of products, data virtualization would enable PharmSci to migrate from inefficient, off-line spreadmarts to online access to integrated information that could be rapidly tailored and reused to dramatically increase its value to the organization.
The Data Virtualization Solution - Architecture
Pfizer's solution is the PharmSci Portfolio Database (PSPD), a federated data delivery framework implemented with the Composite Data Virtualization Platform.
Data virtualization enables the integration of all PharmSci data sources into a single reporting schema of information that can be accessed by all front-end tools and users. The solution architecture includes the following components:
Trusted Data Sources: There are many sources of data for PSPD; they are geographically dispersed, store data in a variety of formats across a multivendor, heterogeneous data environment. Here are some examples:
- Enterprise Project Management (EPM) is a SQL Server database of WRD's drug portfolio project plans. It includes detailed project schedules and milestones.
- The Global Information Factory (GIF) is an Oracle-based data warehouse of monthly finance data.
- OneSource, a database of corporate-level drug portfolio information is itself a unified set of Composite views across several different sources built by another group within Pfizer.
- Flat files are provided by the Finance Department on actual resource use.
- SharePoint lists are small SharePoint databases accessed using a web service.
- There are other data sources as well, including custom-built systems. As Linhares pointed out, "It doesn't matter what data sources we have. With a virtual approach, we are not limited by the types of data we need to access."
Data Virtualization Layer: The Composite Data Virtualization Platform forms the data virtualization layer that enables the solution to be independent of the data sources and front-end tools. It provides abstracted access to all of the data sources and delivers the data through virtual views. These views effectively present the PharmSci Portfolio Database as subject-specific data marts. The Composite metadata repository manages data lineage and business rules.
Consuming Applications: The flexibility of the platform is demonstrated by the varied reporting applications that use the information in PSPD. Examples include:
- SAP Business Objects for ad hoc queries, standard reports and dashboards.
- TIBCO Spotfire for analytics and access to data through standard presentation reports.
- Web services for parameterized queries.
- Data services to provide data for downstream applications.
- QuickViews (web pages built using DevExpress, a .NET toolkit) for access to live data.
SharePoint Portal: Branded as "InfoSource," this team collaboration web portal is the front-end interface that provides integrated access to PSPD data for all PharmSci customers through the consuming applications described above.
The Data Virtualization Solution - Best Practices
Linhares and team applied a number of data virtualization best practices when implementing the architecture described above.
Two Layers of Abstraction: Linhares stressed the importance of building two clear levels of abstraction into the data virtualization architecture. The first level abstracts Sources (the information abstraction layer), the second consumers (the reporting abstraction layer).
"We built a representation of the data in Composite. If a source is ever changed by the owner, which often happens, we can update the representation in the information abstraction layer quickly. This allows control of all downstream data in one location."
The second level of abstraction is the one between the reporting schema and the front-end reporting tools. A consolidated and integrated set of information is exposed as a single schema. This allows BIS to be system agnostic and support the use of whatever tool is best for the customer. All of the reporting tools use the same reporting abstraction layer; they always get the same answer to the same question because there is only a single source of data.
Consolidated Business Rules: Another key piece of the solution is the ability to include the business rules about how PharmSci manages its data within these abstraction layers. The business rules are embedded in the view definitions and are applied consistently at the same point.
Rapid Application Development Process: Prior to data virtualization, data integration was the slowest step for BIS in fulfilling a customer request for information. Now it's typically the fastest. "For example, a request that came in Friday morning and was completed by that afternoon. The customer's response was an amazed, ‘What do you mean you already have it done?'"
BIS uses a simple development process. The first step is what Linhares calls "triage" - looking at what the customer wants, estimating how long it will take and communicating that to the customer.
BIS does not spend a lot of time documenting the requirements of the solution. Instead, the group first creates a prototype on paper in the form of a simple data flow, then creates the necessary virtual views, gives the customer web access to the views and asks: "Is this what you wanted?"
The customer can then play with the result and respond with any changes or additions needed. BIS arrives at the final solution working with the customer in an iterative process.
Summary of Benefits
Linhares described several major benefits of the data virtualization solution.
The ability to provide integrated data in context: Data virtualization has enabled BIS to replace isolated silos of data with a data delivery platform that integrates different types and sources of data into a comprehensive package of value-added information. Instead of only the team leader and a core group of eight to ten people knowing about a project, the entire organization has access to relevant project information.
The independence of the data virtualization layer: "This is one of the huge benefits of data virtualization. It allows me to manage and monitor everything in one place and it makes change management easy for BIS and transparent to users."
Fast, iterative development environment: The data delivery infrastructure already exists in the data virtualization layer (defined data sources, standard naming conventions, access methods, etc.) so when a request for information comes in, BIS can quickly put it together for the customer.
Elimination of manual effort throughout PharmSci: According to Linhares, people initially resisted going away from their spreadsheets. But once there was a single source for the data and it was all available through InfoSource, there was a dramatic reduction in the need to have meetings to reconcile spreadsheet data among teams.
• • •
Editor's Note: Robert Eve is the co-author, along with Judith R. Davis, of Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility, the first book published on the topic of data virtualization. The complete Pfizer case study, along with nine others enterprise are available in the book.
SYS-CON Events announced today that FierceDevOps will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. FierceDevOps keeps software developers and IT operations personnel updated on the latest news and trends around the rapidly evolving role of the traditional IT worker.
Mar. 30, 2015 02:45 AM EDT Reads: 1,481
GENBAND has announced that SageNet is leveraging the Nuvia platform to deliver Unified Communications as a Service (UCaaS) to its large base of retail and enterprise customers. Nuvia’s cloud-based solution provides SageNet’s customers with a full suite of business communications and collaboration tools. Two large national SageNet retail customers have recently signed up to deploy the Nuvia platform and the company will continue to sell the service to new and existing customers. Nuvia’s capabili...
Mar. 30, 2015 01:00 AM EDT Reads: 1,478
WHOA.com has announced the newest addition to its data center footprint with the expansion into Equinix's newest state-of-the-art facility: DC-11 Washington, DC IBX+. Located in Ashburn, VA, this data center expands Whoa.com's presence to meet rapidly expanding customer demand for secure cloud solutions. Equinix, Inc. operates International Business Exchange™ (IBX®) data centers in 32 markets across 15 countries in the Americas, EMEA, and Asia-Pacific. Equinix is committed to operating faciliti...
Mar. 30, 2015 12:00 AM EDT Reads: 1,158
SYS-CON Events announced today that the DevOps Institute has been named “Association Sponsor” of SYS-CON's DevOps Summit, which will take place on June 9–11, 2015, at the Javits Center in New York City, NY. The DevOps Institute provides enterprise level training and certification. Working with thought leaders from the DevOps community, the IT Service Management field and the IT training market, the DevOps Institute is setting the standard in quality for DevOps education and training.
Mar. 29, 2015 10:30 PM EDT Reads: 1,155
SYS-CON Media announced today that @WebRTCSummit Blog, the largest WebRTC resource in the world, has been launched. @WebRTCSummit Blog offers top articles, news stories, and blog posts from the world's well-known experts and guarantees better exposure for its authors than any other publication. @WebRTCSummit Blog can be bookmarked ▸ Here @WebRTCSummit conference site can be bookmarked ▸ Here
Mar. 29, 2015 10:00 PM EDT Reads: 1,818
SYS-CON Events announced today that Cisco, the worldwide leader in IT that transforms how people connect, communicate and collaborate, has been named “Gold Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Cisco makes amazing things happen by connecting the unconnected. Cisco has shaped the future of the Internet by becoming the worldwide leader in transforming how people connect, communicate and collaborat...
Mar. 29, 2015 07:00 PM EDT Reads: 5,239
WSM International is launching a DevOps services division that offers assessment, consulting and implementation to large enterprises and organizations with complex infrastructures. This is the first independent services company to create a dedicated practice to help organizations looking to transition to the DevOps model. The concept of DevOps is to blend information technology (IT) software development with operations to optimize the computing infrastructure according to the specific needs of ...
Mar. 29, 2015 07:00 PM EDT Reads: 1,543
SYS-CON Events announced today that robomq.io will exhibit at SYS-CON's @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. robomq.io is an interoperable and composable platform that connects any device to any application. It helps systems integrators and the solution providers build new and innovative products and service for industries requiring monitoring or intelligence from devices and sensors.
Mar. 29, 2015 06:00 PM EDT Reads: 1,487
The WebRTC Summit 2014 New York, to be held June 9-11, 2015, at the Javits Center in New York, NY, announces that its Call for Papers is open. Topics include all aspects of improving IT delivery by eliminating waste through automated business models leveraging cloud technologies. WebRTC Summit is co-located with 16th International Cloud Expo, @ThingsExpo, Big Data Expo, and DevOps Summit.
Mar. 29, 2015 06:00 PM EDT Reads: 1,607
Temasys has announced senior management additions to its team. Joining are David Holloway as Vice President of Commercial and Nadine Yap as Vice President of Product. Over the past 12 months Temasys has doubled in size as it adds new customers and expands the development of its Skylink platform. Skylink leads the charge to move WebRTC, traditionally seen as a desktop, browser based technology, to become a ubiquitous web communications technology on web and mobile, as well as Internet of Things...
Mar. 29, 2015 06:00 PM EDT Reads: 1,858
Hosted PaaS providers have given independent developers and startups huge advantages in efficiency and reduced time-to-market over their more process-bound counterparts in enterprises. Software frameworks are now available that allow enterprise IT departments to provide these same advantages for developers in their own organization. In his workshop session at DevOps Summit, Troy Topnik, ActiveState’s Technical Product Manager, will show how on-prem or cloud-hosted Private PaaS can enable organ...
Mar. 29, 2015 05:45 PM EDT Reads: 1,275
DevOps tasked with driving success in the cloud need a solution to efficiently leverage multiple clouds while avoiding cloud lock-in. Flexiant today announces the commercial availability of Flexiant Concerto. With Flexiant Concerto, DevOps have cloud freedom to automate the build, deployment and operations of applications consistently across multiple clouds. Concerto is available through four disruptive pricing models aimed to deliver multi-cloud at a price point everyone can afford.
Mar. 29, 2015 05:00 PM EDT Reads: 973
Today, IT is not just a cost center. IT is an enabler and driver of business. With the emergence of the hybrid cloud paradigm, IT now has increasingly more capabilities to create new strategic opportunities for a business. Hybrid cloud allows an organization to utilize multi-tenant public clouds, dedicated private clouds, bare metal hosting, and the associated support and services for the right use cases through an on-demand, XaaS model. This model of IT creates tremendous opportunities for busi...
Mar. 29, 2015 05:00 PM EDT Reads: 3,183
Docker is an excellent platform for organizations interested in running microservices. It offers portability and consistency between development and production environments, quick provisioning times, and a simple way to isolate services. In his session at DevOps Summit at 16th Cloud Expo, Shannon Williams, co-founder of Rancher Labs, will walk through these and other benefits of using Docker to run microservices, and provide an overview of RancherOS, a minimalist distribution of Linux designed...
Mar. 29, 2015 04:15 PM EDT Reads: 2,442
Business as usual for IT is evolving into a “Make or Buy” decision on a service-by-service conversation with input from the LOBs. How does your organization move forward with cloud? In his general session at 16th Cloud Expo, Paul Maravei, Regional Sales Manager, Hybrid Cloud and Managed Services at Cisco, discusses how Cisco and its partners offer a market-leading portfolio and ecosystem of cloud infrastructure and application services that allow you to uniquely and securely combine cloud busi...
Mar. 29, 2015 04:15 PM EDT Reads: 1,435
Businesses are looking to empower employees and departments to do more, go faster, and streamline their processes. For all workers – but mobile workers especially – utilizing the cloud to reconnect documents and improve processes without destructing existing workflows can have a dramatic impact on productivity. In his session at 16th Cloud Expo, Mark Grilli, vice president of Acrobat Solutions marketing at Adobe Systems Incorporated, will outline new ways that the cloud is changing the way peo...
Mar. 29, 2015 04:00 PM EDT Reads: 1,342
SYS-CON Events announced today that Vitria Technology, Inc. will exhibit at SYS-CON’s @ThingsExpo, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. Vitria will showcase the company’s new IoT Analytics Platform through live demonstrations at booth #330. Vitria’s IoT Analytics Platform, fully integrated and powered by an operational intelligence engine, enables customers to rapidly build and operationalize advanced analytics to deliver timely business outcomes ...
Mar. 29, 2015 03:30 PM EDT Reads: 2,175
Are your applications getting in the way of your business strategy? It’s time to rethink your IT approach. In his session at 16th Cloud Expo, Madhukar Kumar, Vice President, Product Management at Liaison Technologies, will discuss a new data-centric approach to IT that allows your data, not applications, to inform business strategy. By moving away from an application-centric IT model where data integration and analysis are subservient to the constraints of applications, your organization will b...
Mar. 29, 2015 03:15 PM EDT Reads: 2,579
SYS-CON Events announced today that Liaison Technologies, a leading provider of data management and integration cloud services and solutions, has been named "Silver Sponsor" of SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York, NY. Liaison Technologies is a recognized market leader in providing cloud-enabled data integration and data management solutions to break down complex information barriers, enabling enterprises to make sm...
Mar. 29, 2015 03:00 PM EDT Reads: 3,476
SYS-CON Events announced today that MangoApps will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY., and the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. MangoApps provides private all-in-one social intranets allowing workers to securely collaborate from anywhere in the world and from any device. Social, mobile, and eas...
Mar. 29, 2015 03:00 PM EDT Reads: 3,052