Data Virtualization at Pfizer: A Case Study

New integration infrastructure built for business agility

Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility is the first book published on the topic of data virtualization. Along with an overview of data virtualization and its advantages, it presents ten case studies of organizations that have adopted data virtualization to significantly improve business decision making, decrease time-to-solution and reduce costs. This article describes data virtualization adoption at one of the enterprises profiled, Pfizer Inc.

Organization Background
Pfizer Inc. is a biopharmaceutical company that develops, manufactures and markets medicines for both humans and animals. As the world's largest drug manufacturer, Pfizer operates globally with 111,500 employees and a presence in over 100 countries.

Worldwide Pharmaceutical Sciences (PharmSci) is a group of scientists whose work enables the drugs Pfizer will bring to market. The group designs, synthesizes and manufactures all drugs that are part of clinical trials and toxicology testing within Pfizer.

For this case study, we interviewed Michael C. Linhares, Ph.D., a Research Fellow who heads up the Business Information Systems (BIS) team within PharmSci.

BIS is responsible for portfolio and resource management across all of PharmSci's projects. This involves designing, building and supporting systems that deliver data to executive teams and staff to help them make decisions regarding how to allocate available resources - both people and dollars - across the overall portfolio of over 100 projects annually.

The Business Problem
A major challenge for PharmSci is that its complex portfolio of projects is constantly changing.

According to Linhares, "Every week, something new comes up and we need to ensure that the right information is communicated to the right people. The people making decisions about resource allocation need easy and simple methods for obtaining that information. One aspect of this is that some people learn the information first and they need to communicate it to others who are responsible for making decisions based on the information. This creates an information-sharing challenge."

Linhares estimates that there are 80 to 100 information producers within PharmSci and over 1,000 information consumers, including the executives who seek a full picture of the project portfolio - financial data, project data, people data and data about the pharmaceutical compounds themselves.

The Technical Problem
The required data is created in and managed by different applications, each developed by a different team, with the data stored in multiple sources managed by different technologies - and the applications don't talk to each other.

This makes it very difficult to access summary information across all projects. Examples include identifying how much money is being spent on all projects in the project management system, what the next milestones are and when each will be met, and who is working on each project. "We needed a solution that would allow us to pull all this information together in an agile way."

When Linhares joined PharmSci, there was very little in the way of effective information integration. Most integration was done manually by exporting data from various systems into Excel spreadsheets and then either combining spreadsheets or moving the spreadsheet data into Access or SQL Server databases. This approach had no real security controls, lacked scalability and opportunities for reuse, and generated multiple copies of the spreadsheets (each with its own changes); it often took weeks to build a spreadsheet, with only a 50% chance that it would include all of the data required.

Solution Requirements
To be successful, the solution to these data integration and reporting problems had to provide the following:

  • A single, integrated view of all data sources with a common set of naming conventions
  • A flexible middle layer that would be independent of both the data sources on the back end and the reporting tools on the front end to facilitate easy change management
  • Shared metadata and business rule functionality so there would be a single point for managing and monitoring the solution
  • A development platform that supported fast, iterative development and, therefore, continuous process improvement

Three Options Considered
BIS considered three solution architectures to meet their business and technical challenges.

  1. Traditional Information Factory: The first option was a traditional approach of an integrated, scalable information factory. Pfizer had already implemented information factories in the division using a combination of Informatica ETL tools, Oracle databases and custom-built reporting applications. However, according to Linhares, an information factory "seemed like overkill. We didn't have high volumes of data, nor did we need the inherent complexity of using ETL tools to transform and move data while making sure we included all the detailed data we might possibly ever need over time." Furthermore, because of the way the information factories were managed within Pfizer, change management entailed significant overhead. Even so, the architectural concepts of an information factory were not ignored in the final solution.
  2. Single Vendor Stack: A second possible approach was to implement the solution in a single integrated technology (SQL Server with integration services). Major disadvantages were the lack of access to multiple data source types, the need to move data multiple times and the lack of an integrated metadata repository for understanding and organizing the data model.
  3. Data Virtualization: The third option was to create a federated data virtualization layer that integrated and accessed the underlying data sources through virtual views of the data. By leaving the source data in place, this approach would eliminate the issues inherent in copying and moving all the data (which Linhares described as unnecessary, "non-value added" activities). With the right technology and mix of products, data virtualization would enable PharmSci to migrate from inefficient, off-line spreadmarts to online access to integrated information that could be rapidly tailored and reused to dramatically increase its value to the organization.

The Data Virtualization Solution - Architecture
Pfizer's solution is the PharmSci Portfolio Database (PSPD), a federated data delivery framework implemented with the Composite Data Virtualization Platform.

Data virtualization enables the integration of all PharmSci data sources into a single reporting schema of information that can be accessed by all front-end tools and users. The solution architecture includes the following components:

Trusted Data Sources: There are many sources of data for PSPD; they are geographically dispersed and store data in a variety of formats across a multivendor, heterogeneous data environment. Here are some examples:

  • Enterprise Project Management (EPM) is a SQL Server database of WRD's drug portfolio project plans. It includes detailed project schedules and milestones.
  • The Global Information Factory (GIF) is an Oracle-based data warehouse of monthly finance data.
  • OneSource, a database of corporate-level drug portfolio information, is itself a unified set of Composite views across several different sources, built by another group within Pfizer.
  • Flat files on actual resource use are provided by the Finance Department.
  • SharePoint lists are small SharePoint databases accessed using a web service (see the sketch after this list).
  • There are other data sources as well, including custom-built systems. As Linhares pointed out, "It doesn't matter what data sources we have. With a virtual approach, we are not limited by the types of data we need to access."
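The article says only that the SharePoint lists are reached through a web service, without naming the interface. As a purely illustrative sketch - the site URL, list name and authentication are invented, and the use of SharePoint's REST interface is an assumption about which web service is involved - a small list could be read like this before being folded into the virtualization layer:

    import requests  # third-party HTTP client

    SITE = "https://teams.example.pfizer.com/sites/pharmsci"  # hypothetical site URL
    LIST_NAME = "Project Decisions"                           # hypothetical list name

    def read_sharepoint_list(auth):
        """Fetch all items of a SharePoint list over its REST interface.
        The caller supplies whatever authentication the site requires."""
        response = requests.get(
            f"{SITE}/_api/web/lists/getbytitle('{LIST_NAME}')/items",
            headers={"Accept": "application/json;odata=verbose"},
            auth=auth,
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["d"]["results"]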

Data Virtualization Layer: The Composite Data Virtualization Platform forms the data virtualization layer that enables the solution to be independent of the data sources and front-end tools. It provides abstracted access to all of the data sources and delivers the data through virtual views. These views effectively present the PharmSci Portfolio Database as subject-specific data marts. The Composite metadata repository manages data lineage and business rules.
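The article does not reproduce any of the actual Composite view definitions, so the following is only a generic sketch of the idea: a virtual view that joins project milestones from EPM with monthly spend from GIF without copying either source, plus a client query against the published view over ODBC. Every schema, table and column name, the DSN, and the DDL-style SQL are assumptions for illustration, not the platform's actual authoring or publishing mechanism.

    import pyodbc  # generic ODBC client; the DSN below is hypothetical

    # What the virtualization layer might expose as a subject-specific mart view.
    # This SQL would live in the data virtualization layer, not in client code;
    # every name in it is invented for illustration.
    PROJECT_SUMMARY_VIEW = """
    CREATE VIEW pspd.project_summary AS
    SELECT  p.project_id,
            p.project_name,
            p.next_milestone,
            p.milestone_date,
            f.fiscal_month,
            f.actual_spend
    FROM    epm.projects      AS p   -- SQL Server project plans (EPM)
    JOIN    gif.monthly_spend AS f   -- Oracle finance warehouse (GIF)
            ON f.project_id = p.project_id
    """

    def portfolio_snapshot(dsn="PSPD"):
        """Query the published view the way any reporting tool would:
        one integrated schema, no copies of the underlying data."""
        with pyodbc.connect(f"DSN={dsn}") as connection:
            cursor = connection.cursor()
            cursor.execute(
                "SELECT project_name, next_milestone, SUM(actual_spend) AS spend_to_date "
                "FROM pspd.project_summary "
                "GROUP BY project_name, next_milestone"
            )
            return cursor.fetchall()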

Consuming Applications: The flexibility of the platform is demonstrated by the varied reporting applications that use the information in PSPD. Examples include:

  • SAP Business Objects for ad hoc queries, standard reports and dashboards.
  • TIBCO Spotfire for analytics and access to data through standard presentation reports.
  • Web services for parameterized queries (see the sketch after this list).
  • Data services to provide data for downstream applications.
  • QuickViews (web pages built using DevExpress, a .NET toolkit) for access to live data.
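The article does not describe the published service endpoints, so the following is a hypothetical sketch of what a parameterized query against a REST-style data service could look like from a downstream application; the URL, resource name and parameter are invented.

    import requests  # third-party HTTP client

    BASE_URL = "https://pspd.example.pfizer.com/rest"  # hypothetical service root

    def milestones_for_project(project_id):
        """Call a published data service with a query parameter and return
        the result rows (assumes the service responds with JSON)."""
        response = requests.get(
            f"{BASE_URL}/project_milestones",
            params={"project_id": project_id},  # the parameterized part of the query
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    # Hypothetical usage:
    # rows = milestones_for_project("PF-0123")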

SharePoint Portal: Branded as "InfoSource," this team collaboration web portal is the front-end interface that provides integrated access to PSPD data for all PharmSci customers through the consuming applications described above.

The Data Virtualization Solution - Best Practices
Linhares and team applied a number of data virtualization best practices when implementing the architecture described above.

Two Layers of Abstraction: Linhares stressed the importance of building two clear levels of abstraction into the data virtualization architecture. The first level abstracts the data sources (the information abstraction layer); the second abstracts the consumers (the reporting abstraction layer).

"We built a representation of the data in Composite. If a source is ever changed by the owner, which often happens, we can update the representation in the information abstraction layer quickly. This allows control of all downstream data in one location."

The second level of abstraction is the one between the reporting schema and the front-end reporting tools. A consolidated and integrated set of information is exposed as a single schema. This allows BIS to be system agnostic and support the use of whatever tool is best for the customer. All of the reporting tools use the same reporting abstraction layer; they always get the same answer to the same question because there is only a single source of data.
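A minimal sketch of the two layers, with invented view and column names: the information abstraction layer wraps each source behind a stable view, and the reporting abstraction layer is defined only against those views, so a change made by a source owner is absorbed in one place.

    # Layer 1: information abstraction - one view per source, hiding source-specific names.
    # If the EPM owner renames a column, only this view definition changes.
    SRC_EPM_PROJECTS = """
    CREATE VIEW src.epm_projects AS
    SELECT  proj_id       AS project_id,
            proj_nm       AS project_name,
            status_cd     AS status_code,
            nxt_mlstn     AS next_milestone,
            nxt_mlstn_dt  AS milestone_date
    FROM    epm.dbo.projects
    """

    # Layer 2: reporting abstraction - built only on layer-1 views, never on raw sources.
    # Business Objects, Spotfire, web services and QuickViews all query this schema,
    # so every tool gets the same answer to the same question.
    RPT_PROJECT_MILESTONES = """
    CREATE VIEW rpt.project_milestones AS
    SELECT  project_id, project_name, next_milestone, milestone_date
    FROM    src.epm_projects
    """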

Consolidated Business Rules: Another key piece of the solution is the ability to include the business rules about how PharmSci manages its data within these abstraction layers. The business rules are embedded in the view definitions and are applied consistently at the same point.
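Continuing the sketch above with a hypothetical rule: mapping raw status codes to the portfolio phase labels everyone reports on. Because the CASE logic sits inside the reporting-layer view, every consuming tool applies the rule identically; the codes and labels here are invented for illustration.

    # Hypothetical business rule embedded in a reporting-layer view definition.
    RPT_PROJECT_STATUS = """
    CREATE VIEW rpt.project_status AS
    SELECT  project_id,
            project_name,
            CASE status_code
                WHEN 'CAN' THEN 'Candidate'
                WHEN 'PH1' THEN 'Phase 1'
                WHEN 'PH2' THEN 'Phase 2'
                WHEN 'PH3' THEN 'Phase 3'
                ELSE            'Other'
            END AS portfolio_phase
    FROM    src.epm_projects
    """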

Rapid Application Development Process: Prior to data virtualization, data integration was the slowest step for BIS in fulfilling a customer request for information. Now it's typically the fastest. "For example, a request came in Friday morning and was completed by that afternoon. The customer's response was an amazed, 'What do you mean you already have it done?'"

BIS uses a simple development process. The first step is what Linhares calls "triage" - looking at what the customer wants, estimating how long it will take and communicating that to the customer.

BIS does not spend a lot of time documenting the requirements of the solution. Instead, the group first creates a prototype on paper in the form of a simple data flow, then creates the necessary virtual views, gives the customer web access to the views and asks: "Is this what you wanted?"

The customer can then play with the result and respond with any changes or additions needed. BIS arrives at the final solution working with the customer in an iterative process.

Summary of Benefits
Linhares described several major benefits of the data virtualization solution.

The ability to provide integrated data in context: Data virtualization has enabled BIS to replace isolated silos of data with a data delivery platform that integrates different types and sources of data into a comprehensive package of value-added information. Instead of only the team leader and a core group of eight to ten people knowing about a project, the entire organization has access to relevant project information.

The independence of the data virtualization layer: "This is one of the huge benefits of data virtualization. It allows me to manage and monitor everything in one place and it makes change management easy for BIS and transparent to users."

Fast, iterative development environment: The data delivery infrastructure already exists in the data virtualization layer (defined data sources, standard naming conventions, access methods, etc.) so when a request for information comes in, BIS can quickly put it together for the customer.

Elimination of manual effort throughout PharmSci: According to Linhares, people initially resisted giving up their spreadsheets. But once there was a single source for the data and it was all available through InfoSource, there was a dramatic reduction in the need for meetings to reconcile spreadsheet data among teams.

•   •   •

Editor's Note: Robert Eve is the co-author, along with Judith R. Davis, of Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility, the first book published on the topic of data virtualization. The complete Pfizer case study, along with nine other enterprise case studies, is available in the book.

