By Robert Eve
December 6, 2011 09:30 AM EST
Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility is the first book published on the topic of data virtualization. Along with an overview of data virtualization and its advantages, it presents ten case studies of organizations that have adopted data virtualization to significantly improve business decision making, decrease time-to-solution and reduce costs. This article describes data virtualization adoption at one of the enterprises profiled, Pfizer Inc.
Pfizer Inc. is a biopharmaceutical company that develops, manufactures and markets medicines for both humans and animals. As the world's largest drug manufacturer, Pfizer operates globally with 111,500 employees and a presence in over 100 countries.
Worldwide Pharmaceutical Sciences (PharmSci) is a group of scientists responsible for enabling the drugs Pfizer brings to market. This group designs, synthesizes and manufactures all drugs used in clinical trials and toxicology testing within Pfizer.
For this case study, we interviewed Michael C. Linhares, Ph.D., a Research Fellow who heads the Business Information Systems (BIS) team within PharmSci.
BIS is responsible for portfolio and resource management across all of PharmSci's projects. This involves designing, building and supporting systems that deliver data to executive teams and staff to help them make decisions regarding how to allocate available resources - both people and dollars - across the overall portfolio of over 100 projects annually.
The Business Problem
A major challenge for PharmSci is that its complex portfolio of projects is constantly changing.
According to Linhares, "Every week, something new comes up and we need to ensure that the right information is communicated to the right people. The people making decisions about resource allocation need easy and simple methods for obtaining that information. One aspect of this is that some people learn the information first and they need to communicate it to others who are responsible for making decisions based on the information. This creates an information-sharing challenge."
Linhares estimates that there are 80 to 100 information producers within PharmSci and over 1,000 information consumers, including the executives who seek a full picture of the project portfolio - financial data, project data, people data and data about the pharmaceutical compounds themselves.
The Technical Problem
The required data is created and managed by different applications, each developed by a different team and each storing its data in sources built on different technologies. The applications don't talk to each other.
This makes it very difficult to access summary information across all projects: for example, how much money is being spent on all projects in the project management system, what the next milestones are and when each will be met, and who is working on each project. As Linhares put it, "We needed a solution that would allow us to pull all this information together in an agile way."
When Linhares joined PharmSci, there was very little in the way of effective information integration. Most integration was done manually by exporting data from various systems into Excel spreadsheets and then either combining the spreadsheets or moving the spreadsheet data into Access or SQL Server databases. This approach had no real security controls, lacked scalability and opportunities for reuse, and generated multiple copies of the spreadsheets with divergent changes. It often took weeks to build a spreadsheet, with only a 50% chance that it would include all of the required data.
To be successful, the solution to these data integration and reporting problems had to provide the following:
- A single, integrated view of all data sources with a common set of naming conventions
- A flexible middle layer that would be independent of both the data sources on the back end and the reporting tools on the front end to facilitate easy change management
- Shared metadata and business rule functionality so there would be a single point for managing and monitoring the solution
- A development platform that supported fast, iterative development and, therefore, continuous process improvement
Three Options Considered
BIS considered three solution architectures to meet its business and technical challenges.
- Traditional Information Factory: The first option was a traditional approach of an integrated, scalable information factory. Pfizer had already implemented information factories in the division using a combination of Informatica ETL tools, Oracle databases and custom-built reporting applications. However, according to Linhares, an information factory "seemed like overkill. We didn't have high volumes of data, nor did we need the inherent complexity of using ETL tools to transform and move data while making sure we included all the detailed data we might possibly ever need over time." Furthermore, because of the way the information factories were managed within Pfizer, change management entailed significant overhead. Nevertheless, the architectural concepts of an information factory were not going to be ignored in the final solution.
- Single Vendor Stack: A second possible approach was to implement the solution in a single integrated technology (SQL Server with integration services). Major disadvantages were the lack of access to multiple data source types, the need to move data multiple times and the lack of an integrated metadata repository for understanding and organizing the data model.
- Data Virtualization: The third option was to create a federated data virtualization layer that integrated and accessed the underlying data sources through virtual views of the data. By leaving the source data in place, this approach would eliminate the issues inherent in copying and moving all the data (which Linhares described as unnecessary, "non-value added" activities). With the right technology and mix of products, data virtualization would enable PharmSci to migrate from inefficient, off-line spreadmarts to online access to integrated information that could be rapidly tailored and reused to dramatically increase its value to the organization.
The Data Virtualization Solution - Architecture
Pfizer's solution is the PharmSci Portfolio Database (PSPD), a federated data delivery framework implemented with the Composite Data Virtualization Platform.
Data virtualization enables the integration of all PharmSci data sources into a single reporting schema of information that can be accessed by all front-end tools and users. The solution architecture includes the following components:
Trusted Data Sources: There are many sources of data for PSPD; they are geographically dispersed and store data in a variety of formats across a multivendor, heterogeneous data environment. Here are some examples:
- Enterprise Project Management (EPM) is a SQL Server database of WRD's drug portfolio project plans. It includes detailed project schedules and milestones.
- The Global Information Factory (GIF) is an Oracle-based data warehouse of monthly finance data.
- OneSource, a database of corporate-level drug portfolio information, is itself a unified set of Composite views across several different sources, built by another group within Pfizer.
- Flat files from the Finance Department provide data on actual resource use.
- SharePoint lists are small SharePoint databases accessed using a web service.
- There are other data sources as well, including custom-built systems. As Linhares pointed out, "It doesn't matter what data sources we have. With a virtual approach, we are not limited by the types of data we need to access."
Data Virtualization Layer: The Composite Data Virtualization Platform forms the data virtualization layer that enables the solution to be independent of the data sources and front-end tools. It provides abstracted access to all of the data sources and delivers the data through virtual views. These views effectively present the PharmSci Portfolio Database as subject-specific data marts. The Composite metadata repository manages data lineage and business rules.
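To make the idea of a virtual view concrete, here is a minimal Python sketch. It is not the Composite platform's API: an in-memory SQLite database stands in for a relational source such as EPM, an inline CSV string stands in for a Finance flat file, and all table, column and function names are hypothetical. The view function reads both sources at query time and joins the results, so no data is copied into an intermediate store.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the EPM project database (SQL Server in the
# case study); an in-memory SQLite database keeps the sketch self-contained.
epm = sqlite3.connect(":memory:")
epm.execute("CREATE TABLE projects (proj_id TEXT, proj_name TEXT, next_milestone TEXT)")
epm.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("P1", "Compound A", "Phase II start"), ("P2", "Compound B", "Tox report")],
)

# Hypothetical stand-in for the Finance Department's flat file of actuals.
FINANCE_CSV = "project,spend_usd\nP1,1200000\nP2,450000\n"


def read_finance():
    """Read the flat file on every call; nothing is copied into a warehouse."""
    return {row["project"]: float(row["spend_usd"])
            for row in csv.DictReader(io.StringIO(FINANCE_CSV))}


def project_summary_view():
    """A virtual view: query both sources at request time and join in memory."""
    spend = read_finance()
    rows = epm.execute(
        "SELECT proj_id, proj_name, next_milestone FROM projects ORDER BY proj_id"
    )
    return [
        {"project_id": pid, "name": name, "next_milestone": ms,
         "spend_usd": spend.get(pid, 0.0)}
        for pid, name, ms in rows
    ]


for row in project_summary_view():
    print(row)

# Because the view reads the sources on demand, a change in a source shows
# up on the next query without any reload job.
epm.execute("UPDATE projects SET next_milestone = 'Phase III start' WHERE proj_id = 'P1'")
print(project_summary_view()[0]["next_milestone"])  # Phase III start
```

Because nothing is materialized, the source update appears on the very next query. A production data virtualization platform would typically layer query optimization and security on top of this basic idea.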
Consuming Applications: The flexibility of the platform is demonstrated by the varied reporting applications that use the information in PSPD. Examples include:
- SAP BusinessObjects for ad hoc queries, standard reports and dashboards.
- TIBCO Spotfire for analytics and access to data through standard presentation reports.
- Web services for parameterized queries.
- Data services to provide data for downstream applications.
- QuickViews (web pages built using DevExpress, a .NET toolkit) for access to live data.
SharePoint Portal: Branded as "InfoSource," this team collaboration web portal is the front-end interface that provides integrated access to PSPD data for all PharmSci customers through the consuming applications described above.
The Data Virtualization Solution - Best Practices
Linhares and team applied a number of data virtualization best practices when implementing the architecture described above.
Two Layers of Abstraction: Linhares stressed the importance of building two clear levels of abstraction into the data virtualization architecture. The first level abstracts the sources (the information abstraction layer); the second abstracts the consumers (the reporting abstraction layer).
"We built a representation of the data in Composite. If a source is ever changed by the owner, which often happens, we can update the representation in the information abstraction layer quickly. This allows control of all downstream data in one location."
The second level of abstraction is the one between the reporting schema and the front-end reporting tools. A consolidated and integrated set of information is exposed as a single schema. This allows BIS to be system agnostic and support the use of whatever tool is best for the customer. All of the reporting tools use the same reporting abstraction layer; they always get the same answer to the same question because there is only a single source of data.
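The two layers of abstraction can be illustrated with a small Python sketch. The field names, sample values and function names below are hypothetical, and plain functions stand in for the views a data virtualization platform would manage. Layer 1 maps each source's own column names onto common naming conventions; layer 2 joins the normalized data into the single reporting schema that every consumer reads.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


# Source data as each owner exposes it (hypothetical field names and values).
def epm_source() -> List[Row]:
    return [{"PROJ_CODE": "P1", "PLAN_FINISH": "2012-03-01"}]


def gif_source() -> List[Row]:
    return [{"project_no": "P1", "amt_usd": 1200000}]


# Layer 1: information abstraction. One adapter per source maps
# source-specific names onto the common naming conventions. If a source
# owner renames a column, only the matching adapter has to change.
def epm_adapter() -> List[Row]:
    return [{"project_id": r["PROJ_CODE"], "finish_date": r["PLAN_FINISH"]}
            for r in epm_source()]


def gif_adapter() -> List[Row]:
    return [{"project_id": r["project_no"], "spend_usd": r["amt_usd"]}
            for r in gif_source()]


# Layer 2: reporting abstraction. The single schema every consumer reads.
def reporting_schema() -> List[Row]:
    spend = {r["project_id"]: r["spend_usd"] for r in gif_adapter()}
    return [{**r, "spend_usd": spend.get(r["project_id"])} for r in epm_adapter()]


# Two different front ends built on the same reporting layer, so they
# always return the same answer to the same question.
def dashboard() -> Dict[str, object]:
    return {r["project_id"]: r["spend_usd"] for r in reporting_schema()}


def web_service(project_id: str) -> Optional[Row]:
    # Returns None when the project is not in the reporting schema.
    return next((r for r in reporting_schema() if r["project_id"] == project_id), None)


assert dashboard()["P1"] == web_service("P1")["spend_usd"]
```

If the EPM owner renamed PROJ_CODE, only epm_adapter would change; reporting_schema and both consumers would keep working unchanged.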
Consolidated Business Rules: Another key piece of the solution is the ability to include the business rules about how PharmSci manages its data within these abstraction layers. The business rules are embedded in the view definitions and are applied consistently at the same point.
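Here is a minimal sketch of embedding business rules in a view definition. It assumes two illustrative rules that do not come from the case study: cancelled projects are excluded, and budgets are converted to U.S. dollars using a hypothetical rate table. All names and values are hypothetical. Because the rules live inside portfolio_view, no consumer can apply them differently.

```python
from typing import Dict, List

# Hypothetical source rows from a project system.
RAW_PROJECTS: List[Dict[str, object]] = [
    {"project_id": "P1", "status": "Active", "budget": 100.0, "currency": "EUR"},
    {"project_id": "P2", "status": "Cancelled", "budget": 50.0, "currency": "USD"},
    {"project_id": "P3", "status": "active", "budget": 80.0, "currency": "USD"},
]

# Hypothetical exchange rates; in practice these would come from another source.
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.25}


def is_reportable(row) -> bool:
    """Business rule 1 (illustrative): cancelled projects are never reported."""
    return str(row["status"]).strip().lower() != "cancelled"


def budget_usd(row) -> float:
    """Business rule 2 (illustrative): all budgets are reported in USD."""
    return float(row["budget"]) * USD_PER_UNIT[str(row["currency"])]


def portfolio_view() -> List[Dict[str, object]]:
    """The view definition applies both rules at one point, so every
    consumer of the view sees the same filtered, normalized data."""
    return [
        {"project_id": r["project_id"], "budget_usd": budget_usd(r)}
        for r in RAW_PROJECTS
        if is_reportable(r)
    ]


total = sum(r["budget_usd"] for r in portfolio_view())
print(total)  # 205.0
```

A change to either rule is made once, in the view, and every report built on it picks up the change.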
Rapid Application Development Process: Prior to data virtualization, data integration was the slowest step for BIS in fulfilling a customer request for information. Now it's typically the fastest. As an example, Linhares cited a request that came in on a Friday morning and was completed that afternoon. "The customer's response was an amazed, 'What do you mean you already have it done?'"
BIS uses a simple development process. The first step is what Linhares calls "triage" - looking at what the customer wants, estimating how long it will take and communicating that to the customer.
BIS does not spend a lot of time documenting the requirements of the solution. Instead, the group first creates a prototype on paper in the form of a simple data flow, then creates the necessary virtual views, gives the customer web access to the views and asks: "Is this what you wanted?"
The customer can then play with the result and respond with any changes or additions needed. BIS arrives at the final solution working with the customer in an iterative process.
Summary of Benefits
Linhares described several major benefits of the data virtualization solution.
The ability to provide integrated data in context: Data virtualization has enabled BIS to replace isolated silos of data with a data delivery platform that integrates different types and sources of data into a comprehensive package of value-added information. Instead of only the team leader and a core group of eight to ten people knowing about a project, the entire organization has access to relevant project information.
The independence of the data virtualization layer: "This is one of the huge benefits of data virtualization. It allows me to manage and monitor everything in one place and it makes change management easy for BIS and transparent to users."
Fast, iterative development environment: The data delivery infrastructure already exists in the data virtualization layer (defined data sources, standard naming conventions, access methods, etc.) so when a request for information comes in, BIS can quickly put it together for the customer.
Elimination of manual effort throughout PharmSci: According to Linhares, people initially resisted moving away from their spreadsheets. But once there was a single source for the data and it was all available through InfoSource, there was a dramatic reduction in the need to hold meetings to reconcile spreadsheet data among teams.
• • •
Editor's Note: Robert Eve is the co-author, along with Judith R. Davis, of Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility, the first book published on the topic of data virtualization. The complete Pfizer case study, along with case studies of nine other enterprises, is available in the book.