Welcome!

Cloud Expo Authors: Elizabeth White, Liz McMillan, Vincent Brasseur, Pat Romanski, Roger Strukhoff

Related Topics: Cloud Expo, Java, SOA & WOA, Open Source, Web 2.0, Apache

Cloud Expo: Article

The Cure for the Common Cloud-Based Big Data Initiative

Understanding how to work with Big Data

There is no doubt that Big Data holds infinite promise for a range of industries. Better visibility into data across various sources enables everything from insight into saving electricity to agricultural yield to placement of ads on Google. But when it comes to deriving value from data, no industry has been doing it as long or with as much rigor as clinical researchers.

Unlike other markets that are delving into Big Data for the first time and don't know where to begin, drug and device developers have spent years refining complex processes for asking very specific questions with clear purposes and goals. Whether using data for designing an effective and safe treatment for cholesterol, or collecting and mining data to understand proper dosage of cancer drugs, life sciences has had to dot every "i" and cross every "t" in order to keep people safe and for new therapies to pass muster with the FDA. Other industries are now marveling at a new ability to uncover information about efficiencies and cost savings, but - with less than rigorous processes in place - they are often shooting in the dark or only scratching the surface of what Big Data offers.

Drug developers today are standing on the shoulders of those who created, tested and secured FDA approval for treatments involving millions of data points (for one drug alone!) without the luxury of the cloud or sophisticated analytics systems. These systems have the potential to make the best data-driven industry even better. This article will outline key lessons and real-world examples of what other industries can and should learn from life sciences when it comes to understanding how to work with Big Data.

What Questions to Ask, What Data to Collect
In order to gain valuable insights from Big Data, there are two absolute requirements that must be met - understanding both what questions to ask and what data to collect. These two components are symbiotic, and understanding both fully is difficult, requiring both domain expertise and practical experience.

In order to know what data to collect, you first must know the types of questions that you're going to want to ask - often an enigma. With the appropriate planning and experience-based guesses, you can often make educated assumptions. The trick to collecting data is that you need to collect enough to answer questions, but if you collect too much then you may not be able to distill the specific subset that will answer your questions. Also, explicit or inherent cost can prevent you from collecting all possible data, in which case you need to carefully select which areas to collect data about.

Let's take a look at how this is done in clinical trials. Say you're designing a clinical study that will analyze cancer data. You may not have specific questions when the study is being designed, but it's reasonable to assume that you'll want to collect data related to commonly impacted readings for the type of cancer and whatever body system is affected, so that you have the right information to analyze when it comes time.

You may also want to collect data unrelated to the specific disease that subsequent questions will likely require, such as information on demographics and medications that the patient is taking that are different from the treatment. During the post-study data analysis, questions on these areas often arise, even though the questions aren't initially apparent. Thus clinical researchers have adopted common processes for collecting data on demographics and concomitant medications. Through planning and experience, you can also identify areas that do not need to be collected for each study. For example, if you're studying lung cancer, collecting cognitive function data is probably unrelated.

How can other industries anticipate what questions to ask, as is done in life sciences? Well, determine a predefined set of questions that are directly related to the goal of the data analysis. Since you will not know all of the questions until after the data collection have started, it's important to 1) know the domain, and 2) collect any data you'll need to answer the likely questions that could come up.

Also, clinical researchers have learned that questions can be discovered automatically. There are data mining techniques that can uncover statistically significant connections, which in effect are raising questions that can be explored in more detail afterwards. An analysis can be planned before data is collected, but not actually be run until afterwards (or potentially during), if the appropriate data is collected.

One other area that has proven to be extremely important to collect is metadata, or data about the data - such as, when it was collected, where it was collected, what instrumentation was used in the process and what calibration information was available. All of this information can be utilized later on to answer a lot of potentially important questions. Maybe there was a specific instrument that was incorrectly configured and all the resulting data that it recorded is invalid. If you're running an ad network, maybe there's a specific web site where your ads are run that are gaming the system trying to get you to pay more. If you're running a minor league team, maybe there's a specific referee that's biased, which you can address for subsequent games. Or, if you're plotting oil reserves in the Gulf of Mexico, maybe there are certain exploratory vessels that are taking advantage of you. In all of these cases, without the appropriate metadata, it'd be impossible to know where real problems reside.

Identifying Touch Points to Be Reviewed Along the Way
There are ways to specify which types of analysis can be performed, even while data is being collected, that can affect either how data will continue to be collected or the outcome as a whole.

For example, some clinical studies run what's called interim analysis while the study is in progress. These interim analyses are planned, and the various courses that can be used afterwards are well defined, but the results afterward are statistically usable. This is called an adaptive clinical trial, and there are a lot of studies that are being performed to determine more effective and useful ways that these can be done in the future. The most important aspect of these is preventing biases, and this is something that has been well understood and tested by the pharmaceutical community over the past several decades. Simply understanding what's happening during the course of a trial, or how it affects the desired outcome, can actually bias the results.

The other key factor is that the touch points are accessible to everybody who needs the data. For example, if you have a person in the field, then it's important to have him or her access the data in a format that's easily consumable to them - maybe through an iPad or an existing intranet portal. Similarly, if you have an executive that needs to understand something at a high level, then getting it to them in an easily consumable executive dashboard is extremely important.

As the life sciences industry has learned, if the distribution channels of the analytics aren't seamless and frictionless, then they won't be utilized to their fullest extent. This is where cloud-based analytics become exceptionally powerful - the cloud makes it much easier to integrate analytics into every user's day. Once each user gets the exact information they need, effortlessly, they can then do their job better and the entire organization will work better - regardless of how and why the tools are being used.

Augmenting Human Intuition
Think about the different types of tools that people use on a daily basis. People use wrenches to help turn screws, cars to get to places faster and word processers to write. Sure, we can use our hands or walk, but we're much more efficient and better when we can use tools.

Cloud-based analytics is a tool that enables everybody in an organization to perform more efficiently and effectively. The first example of this type of augmentation in the life sciences industry is alerting. A user tells the computer what they want to see, and then the computer alerts them via email or text message when the situation arises. Users can set rules for the data it wants to see, and then the tools keep on the lookout to notify the user when the data they are looking for becomes available.

Another area the pharmaceutical industry has thoroughly explored is data-driven collaboration techniques. In the clinical trial process, there are many different groups of users: those who are physically collecting the data (investigators), others who are reviewing it to make sure that it's clean (data managers), and also people who are stuck in the middle (clinical monitors). Of course there are many other types of users, but this is just a subset to illustrate the point. These different groups of users all serve a particular purpose relating to the overall collection of data and success of the study. When the data looks problematic or unclean, the data managers will flag it for review, which the clinical monitors can act on.

What's unique about the way that life sciences deals with this is that they've set up complex systems and rules to make sure that the whole system runs well. The tools associated around these processes help augment human intuition through alerting, automated dissemination and automatic feedback. The questions aren't necessarily known at the beginning of a trial, but as the data is collected, new questions evolve and the tools and processes in place are built to handle the changing landscape.

No matter what the purpose of Big Data analytics, any organization can benefit from the mindset of cloud-based analytics as a tool that needs to consistently be adjusted and refined to meet the needs of users.

Ongoing Challenges of Big Data Analytics
Given this history with data, one would expect that drug and device developers would be light years ahead when it comes to leveraging Big Data technologies - especially given that the collection and analytics of clinical data is often a matter of life and death. But while they have much more experience with data, the truth is that life sciences organizations are just now starting to integrate analytics technologies that will enable them to work with that data in new, more efficient ways - no longer involving billions of dollars a year, countless statisticians, archaic methods, and, if we're being honest, brute force. As new technology becomes available, the industry will continue to become more and more seamless. In the meantime, other industries looking to wrap their heads around the Big Data challenge should look to life sciences as the starting point for best practices in understanding how and when to ask the right questions, monitoring data along the way and selecting tools that improve the user experience.

More Stories By Rick Morrison

Rick Morrison is CEO and co-founder of Comprehend Systems. Prior to Comprehend Systems, he was the Chief Technology Officer of an Internet-based data aggregator, where he was responsible for product development and operations. Prior to that, he was at Integrated Clinical Systems, where he led the design and implementation of several major new features. He also proposed and led a major infrastructure redesign, and introduced new, streamlined development processes. Rick holds a BS in Computer Science from Carnegie Mellon University in Pittsburgh, Pennsylvania.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
The Internet of Things will greatly expand the opportunities for data collection and new business models driven off of that data. In her session at @ThingsExpo, Esmeralda Swartz, CMO of MetraTech, discussed how for this to be effective you not only need to have infrastructure and operational models capable of utilizing this new phenomenon, but increasingly service providers will need to convince a skeptical public to participate. Get ready to show them the money!
"Matrix is an ambitious open standard and implementation that's set up to break down the fragmentation problems that exist in IP messaging and VoIP communication," explained John Woolf, Technical Evangelist at Matrix, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
The Internet of Things will put IT to its ultimate test by creating infinite new opportunities to digitize products and services, generate and analyze new data to improve customer satisfaction, and discover new ways to gain a competitive advantage across nearly every industry. In order to help corporate business units to capitalize on the rapidly evolving IoT opportunities, IT must stand up to a new set of challenges. In his session at @ThingsExpo, Jeff Kaplan, Managing Director of THINKstrateg...
Cultural, regulatory, environmental, political and economic (CREPE) conditions over the past decade are creating cross-industry solution spaces that require processes and technologies from both the Internet of Things (IoT), and Data Management and Analytics (DMA). These solution spaces are evolving into Sensor Analytics Ecosystems (SAE) that represent significant new opportunities for organizations of all types. Public Utilities throughout the world, providing electricity, natural gas and water,...
One of the biggest challenges when developing connected devices is identifying user value and delivering it through successful user experiences. In his session at Internet of @ThingsExpo, Mike Kuniavsky, Principal Scientist, Innovation Services at PARC, described an IoT-specific approach to user experience design that combines approaches from interaction design, industrial design and service design to create experiences that go beyond simple connected gadgets to create lasting, multi-device exp...
High-performing enterprise Software Quality Assurance (SQA) teams validate systems that are ready for use - getting most actively involved as components integrate and form complete systems. These teams catch and report on defects, making sure the customer gets the best software possible. SQA teams have leveraged automation and virtualization to execute more thorough testing in less time - bringing Dev and Ops together, ensuring production readiness. Does the emergence of DevOps mean the end of E...
Scott Jenson leads a project called The Physical Web within the Chrome team at Google. Project members are working to take the scalability and openness of the web and use it to talk to the exponentially exploding range of smart devices. Nearly every company today working on the IoT comes up with the same basic solution: use my server and you'll be fine. But if we really believe there will be trillions of these devices, that just can't scale. We need a system that is open a scalable and by using ...
The Internet of Things is tied together with a thin strand that is known as time. Coincidentally, at the core of nearly all data analytics is a timestamp. When working with time series data there are a few core principles that everyone should consider, especially across datasets where time is the common boundary. In his session at Internet of @ThingsExpo, Jim Scott, Director of Enterprise Strategy & Architecture at MapR Technologies, discussed single-value, geo-spatial, and log time series dat...
"Verizon offers public cloud, virtual private cloud as well as private cloud on-premises - many different alternatives. Verizon's deep knowledge in applications and the fact that we are responsible for applications that make call outs to other systems. Those systems and those resources may not be in Verizon Cloud, we understand at the end of the day it's going to be federated," explained Anne Plese, Senior Consultant, Cloud Product Marketing at Verizon Enterprise, in this SYS-CON.tv interview at...
"For the past 4 years we have been working mainly to export. For the last 3 or 4 years the main market was Russia. In the past year we have been working to expand our footprint in Europe and the United States," explained Andris Gailitis, CEO of DEAC, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
P2P RTC will impact the landscape of communications, shifting from traditional telephony style communications models to OTT (Over-The-Top) cloud assisted & PaaS (Platform as a Service) communication services. The P2P shift will impact many areas of our lives, from mobile communication, human interactive web services, RTC and telephony infrastructure, user federation, security and privacy implications, business costs, and scalability. In his session at @ThingsExpo, Robin Raymond, Chief Architect...
The Domain Name Service (DNS) is one of the most important components in networking infrastructure, enabling users and services to access applications by translating URLs (names) into IP addresses (numbers). Because every icon and URL and all embedded content on a website requires a DNS lookup loading complex sites necessitates hundreds of DNS queries. In addition, as more internet-enabled ‘Things' get connected, people will rely on DNS to name and find their fridges, toasters and toilets. Acco...
The term culture has had a polarizing effect among DevOps supporters. Some propose that culture change is critical for success with DevOps, but are remiss to define culture. Some talk about a DevOps culture but then reference activities that could lead to culture change and there are those that talk about culture change as a set of behaviors that need to be adopted by those in IT. There is no question that businesses successful in adopting a DevOps mindset have seen departmental culture change, ...
"Cloud consumption is something we envision at Solgenia. That is trying to let the cloud spread to the user as a consumption, as utility computing. We want to allow the people to just pay for what they use, not a subscription model," explained Ermanno Bonifazi, CEO & Founder of Solgenia, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Enthusiasm for the Internet of Things has reached an all-time high. In 2013 alone, venture capitalists spent more than $1 billion dollars investing in the IoT space. With "smart" appliances and devices, IoT covers wearable smart devices, cloud services to hardware companies. Nest, a Google company, detects temperatures inside homes and automatically adjusts it by tracking its user's habit. These technologies are quickly developing and with it come challenges such as bridging infrastructure gaps,...
SYS-CON Media announced that Centrify, a provider of unified identity management across cloud, mobile and data center environments that delivers single sign-on (SSO) for users and a simplified identity infrastructure for IT, has launched an ad campaign on Cloud Computing Journal. The ads focus on security: how an organization can successfully control privilege for all of the organization’s identities to mitigate identity-related risk without slowing down the business, and how Centrify provides ...
SAP is delivering break-through innovation combined with fantastic user experience powered by the market-leading in-memory technology, SAP HANA. In his General Session at 15th Cloud Expo, Thorsten Leiduck, VP ISVs & Digital Commerce, SAP, discussed how SAP and partners provide cloud and hybrid cloud solutions as well as real-time Big Data offerings that help companies of all sizes and industries run better. SAP launched an application challenge to award the most innovative SAP HANA and SAP HANA...
"SAP had made a big transition into the cloud as we believe it has significant value for our customers, drives innovation and is easy to consume. When you look at the SAP portfolio, SAP HANA is the underlying platform and it powers all of our platforms and all of our analytics," explained Thorsten Leiduck, VP ISVs & Digital Commerce at SAP, in this SYS-CON.tv interview at 15th Cloud Expo, held Nov 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
"We help companies that are using a lot of Software as a Service. We help companies manage and gain visibility into what people are using inside the company and decide to secure them or use standards to lock down or to embrace the adoption of SaaS inside the company," explained Scott Kriz, Co-founder and CEO of Bitium, in this SYS-CON.tv interview at 15th Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at Internet of @ThingsExpo, James Kirkland, Chief Ar...