Election Data Science and the Death of Truth

The U.S. Presidential election is finally over. The protests are winding down, they’ve stopped burning cars in Oakland (for now), and the talk of California secession is waning. But I am struggling to return to “normal” because in this election, truth got hammered.

Many candidates treated opinions as “truth” and a large portion of the American public grabbed hold of these “truths” as gospel. It may have been a good time to be in the “fact checking” business, but I’m not sure how effective even the fact checkers could be given the spontaneous nature of “opinions as facts” being thrown around, not to mention the people who create fake news intentionally.

So let’s play a game! Let’s call this game “Separate the Truth from the Myths.” Let’s see how you do.

  1. Bat Boy Sighted in NYC Subway (probably too expensive to get a condo in Manhattan)
  2. Obama Appoints Martian Ambassador (but the Senate will request Matt Damon since he’s already lived and farmed on Mars)
  3. Skynet is a Reality (Hey, even Iron Man showed up at the Senate to tell them so!)
  4. Ted Cruz Shot JFK (okay, so it actually was his dad, but accusing Ted Cruz is funnier)

All but one of these stories appeared in the highly credible “National Enquirer” or “Weekly World News.” That’s like buying a copy of “Mad Magazine” (for you old-timers) or reading “The Onion” (for you young whippersnappers) and expecting the “truth” from these satirical publications (see Figure 1).

Figure 1: Real Headlines from “Less Than Credible” Sources

However, the stories in Figure 2 below were plastered across social media sites as if they were the truth, and as you can see from the engagement numbers, lots of people took the time to read these “truths.”

Figure 2: Social Media Fake News and Number of Views

Data Science And Common Sense
As data scientists, we need to know not to accept the “truth” without applying some common sense. For all the fancy training in neural networks, artificial intelligence and machine learning, it’s hard to replace “common sense” as a necessary data scientist characteristic. Let’s walk through an example of how a data scientist might approach one of the sensational stories that recently popped up on social media (see Figure 3).

Figure 3: The Guardian, September 26, 2016

OMG, murders are up 10.8% in the biggest percentage increase since 1971, according to a highly credible source like the FBI. It’s become the “Walking Dead” out there!

Sensational headlines grab attention and incite fear and dread. “Dirty Laundry” sells. But the problem with data at the aggregate level is that it:

  • Distorts the real truth (or root cause) of the problem, and
  • Is not actionable
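To make the aggregate-versus-granular point concrete, here is a minimal Python sketch. The city names and counts are invented purely for illustration and are not taken from the FBI data behind the headline; the helper names `aggregate_change` and `contributions` are hypothetical as well. It shows how one headline percentage can be driven almost entirely by a single component.

```python
# Hypothetical yearly counts per city: (previous year, current year).
# These numbers are made up for illustration only.
counts = {
    "City A": (100, 150),
    "City B": (200, 204),
    "City C": (300, 296),
}


def aggregate_change(counts):
    """Return the overall fractional change across all cities."""
    before = sum(prev for prev, _ in counts.values())
    after = sum(curr for _, curr in counts.values())
    return (after - before) / before


def contributions(counts):
    """Return each city's share of the net change (shares sum to 1.0)."""
    net = sum(curr - prev for prev, curr in counts.values())
    if net == 0:
        raise ValueError("net change is zero; shares are undefined")
    return {city: (curr - prev) / net for city, (prev, curr) in counts.items()}


if __name__ == "__main__":
    print(f"Aggregate change: {aggregate_change(counts):.1%}")
    for city, share in contributions(counts).items():
        print(f"{city}: {share:+.0%} of the net change")
```

In this made-up data the aggregate rises by roughly 8%, yet one city accounts for the whole net increase while the other two cancel each other out. The aggregate number alone would not tell you where to look.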

The above headline could lead to the conclusion that the current criminal and rehabilitation policies have failed and everything should be thrown out. But there are no details as to what aspects of these programs are broken and no triage of the root causes in order to explore what might be done to fix the problem. As data scientists, we must demand the granular details so that we can turn the data into insights and make the information actionable, such as:

This is a good starting point. If we want to address the increase in murders, we need to drill into each individual murder (and attempted murder) in those 10 cities. We need to keep drilling into the granular details in order to identify those variables and metrics that might be predictors of murders and attempted murders.

For example, we could identify the specific blocks of these cities where the murders are occurring, or the time of day and day of week, or the time of the year, or any special events that occurred right before the murders, etc. We could explore other variables that might be indicative of an increase in murder (e.g., % of broken homes, % of children born out of wedlock, % of high school dropouts, % of drug addicts, unemployment rate among male adults, increase in graffiti).
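The drill-down described above can be sketched in a few lines of Python. The incident records, field names (`block`, `timestamp`) and the `drill_down` helper are all hypothetical; real incident data would come from a police or city data source and would need cleaning first.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical incident-level records, invented for illustration only.
incidents = [
    {"block": "100 Main St", "timestamp": "2016-03-05T23:10"},
    {"block": "100 Main St", "timestamp": "2016-03-12T23:45"},
    {"block": "200 Oak Ave", "timestamp": "2016-03-08T14:30"},
    {"block": "100 Main St", "timestamp": "2016-03-19T22:05"},
    {"block": "300 Pine Rd", "timestamp": "2016-03-12T23:20"},
]


def drill_down(records, key):
    """Count records per value of `key` (a function), most frequent first."""
    return Counter(key(r) for r in records).most_common()


def when(record):
    """Parse the record's ISO-format timestamp into a datetime."""
    return datetime.fromisoformat(record["timestamp"])


by_block = drill_down(incidents, lambda r: r["block"])
by_hour = drill_down(incidents, lambda r: when(r).hour)
by_weekday = drill_down(incidents, lambda r: WEEKDAYS[when(r).weekday()])

if __name__ == "__main__":
    print("By block:  ", by_block)
    print("By hour:   ", by_hour)
    print("By weekday:", by_weekday)
```

Each grouping is a candidate variable. In this toy data, one block, late-night hours and Saturdays stand out, which is the kind of pattern that would then need to be validated against more data.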

Once we know those variables that are predictive of murders, then we have a focus as to where we can start fixing the problem, taking corrective actions such as adding more police or community outreach, reducing high school dropouts, increasing drug arrests, testing different programs and approaches, measuring program effectiveness, learning and improving. Now that’s thinking like a data scientist.

Data Scientist Lessons Learned
What are the lessons that we can take away from this “opinions as facts” syndrome?

  • Common sense is critical. Don’t accept “truths” at face value. Demand more details in order to identify and quantify those variables and metrics that might be predictive or indicative of the researched problem.
  • You can’t fix the business – or the country – without drilling into the details and the potential causal factors. We need insights that are drawn from facts that are supported by granular data so that we know what actions to take. With these detailed insights in hand, we now know where to invest our scarce financial and human resources.
  • Details matter. At the aggregate level, the headlines may be sensational, but they are not insightful or actionable until you get into the details. Remember Simpson’s Paradox.
  • Data quality, accuracy and reasonableness are important, especially if you are trying to make business-impactful decisions based upon that data. Business users, if they are expected to use the data to support decisions, must have confidence in the data. “Facts as Facts” are critical if we want to overcome decisions being made on a traditional basis such as gut, hearsay and history.
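For readers who have not run into Simpson’s Paradox, mentioned in the list above, here is a small Python sketch. The two programs, the risk segments and every count are invented for illustration. It shows a case where one program does better within every segment yet looks worse once the segments are pooled.

```python
# Hypothetical (successes, trials) for two programs, split by segment.
# All numbers are invented to illustrate the paradox.
results = {
    "Program A": {"low-risk": (9, 10), "high-risk": (30, 100)},
    "Program B": {"low-risk": (80, 100), "high-risk": (2, 10)},
}


def segment_rates(results):
    """Success rate for each program within each segment."""
    return {
        program: {seg: s / n for seg, (s, n) in segments.items()}
        for program, segments in results.items()
    }


def overall_rates(results):
    """Success rate for each program after pooling all segments."""
    rates = {}
    for program, segments in results.items():
        successes = sum(s for s, _ in segments.values())
        trials = sum(n for _, n in segments.values())
        rates[program] = successes / trials
    return rates


if __name__ == "__main__":
    for program, segs in segment_rates(results).items():
        print(program, {seg: f"{r:.0%}" for seg, r in segs.items()})
    print({p: f"{r:.1%}" for p, r in overall_rates(results).items()})
```

Program A wins in both segments (90% vs. 80%, and 30% vs. 20%), but pooled together Program B looks far better because most of its cases sit in the easier segment. The aggregate reverses the granular story.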

A good data scientist learns not to trust anything at first blush. Opinions might suggest variables and metrics that are better predictors of performance, but in the end the data scientist needs to validate each of these variables and metrics to determine whether they really are better predictors of performance.
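One simple way to start that validation is to score each candidate variable against the outcome. The Python sketch below ranks candidates by the absolute value of their Pearson correlation. The variable names and values are hypothetical, and correlation is only a first screen, not proof that a variable is predictive or causal.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_predictors(candidates, outcome):
    """Sort candidate variables by |correlation| with the outcome."""
    scores = {name: pearson(values, outcome)
              for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical per-neighborhood measurements, invented for illustration.
outcome = [2, 4, 5, 7, 9]
candidates = {
    "dropout_rate": [1, 2, 3, 4, 5],
    "graffiti_reports": [5, 1, 4, 2, 3],
}

ranked = rank_predictors(candidates, outcome)

if __name__ == "__main__":
    for name, r in ranked:
        print(f"{name}: r = {r:+.2f}")
```

A variable that someone is sure about may still score poorly here, and that is the point: the opinion supplies the candidate, and the data decides whether it stays.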

In the movie “Star Wars: A New Hope,” the weak-minded stormtroopers were easily dissuaded from pursuing the truth about the droids by Obi-Wan Kenobi’s use of the Jedi Mind Trick to plant the “truth” in their weak minds.

Don’t be weak-minded about seeking the truth. Use your common sense to challenge the “truth,” and get into the granular details so that you can identify and quantify those variables and metrics that are better predictors or indicators of the problem.

And beware the “These aren’t the Droids you’re looking for” syndrome. That’s for the weak-minded.

The post Election Data Science and the Death of Truth appeared first on InFocus Blog | Dell EMC Services.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger, and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehousing Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.
