
Election Data Science and the Death of Truth


The U.S. Presidential election is finally over. The protests are winding down, they’ve stopped burning cars in Oakland (for now), and talk of California secession is waning. But I am struggling to return to “normal” because in this election, truth got hammered.

Many candidates treated opinions as “truth” and a large portion of the American public grabbed hold of these “truths” as gospel. It may have been a good time to be in the “fact checking” business, but I’m not sure how effective even the fact checkers could be, given the spontaneous nature of the “opinions as facts” being thrown around, not to mention the people who intentionally create fake news.

So let’s play a game! Let’s call this game “Separate the Truth from the Myths.” Let’s see how you do.

  1. Bat Boy Sighted in NYC Subway (probably too expensive to get a condo in Manhattan)
  2. Obama Appoints Martian Ambassador (but the Senate will request Matt Damon since he’s already lived and farmed on Mars)
  3. Skynet is a Reality (Hey, even Iron Man showed up at the Senate to tell them so!)
  4. Ted Cruz Shot JFK (okay, so it actually was his dad, but accusing Ted Cruz is funnier)

All but one of these stories appeared in the highly credible “National Enquirer” or “Weekly World News.” That’s like buying a copy of “Mad Magazine” (for you old timers) or reading “The Onion” (for you young whippersnappers) and expecting the “truth” from these satirical publications (see Figure 1).

Figure 1: Real Headlines from “Less Than Credible” Sources

However, the stories in Figure 2 were plastered across social media sites as if they were the truth, and as you can see from the engagement numbers, lots of people took the time to read these “truths.”

Figure 2: Social Media Fake News and Number of Views

Data Science And Common Sense
As data scientists, we need to know not to accept the “truth” without applying some common sense. For all the fancy training in neural networks, artificial intelligence and machine learning, it’s hard to replace “common sense” as a necessary data scientist characteristic. Let’s walk through an example of how a data scientist might approach one of the sensational stories that recently popped up on social media (see Figure 3).

Figure 3: The Guardian, September 26, 2016

OMG, murders are up 10.8% in the biggest percentage increase since 1971, according to a highly credible source like the FBI. It’s become the “Walking Dead” out there!

Sensational headlines grab attention and incite fear and dread. “Dirty Laundry” sells. But the problem with data at the aggregate level is that it:

  • Distorts the real truth (or root cause) of the problem, and
  • Is not actionable

The above headline could lead to the conclusion that the current criminal justice and rehabilitation policies have failed and everything should be thrown out. But there are no details about which aspects of these programs are broken, and no triage of the root causes to explore what might be done to fix the problem. As data scientists, we must demand the granular details so that we can turn the data into insights and make the information actionable, such as:

This is a good starting point. If we want to address the increase in murders, we need to drill into each individual murder (and attempted murder) in those 10 cities. We need to keep drilling into the granular details in order to identify those variables and metrics that might be predictors of murders and attempted murders.
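To make “drilling into the details” concrete, here is a minimal Python sketch that breaks an aggregate year-over-year change into each city’s share of the net increase. The city names and counts are hypothetical placeholders, not FBI figures; the point is only that a single aggregate percentage can hide the fact that one city drives most of the change.

```python
# Hypothetical murder counts per city: (previous year, current year).
# These are illustrative placeholders, not real FBI data.
counts = {
    "City A": (200, 260),
    "City B": (150, 152),
    "City C": (300, 298),
    "City D": (100, 104),
}


def pct_change(before, after):
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


def decompose(counts):
    """Return the aggregate % change and each city's share of the net change."""
    total_before = sum(before for before, _ in counts.values())
    total_after = sum(after for _, after in counts.values())
    net = total_after - total_before
    if net == 0:
        raise ValueError("No net change to decompose")
    shares = {city: (after - before) / net for city, (before, after) in counts.items()}
    return pct_change(total_before, total_after), shares


aggregate, shares = decompose(counts)
print(f"Aggregate change: {aggregate:.1f}%")
# List cities from largest to smallest contribution to the net increase.
for city, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{city}: {share:.0%} of the net increase")
```

With these made-up numbers the aggregate rises by about 8.5%, yet City A accounts for roughly 94% of the net increase, which is where a drill-down would focus first.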

For example, we could identify the specific blocks of these cities where the murders are occurring, or the time of day and day of week, or the time of the year, or any special events that occurred right before the murders, etc. We could explore other variables that might be indicative of an increase in murder (e.g., % of broken homes, % of children born out of wedlock, % of high school dropouts, % of drug addicts, unemployment rate among male adults, increase in graffiti).
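The kind of granular slicing described above (by block, time of day and day of week) can be sketched with the Python standard library alone. The record layout (`block` and `timestamp` fields) and the sample incidents are hypothetical, invented purely for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records; the field names and values are invented.
incidents = [
    {"block": "100 Main St", "timestamp": "2015-07-03 23:15"},
    {"block": "100 Main St", "timestamp": "2015-07-04 01:40"},
    {"block": "200 Oak Ave", "timestamp": "2015-07-10 22:05"},
    {"block": "100 Main St", "timestamp": "2015-08-14 23:50"},
]

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def profile(records):
    """Count incidents by block, hour of day, and day of week."""
    by_block, by_hour, by_weekday = Counter(), Counter(), Counter()
    for rec in records:
        ts = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M")
        by_block[rec["block"]] += 1
        by_hour[ts.hour] += 1
        by_weekday[WEEKDAYS[ts.weekday()]] += 1
    return by_block, by_hour, by_weekday


blocks, hours, days = profile(incidents)
print("Busiest block:", blocks.most_common(1))
print("Busiest hours:", hours.most_common(2))
print("By weekday:", dict(days))
```

In a real analysis the same counters would run over every incident in each city, and any hot spots they reveal would become candidate variables to test, not conclusions.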

Once we know which variables are predictive of murders, we know where to start fixing the problem: taking corrective actions such as adding more police or community outreach, reducing high school dropouts, increasing drug arrests, testing different programs and approaches, measuring program effectiveness, learning and improving. Now that’s thinking like a data scientist.
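One simple way to start ranking candidate variables is to compute each one’s Pearson correlation with the murder rate across neighborhoods. The sketch below assumes hypothetical per-neighborhood values for two of the variables mentioned above (high school dropout rate and male adult unemployment rate); correlation only flags candidates for further investigation, it does not establish cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def rank_candidates(candidates, target):
    """Rank candidate variables by the absolute strength of their correlation."""
    scores = [(name, pearson(values, target)) for name, values in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical values for five neighborhoods (not real data).
murder_rate = [2.0, 4.0, 6.0, 8.0, 10.0]   # per 100,000 residents
candidates = {
    "dropout_rate": [10, 14, 21, 25, 30],   # % of high school students
    "unemployment_rate": [5, 9, 6, 8, 7],   # % of adult males
}

for name, r in rank_candidates(candidates, murder_rate):
    print(f"{name}: r = {r:+.2f}")
```

With only five hypothetical neighborhoods, even a strong correlation is weak evidence; a variable that survives this screen still has to be validated on data it was not chosen from.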

Data Scientist Lessons Learned
What are the lessons that we can take away from this “opinions as facts” syndrome?

  • Common sense is critical. Don’t accept “truths” at face value. Demand more details in order to identify and quantify those variables and metrics that might be predictive or indicative of the researched problem.
  • You can’t fix the business – or the country – without drilling into the details and the potential causal factors. We need insights that are drawn from facts that are supported by granular data so that we know what actions to take. With these detailed insights in hand, we now know where to invest our scarce financial and human resources.
  • Details matter. At the aggregate level, the headlines may be sensational, but they are not insightful or actionable until you get into the details. Remember Simpson’s Paradox.
  • Data quality, accuracy and reasonableness are important, especially if you are trying to make business-impactful decisions based upon that data. Business users, if they are expected to use the data to support decisions, must have confidence in the data. “Facts as Facts” are critical if we want to overcome decisions being made on a traditional basis such as gut, hearsay and history.
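Simpson’s Paradox is easy to demonstrate with a few lines of Python. The counts below are hypothetical: two made-up crime-prevention programs, each run in low-risk and high-risk neighborhoods, with “success” meaning some agreed-upon improvement. The sketch compares per-group rates with the pooled (aggregate) rate.

```python
# Hypothetical outcomes for two crime-prevention programs, split by the
# risk level of the neighborhoods they ran in: (successes, trials).
results = {
    "Program A": {"low-risk": (18, 20), "high-risk": (40, 80)},
    "Program B": {"low-risk": (70, 80), "high-risk": (9, 20)},
}


def summarize(results):
    """Return per-group success rates and the pooled (aggregate) rate per program."""
    summary = {}
    for program, groups in results.items():
        per_group = {group: s / n for group, (s, n) in groups.items()}
        total_s = sum(s for s, _ in groups.values())
        total_n = sum(n for _, n in groups.values())
        summary[program] = (per_group, total_s / total_n)
    return summary


for program, (per_group, overall) in summarize(results).items():
    detail = ", ".join(f"{group} {rate:.1%}" for group, rate in per_group.items())
    print(f"{program}: {detail}; overall {overall:.1%}")
```

Program A wins in each neighborhood type (90% vs. 87.5%, 50% vs. 45%), yet Program B looks better overall (79% vs. 58%) because it mostly ran in easier, low-risk neighborhoods. Judging the programs only at the aggregate level would pick the wrong one.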

A good data scientist learns not to trust anything at first blush. Opinions might suggest variables and metrics that could be better predictors of performance, but in the end the data scientist needs to validate each of these variables and metrics to quantify whether they really are better predictors of performance.
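That validation step can be sketched as a simple holdout test: fit a one-variable linear model on part of the data, then check whether it predicts the held-out part better than just guessing the average. This is a swapped-in, deliberately minimal technique (ordinary least squares with a random train/test split), not a method the author prescribes, and the data below is synthetic.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(preds, actual):
    """Mean squared error."""
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)


def holdout_validate(xs, ys, test_fraction=0.3, seed=0):
    """Return (model MSE, baseline MSE) on a held-out test split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Baseline: always predict the training-set mean, ignoring the candidate variable.
    baseline = sum(ys[i] for i in train) / len(train)
    test_ys = [ys[i] for i in test]
    model_err = mse([slope * xs[i] + intercept for i in test], test_ys)
    baseline_err = mse([baseline] * len(test), test_ys)
    return model_err, baseline_err


# Synthetic data: the outcome depends linearly on the candidate variable plus noise.
rng = random.Random(42)
xs = list(range(20))
ys = [2.0 * x + 5.0 + rng.gauss(0, 1.0) for x in xs]

model_err, baseline_err = holdout_validate(xs, ys)
print(f"model MSE {model_err:.2f} vs. baseline MSE {baseline_err:.2f}")
```

A candidate that does not beat the baseline on held-out data has not earned its place as a predictor, however convincing the opinion behind it sounded.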

In the movie “Star Wars: A New Hope,” the weak-minded stormtroopers were easily dissuaded from pursuing the truth about the droids by Obi-Wan Kenobi’s use of the Jedi Mind Trick to plant the “truth” in their weak minds.

Don’t be weak-minded about seeking the truth. Use your common sense to challenge the “truth,” and get into the granular details so that you can identify and quantify those variables and metrics that are better predictors or indicators of the problem.

And beware the “These aren’t the Droids you’re looking for” syndrome. That’s for the weak-minded.

The post Election Data Science and the Death of Truth appeared first on InFocus Blog | Dell EMC Services.

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He has written several white papers, is an avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organizations’ key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.
