@CloudExpo: Article

Big Data and Privacy

Why data privacy might be the next big thing in big data

Remember when being “sent to your room” was considered one of the harshest punishments a parent could dole out?

I certainly hated it, and I’m pretty sure my kids don’t like it much either. For whatever reason, this form of punishment – the ultimate act of isolation – seems to have stood the test of time. It’s also a great way to quickly introduce your children to the seven stages of grief:

  1. Stunned silence
  2. Screaming
  3. Kicking
  4. Pouting
  5. Kicking again
  6. Wailing
  7. Acceptance



But in today’s hyper-connected, always-on society, the whole notion of isolation is quickly becoming obsolete. So much so that forcing your child to spend time alone with their thoughts on a bed full of stuffed animals might be the last time they experience true and absolute privacy.

That organizations can tell whether or not someone is pregnant based on their buying habits is well-covered territory. In the U.S., the idea that organizations are always watching and learning from their customers is now just a part of life. In fact, in many cases, it’s celebrated. Run a search on Forbes.com for Big Data, and look at the headlines: You’ll see President Obama loves it. So apparently does Santa.

But Europe is a different beast. Protecting one’s right to privacy – the “right to be forgotten” – is the subject of much scrutiny and hand-wringing as the EU seeks to define its data protection laws more clearly.

At issue is how long, and for what purpose, a company can retain user data; what constitutes identifiable data; and when and how that data must be “forgotten.”

An interesting article by GigaOm senior writer David Meyer makes the case that big data changes the calculus: analytics run against aggregated customer data can potentially be reverse engineered to reveal personally identifiable information.
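A toy illustration of why Meyer’s concern is credible: even after direct identifiers are stripped, combinations of ordinary attributes (quasi-identifiers) can single out individuals. The sketch below computes the k-anonymity of a hypothetical dataset – all records and field choices are invented for illustration, not drawn from any real source.

```python
from collections import Counter

# Hypothetical "anonymized" records: names and account numbers removed,
# but quasi-identifiers (ZIP code, birth year, gender) remain.
records = [
    ("78701", 1975, "F"),
    ("78701", 1975, "F"),
    ("78702", 1982, "M"),
    ("78703", 1990, "F"),  # unique combination -> re-identifiable
    ("78702", 1982, "M"),
]

def k_anonymity(rows):
    """Return the size of the smallest quasi-identifier group.

    A dataset is k-anonymous if every combination of quasi-identifiers
    is shared by at least k records. A result of 1 means at least one
    person is uniquely pinned down by the "anonymized" attributes alone,
    and could be re-identified by joining against an outside dataset.
    """
    return min(Counter(rows).values())

print(k_anonymity(records))  # -> 1: one record is unique, hence at risk
```

Real re-identification attacks work the same way at scale: an attacker joins the quasi-identifiers against a public dataset (voter rolls, social profiles) to recover the names the release omitted.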

The process of sifting through and analyzing structured and unstructured data to gain new insights about customers is hardly new. Big data represents the evolution of the technology that allows you to perform these tasks in real time across massively distributed platforms and retain the data far longer.

And the longer the data is retained, the greater the risk of private or personally identifiable information (PII) – names, Social Security numbers, addresses, driver’s license numbers – being leaked or stolen. Some law firms are already gearing up for class action suits related to big data breaches.

It will be interesting to see how the conversation around big data and privacy plays out. We’ll certainly be following it closely. As always, our advice to organizations spinning up big data projects is to think about data privacy from the beginning and encrypt everything.
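Alongside encryption, one concrete “privacy from the beginning” measure is keyed pseudonymization: replacing PII fields with irreversible tokens before they ever reach the analytics store. The sketch below uses Python’s standard-library HMAC for this; the key name, field names, and record layout are all hypothetical, and in practice the secret would live in a key management service, not in code.

```python
import hmac
import hashlib

# Hypothetical secret, kept outside the analytics store (e.g., in a KMS).
PSEUDONYM_KEY = b"replace-with-a-managed-secret"

def pseudonymize(pii_value: str) -> str:
    """Replace a PII field with a keyed, irreversible token.

    Unlike a plain hash, an HMAC cannot be brute-forced from guessable
    inputs (names, SSNs) without the key. The same input always maps to
    the same token, so joins and aggregates across datasets still work.
    """
    return hmac.new(PSEUDONYM_KEY, pii_value.encode(), hashlib.sha256).hexdigest()

# Tokenize PII at ingest; only non-identifying fields pass through as-is.
record = {"name": "Jane Doe", "ssn": "123-45-6789", "purchase": 42.50}
safe_record = {
    "name": pseudonymize(record["name"]),
    "ssn": pseudonymize(record["ssn"]),
    "purchase": record["purchase"],
}
```

This complements encryption at rest rather than replacing it: encryption protects data from outside theft, while pseudonymization limits what a breach of the analytics store itself can reveal.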

More Stories By David Tishgart

David Tishgart is a Director of Product Marketing at Cloudera, focused on the company's cloud products, strategy, and partnerships. Prior to joining Cloudera, he ran business development and marketing at Gazzang, an enterprise security software company that was eventually acquired by Cloudera. He brings nearly two decades of experience in enterprise software, hardware, and services marketing to Cloudera. He holds a bachelor's degree in journalism from the University of Texas at Austin.

