Welcome!

Cloud Expo Authors: Pat Romanski, Keith Mayer, Liz McMillan, Sebastian Kruk, Jason Bloomberg

Related Topics: Cloud Expo, Java, SOA & WOA, Open Source, Web 2.0, Apache

Cloud Expo: Article

Big Data Security for Apache Hadoop

Ten Tips for HadoopWorld attendees

Big Data takes center stage today at the Strata Conference & Hadoop World in New York, the world’s largest gathering of the Apache Hadoop™ community. A key conversation topic will be how organizations can improve data security for Hadoop and the applications that run on the platform. As you know, Hadoop and similar data stores hold a lot of promise for organizations to finally gain some value out of the immense amount of data they're capturing. But HDFS, Hive and other nascent NoSQL technologies were not necessarily designed with comprehensive security in mind. Often what happens as big data projects grow is sensitive data like HIPAA data, PII and financial records get captured and stored. It's important this data remains secure at rest.

I polled my fellow co-workers at Gazzang last week, and asked them to come up with a top ten list for securing Apache Hadoop. Here's what they delivered. Enjoy:

Think about security before getting started – You don’t wait until after a burglary to put locks on your doors, and you should not wait until after a breach to secure your data. Make sure a serious data security discussion takes place before installing and feeding data into your Hadoop cluster.

Consider what data may get stored – If you are using Hadoop to store and run analytics against regulatory data, you likely need to comply with specific security requirements. If the stored data does not fall under regulatory jurisdiction, keep in mind the risks to your public reputation and potential loss of revenue if data such as personally identifiable information (PII) were breached.

Encrypt data at rest and in motion – Add transparent data encryption at the file layer as a first step toward enhancing the security of a big data project. SSL encryption can protect big data as it moves between nodes and applications.

As Securosis analyst Adrian Lane wrote in a recent blog, “File encryption addresses two attacker methods for circumventing normal application security controls. Encryption protects in case malicious users or administrators gain access to data nodes and directly inspect files, and it also renders stolen files or disk images unreadable. It is transparent to both Hadoop and calling applications and scales out as the cluster grows. This is a cost-effective way to address several data security threats.”

Store the keys away from the encrypted data – Storing encryption keys on the same server as the encrypted data is akin to locking your house and leaving the key in your front door. Instead, use a key management system that separates the key from the encrypted data.

Institute access controls – Establishing and enforcing policies that govern which people and processes can access data stored within Hadoop is essential for keeping rogue users and applications off your cluster.

Require multi-factor authentication - Multi-factor authentication can significantly reduce the likelihood of an account being compromised or access to Hadoop data being granted to an unauthorized party.

Use secure automation – Beyond data encryption, organizations should look to DevOps tools such as Chef or Puppet for automated patch and configuration management.

Frequently audit your environment – Project needs, data sets, cloud requirements and security risks are constantly changing. It’s important to make sure you are closely monitoring your Hadoop environment and performing frequent checks to ensure performance and security goals are being met.

Ask tough questions of your cloud provider – Be sure you know what your cloud provider is responsible for. Will they encrypt your data? Who will store and have access to your keys? How is your data retired when you no longer need it? How do they prevent data leakage?

Centralize accountability – Centralizing the accountability for data security ensures consistent policy enforcement and access control across diverse organizational silos and data sets.

Did we miss anything? If so, please comment below, and enjoy Strata +HadoopWorld.

More Stories By David Tishgart

After spending years at large corporations including Dell, AMD and BMC, David Tishgart joined the startup ranks leading product marketing for Gazzang. Focused on security for big data, he helps communicate the benefits and challenges that big data can present, offering practical solutions. When not ranting about encryption and key management, you can find David clamoring for a big data application that can fine tune his fantasy football team.

Cloud Expo Breaking News
“Open source has always provided a number of benefits, including easing adoption costs, propagating a better understanding of the technology, and allowing for faster evolution and commercialization of products and services based on it,” noted Terry Woloszyn, Founder & CEO, Leeward Security Ltd., in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This is clearly evident with the OpenStack and CloudStack,” Woloszyn continued, “and others that have been quickly commercialized as...
SYS-CON Events announced today that OpenStack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. OpenStack software controls large pools of compute, storage, and networking resources throughout a datacenter, all managed by a dashboard that gives administrators control while empowering their users to provision resources through a web interface. OpenStack powers some of the most widely-used SaaS app...
SYS-CON Events announced today that Wowrack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. Wowrack’s core expertise lies in high-availability Private and Public Cloud IaaS Hosting Solutions. Wowrack provides a true Hybrid service – where business release all IT management and hardware provisioning – taking the data center and server system administrative headaches off our customer’s shoulders. ...
Many have heard of OAuth but are unsure of how it might apply to their business. In his session at the 12th International Cloud Expo, Alistair Farquharson, CTO of SOA Software, will describe how OAuth can be used to facilitate certain business models and simplify the sharing of private data. Alistair Farquharson is a visionary industry veteran focused on using disruptive technologies to drive business growth and improve efficiency and agility within organizations. As the CTO of SOA Software A...
“Cloud has everything to do with what has happened with Big Data,” explained Jason Deck, Director of Strategic Alliances at Logicworks, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “Big Data doesn’t exist in its easily accessible way without cloud. From reduced startup costs, to cheap storage, to fast processing, to adequate security, to the easy incorporation of third-party analytics tools, cloud made Big Data accessible to customers of all sizes, with all different bud...
SYS-CON Events announced today that nfina Technologies, a provider of highly reliable cloud server products, will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. nfina Technologies develops, manufactures, and markets highly reliable cloud server products, designed to solve the most demanding data center requirements in mission-critical cloud applications. Nfina’s staff has decades of experience in co...
“Social, mobile, analytics and cloud can’t be looked at as distinct technology trends; they are facets of the same movement and an everyday reality for consumers and businesses alike,” said Craig Sowell, IBM VP of SmartCloud Marketing, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This means that businesses need to start looking at trends as one: cloud is the delivery, analytics is the unique insight, social is a shareable service, and mobile is the ubiquitous access.” ...
In his session at the 12th International Cloud Expo, Dave Eichorn, Global Data Center Practice Head at Zensar, will share a case study describing how a utility services company handled the migration of its Microsoft platform to the cloud. Challenged with the time-consuming task of opening operations out of temporary offices, this company struggled with the need to simultaneously access data that was accumulated from a vast amount of data-intensive jobs. Zensar migrated the company’s application ...
Organizations across the world are increasingly starting to see the benefits of moving more and more services to the cloud. The focus on the cost-saving potential of cloud is rapidly shifting to completely transforming the business with cloud. As organizations are investing enormous sums on technology they are starting to realize that in order to maximize the return on investment and accelerate the business transformation process the first area of focus should be people. By ensuring the organiza...
You're getting pitched every day from your legacy enterprise software and hardware vendors about "cloud." They're doing an amazing job of convincing your CIO and CTO about what cloud is and how you should use it. The reality is they're defending their shrinking market share and keeping you on the legacy treadmill for as long as they can by selling you solutions that aren't "cloud." In her session at the 12th International Cloud Expo, Niki Acosta, Cloud Evangelista for Rackspace, will talk thro...