Click here to close now.


@CloudExpo Authors: Anil Rao, Philippe Abdoulaye, Steve Watts, Xenia von Wedel, Carmen Gonzalez

Related Topics: @CloudExpo, Java IoT, Microservices Expo, Open Source Cloud, Agile Computing, Apache

@CloudExpo: Article

Big Data Security for Apache Hadoop

Ten Tips for HadoopWorld attendees

Big Data takes center stage today at the Strata Conference & Hadoop World in New York, the world’s largest gathering of the Apache Hadoop™ community. A key conversation topic will be how organizations can improve data security for Hadoop and the applications that run on the platform. As you know, Hadoop and similar data stores hold a lot of promise for organizations to finally gain some value out of the immense amount of data they're capturing. But HDFS, Hive and other nascent NoSQL technologies were not necessarily designed with comprehensive security in mind. Often what happens as big data projects grow is sensitive data like HIPAA data, PII and financial records get captured and stored. It's important this data remains secure at rest.

I polled my fellow co-workers at Gazzang last week, and asked them to come up with a top ten list for securing Apache Hadoop. Here's what they delivered. Enjoy:

Think about security before getting started – You don’t wait until after a burglary to put locks on your doors, and you should not wait until after a breach to secure your data. Make sure a serious data security discussion takes place before installing and feeding data into your Hadoop cluster.

Consider what data may get stored – If you are using Hadoop to store and run analytics against regulatory data, you likely need to comply with specific security requirements. If the stored data does not fall under regulatory jurisdiction, keep in mind the risks to your public reputation and potential loss of revenue if data such as personally identifiable information (PII) were breached.

Encrypt data at rest and in motion – Add transparent data encryption at the file layer as a first step toward enhancing the security of a big data project. SSL encryption can protect big data as it moves between nodes and applications.

As Securosis analyst Adrian Lane wrote in a recent blog, “File encryption addresses two attacker methods for circumventing normal application security controls. Encryption protects in case malicious users or administrators gain access to data nodes and directly inspect files, and it also renders stolen files or disk images unreadable. It is transparent to both Hadoop and calling applications and scales out as the cluster grows. This is a cost-effective way to address several data security threats.”

Store the keys away from the encrypted data – Storing encryption keys on the same server as the encrypted data is akin to locking your house and leaving the key in your front door. Instead, use a key management system that separates the key from the encrypted data.

Institute access controls – Establishing and enforcing policies that govern which people and processes can access data stored within Hadoop is essential for keeping rogue users and applications off your cluster.

Require multi-factor authentication - Multi-factor authentication can significantly reduce the likelihood of an account being compromised or access to Hadoop data being granted to an unauthorized party.

Use secure automation – Beyond data encryption, organizations should look to DevOps tools such as Chef or Puppet for automated patch and configuration management.

Frequently audit your environment – Project needs, data sets, cloud requirements and security risks are constantly changing. It’s important to make sure you are closely monitoring your Hadoop environment and performing frequent checks to ensure performance and security goals are being met.

Ask tough questions of your cloud provider – Be sure you know what your cloud provider is responsible for. Will they encrypt your data? Who will store and have access to your keys? How is your data retired when you no longer need it? How do they prevent data leakage?

Centralize accountability – Centralizing the accountability for data security ensures consistent policy enforcement and access control across diverse organizational silos and data sets.

Did we miss anything? If so, please comment below, and enjoy Strata +HadoopWorld.

More Stories By David Tishgart

After spending years at large corporations including Dell, AMD and BMC, David Tishgart joined the startup ranks leading product marketing for Gazzang. Focused on security for big data, he helps communicate the benefits and challenges that big data can present, offering practical solutions. When not ranting about encryption and key management, you can find David clamoring for a big data application that can fine tune his fantasy football team.

@CloudExpo Stories
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York and Silicon Valley. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty ...
Internet of @ThingsExpo, taking place June 7-9, 2016 at Javits Center, New York City and Nov 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with the 18th International @CloudExpo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world and ThingsExpo New York Call for Papers is now open.
Cloud computing delivers on-demand resources that provide businesses with flexibility and cost-savings. The challenge in moving workloads to the cloud has been the cost and complexity of ensuring the initial and ongoing security and regulatory (PCI, HIPAA, FFIEC) compliance across private and public clouds. Manual security compliance is slow, prone to human error, and represents over 50% of the cost of managing cloud applications. Determining how to automate cloud security compliance is critical...
Growth hacking is common for startups to make unheard-of progress in building their business. Career Hacks can help Geek Girls and those who support them (yes, that's you too, Dad!) to excel in this typically male-dominated world. Get ready to learn the facts: Is there a bias against women in the tech / developer communities? Why are women 50% of the workforce, but hold only 24% of the STEM or IT positions? Some beginnings of what to do about it! In her Day 2 Keynote at 17th Cloud Expo, San...
As organizations shift towards IT-as-a-service models, the need for managing & protecting data residing across physical, virtual, and now cloud environments grows with it. CommVault can ensure protection & E-Discovery of your data - whether in a private cloud, a Service Provider delivered public cloud, or a hybrid cloud environment – across the heterogeneous enterprise.
In his keynote at @ThingsExpo, Chris Matthieu, Director of IoT Engineering at Citrix and co-founder and CTO of Octoblu, focused on building an IoT platform and company. He provided a behind-the-scenes look at Octoblu’s platform, business, and pivots along the way (including the Citrix acquisition of Octoblu).
In recent years, at least 40% of companies using cloud applications have experienced data loss. One of the best prevention against cloud data loss is backing up your cloud data. In his General Session at 17th Cloud Expo, Sam McIntyre, Partner Enablement Specialist at eFolder, presented how organizations can use eFolder Cloudfinder to automate backups of cloud application data. He also demonstrated how easy it is to search and restore cloud application data using Cloudfinder.
In his General Session at DevOps Summit, Asaf Yigal, Co-Founder & VP of Product at, explored the value of Kibana 4 for log analysis and provided a hands-on tutorial on how to set up Kibana 4 and get the most out of Apache log files. He examined three use cases: IT operations, business intelligence, and security and compliance. Asaf Yigal is co-founder and VP of Product at log analytics software company In the past, he was co-founder of social-trading platform Currensee, which...
Today air travel is a minefield of delays, hassles and customer disappointment. Airlines struggle to revitalize the experience. GE and M2Mi will demonstrate practical examples of how IoT solutions are helping airlines bring back personalization, reduce trip time and improve reliability. In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect with GE, and Dr. Sarah Cooper, M2Mi’s VP Business Development and Engineering, explored the IoT cloud-based platform technologies driving t...
There are over 120 breakout sessions in all, with Keynotes, General Sessions, and Power Panels adding to three days of incredibly rich presentations and content. Join @ThingsExpo conference chair Roger Strukhoff (@IoT2040), June 7-9, 2016 in New York City, for three days of intense 'Internet of Things' discussion and focus, including Big Data's indespensable role in IoT, Smart Grids and Industrial Internet of Things, Wearables and Consumer IoT, as well as (new) IoT's use in Vertical Markets.
The Internet of Things (IoT) is growing rapidly by extending current technologies, products and networks. By 2020, Cisco estimates there will be 50 billion connected devices. Gartner has forecast revenues of over $300 billion, just to IoT suppliers. Now is the time to figure out how you’ll make money – not just create innovative products. With hundreds of new products and companies jumping into the IoT fray every month, there’s no shortage of innovation. Despite this, McKinsey/VisionMobile data...
We all know that data growth is exploding and storage budgets are shrinking. Instead of showing you charts on about how much data there is, in his General Session at 17th Cloud Expo, Scott Cleland, Senior Director of Product Marketing at HGST, showed how to capture all of your data in one place. After you have your data under control, you can then analyze it in one place, saving time and resources.
SYS-CON Events announced today that Alert Logic, Inc., the leading provider of Security-as-a-Service solutions for the cloud, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Alert Logic, Inc., provides Security-as-a-Service for on-premises, cloud, and hybrid infrastructures, delivering deep security insight and continuous protection for customers at a lower cost than traditional security solutions. Ful...
Just over a week ago I received a long and loud sustained applause for a presentation I delivered at this year’s Cloud Expo in Santa Clara. I was extremely pleased with the turnout and had some very good conversations with many of the attendees. Over the next few days I had many more meaningful conversations and was not only happy with the results but also learned a few new things. Here is everything I learned in those three days distilled into three short points.
As organizations realize the scope of the Internet of Things, gaining key insights from Big Data, through the use of advanced analytics, becomes crucial. However, IoT also creates the need for petabyte scale storage of data from millions of devices. A new type of Storage is required which seamlessly integrates robust data analytics with massive scale. These storage systems will act as “smart systems” provide in-place analytics that speed discovery and enable businesses to quickly derive meaningf...
DevOps is about increasing efficiency, but nothing is more inefficient than building the same application twice. However, this is a routine occurrence with enterprise applications that need both a rich desktop web interface and strong mobile support. With recent technological advances from Isomorphic Software and others, rich desktop and tuned mobile experiences can now be created with a single codebase – without compromising functionality, performance or usability. In his session at DevOps Su...
In his General Session at 17th Cloud Expo, Bruce Swann, Senior Product Marketing Manager for Adobe Campaign, explored the key ingredients of cross-channel marketing in a digital world. Learn how the Adobe Marketing Cloud can help marketers embrace opportunities for personalized, relevant and real-time customer engagement across offline (direct mail, point of sale, call center) and digital (email, website, SMS, mobile apps, social networks, connected objects).
The buzz continues for cloud, data analytics and the Internet of Things (IoT) and their collective impact across all industries. But a new conversation is emerging - how do companies use industry disruption and technology enablers to lead in markets undergoing change, uncertainty and ambiguity? Organizations of all sizes need to evolve and transform, often under massive pressure, as industry lines blur and merge and traditional business models are assaulted and turned upside down. In this new da...
The Internet of Everything is re-shaping technology trends–moving away from “request/response” architecture to an “always-on” Streaming Web where data is in constant motion and secure, reliable communication is an absolute necessity. As more and more THINGS go online, the challenges that developers will need to address will only increase exponentially. In his session at @ThingsExpo, Todd Greene, Founder & CEO of PubNub, exploreed the current state of IoT connectivity and review key trends and t...
Too often with compelling new technologies market participants become overly enamored with that attractiveness of the technology and neglect underlying business drivers. This tendency, what some call the “newest shiny object syndrome” is understandable given that virtually all of us are heavily engaged in technology. But it is also mistaken. Without concrete business cases driving its deployment, IoT, like many other technologies before it, will fade into obscurity.