Click here to close now.

Welcome!

CloudExpo® Blog Authors: Roger Strukhoff, Elizabeth White, Liz McMillan, Bart Copeland, Pat Romanski

Related Topics: CloudExpo® Blog, JAVA IoT, Microservices Expo, Open Source Cloud, Agile Computing, Apache

CloudExpo® Blog: Article

Big Data Security for Apache Hadoop

Ten Tips for HadoopWorld attendees

Big Data takes center stage today at the Strata Conference & Hadoop World in New York, the world’s largest gathering of the Apache Hadoop™ community. A key conversation topic will be how organizations can improve data security for Hadoop and the applications that run on the platform. As you know, Hadoop and similar data stores hold a lot of promise for organizations to finally gain some value out of the immense amount of data they're capturing. But HDFS, Hive and other nascent NoSQL technologies were not necessarily designed with comprehensive security in mind. Often what happens as big data projects grow is sensitive data like HIPAA data, PII and financial records get captured and stored. It's important this data remains secure at rest.

I polled my fellow co-workers at Gazzang last week, and asked them to come up with a top ten list for securing Apache Hadoop. Here's what they delivered. Enjoy:

Think about security before getting started – You don’t wait until after a burglary to put locks on your doors, and you should not wait until after a breach to secure your data. Make sure a serious data security discussion takes place before installing and feeding data into your Hadoop cluster.

Consider what data may get stored – If you are using Hadoop to store and run analytics against regulatory data, you likely need to comply with specific security requirements. If the stored data does not fall under regulatory jurisdiction, keep in mind the risks to your public reputation and potential loss of revenue if data such as personally identifiable information (PII) were breached.

Encrypt data at rest and in motion – Add transparent data encryption at the file layer as a first step toward enhancing the security of a big data project. SSL encryption can protect big data as it moves between nodes and applications.

As Securosis analyst Adrian Lane wrote in a recent blog, “File encryption addresses two attacker methods for circumventing normal application security controls. Encryption protects in case malicious users or administrators gain access to data nodes and directly inspect files, and it also renders stolen files or disk images unreadable. It is transparent to both Hadoop and calling applications and scales out as the cluster grows. This is a cost-effective way to address several data security threats.”

Store the keys away from the encrypted data – Storing encryption keys on the same server as the encrypted data is akin to locking your house and leaving the key in your front door. Instead, use a key management system that separates the key from the encrypted data.

Institute access controls – Establishing and enforcing policies that govern which people and processes can access data stored within Hadoop is essential for keeping rogue users and applications off your cluster.

Require multi-factor authentication - Multi-factor authentication can significantly reduce the likelihood of an account being compromised or access to Hadoop data being granted to an unauthorized party.

Use secure automation – Beyond data encryption, organizations should look to DevOps tools such as Chef or Puppet for automated patch and configuration management.

Frequently audit your environment – Project needs, data sets, cloud requirements and security risks are constantly changing. It’s important to make sure you are closely monitoring your Hadoop environment and performing frequent checks to ensure performance and security goals are being met.

Ask tough questions of your cloud provider – Be sure you know what your cloud provider is responsible for. Will they encrypt your data? Who will store and have access to your keys? How is your data retired when you no longer need it? How do they prevent data leakage?

Centralize accountability – Centralizing the accountability for data security ensures consistent policy enforcement and access control across diverse organizational silos and data sets.

Did we miss anything? If so, please comment below, and enjoy Strata +HadoopWorld.

More Stories By David Tishgart

After spending years at large corporations including Dell, AMD and BMC, David Tishgart joined the startup ranks leading product marketing for Gazzang. Focused on security for big data, he helps communicate the benefits and challenges that big data can present, offering practical solutions. When not ranting about encryption and key management, you can find David clamoring for a big data application that can fine tune his fantasy football team.

@CloudExpo Stories
Cloud Expo, Inc. has announced today that Andi Mann returns to DevOps Summit 2015 as Conference Chair. The 4th International DevOps Summit will take place on June 9-11, 2015, at the Javits Center in New York City. "DevOps is set to be one of the most profound disruptions to hit IT in decades," said Andi Mann. "It is a natural extension of cloud computing, and I have seen both firsthand and in independent research the fantastic results DevOps delivers. So I am excited to help the great team at ...
Explosive growth in connected devices. Enormous amounts of data for collection and analysis. Critical use of data for split-second decision making and actionable information. All three are factors in making the Internet of Things a reality. Yet, any one factor would have an IT organization pondering its infrastructure strategy. How should your organization enhance its IT framework to enable an Internet of Things implementation? In his session at Internet of @ThingsExpo, James Kirkland, Chief Ar...
Container technology is sending shock waves through the world of cloud computing. Heralded as the 'next big thing,' containers provide software owners a consistent way to package their software and dependencies while infrastructure operators benefit from a standard way to deploy and run them. Containers present new challenges for tracking usage due to their dynamic nature. They can also be deployed to bare metal, virtual machines and various cloud platforms. How do software owners track the usag...
CA Technologies has announced it has signed a definitive agreement to acquire Rally Software Development Corp. for $19.50 per share, which equates to approximately $480 million, net of cash acquired. The transaction has been unanimously approved by both Boards of Directors, and is expected to close in the second quarter of CA’s fiscal 2016. Based in Boulder, CO, Rally has approximately 500 employees across four continents and FY 2015 sales of $88 million. “Software applications are changing the...
The security devil is always in the details of the attack: the ones you've endured, the ones you prepare yourself to fend off, and the ones that, you fear, will catch you completely unaware and defenseless. The Internet of Things (IoT) is nothing if not an endless proliferation of details. It's the vision of a world in which continuous Internet connectivity and addressability is embedded into a growing range of human artifacts, into the natural world, and even into our smartphones, appliances, a...
Enterprises are fast realizing the importance of integrating SaaS/Cloud applications, API and on-premises data and processes, to unleash hidden value. This webinar explores how managers can use a Microservice-centric approach to aggressively tackle the unexpected new integration challenges posed by proliferation of cloud, mobile, social and big data projects. Industry analyst and SOA expert Jason Bloomberg will strip away the hype from microservices, and clearly identify their advantages and d...
The OpenStack cloud operating system includes Trove, a database abstraction layer. Rather than applications connecting directly to a specific type of database, they connect to Trove, which in turn connects to one or more specific databases. One target database is Postgres Plus Cloud Database, which includes its own RESTful API. Trove was originally developed around MySQL, whose interfaces are significantly less complicated than those of the Postgres cloud database. In his session at 16th Cloud...
All major researchers estimate there will be tens of billions devices - computers, smartphones, tablets, and sensors - connected to the Internet by 2020. This number will continue to grow at a rapid pace for the next several decades. With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo, June 9-11, 2015, at the Javits Center in New York City. Learn what is going on, contribute to the discussions, and ensure that your enter...
What do a firewall and a fortress have in common? They are no longer strong enough to protect the valuables housed inside. Like the walls of an old fortress, the cracks in the firewall are allowing the bad guys to slip in - unannounced and unnoticed. By the time these thieves get in, the damage is already done and the network is already compromised. Intellectual property is easily slipped out the back door leaving no trace of forced entry. If we want to reign in on these cybercriminals, it's hig...
The 4th International Internet of @ThingsExpo, co-located with the 17th International Cloud Expo - to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than 20 years ago.
SYS-CON Events announced today that MetraTech, now part of Ericsson, has been named “Silver Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York, NY. Ericsson is the driving force behind the Networked Society- a world leader in communications infrastructure, software and services. Some 40% of the world’s mobile traffic runs through networks Ericsson has supplied, serving more than 2.5 billion subscribers.
Software Development Solution category in The 2015 American Business Awards, and will ultimately be a Gold, Silver, or Bronze Stevie® Award winner in the program. More than 3,300 nominations from organizations of all sizes and in virtually every industry were submitted this year for consideration. "We are honored to be recognized as a leader in the software development industry by the Stevie Awards judges," said Steve Brodie, CEO of Electric Cloud. "We introduced ElectricFlow and our Deploy app...
SYS-CON Events announced today that O'Reilly Media has been named “Media Sponsor” of SYS-CON's 16th International Cloud Expo®, which will take place on June 9–11, 2015, at the Javits Center in New York City, NY. O'Reilly Media spreads the knowledge of innovators through its books, online services, magazines, and conferences. Since 1978, O'Reilly Media has been a chronicler and catalyst of cutting-edge development, homing in on the technology trends that really matter and spurring their adoption...
In their general session at 16th Cloud Expo, Michael Piccininni, Global Account Manager – Cloud SP at EMC Corporation, and Mike Dietze, Regional Director at Windstream Hosted Solutions, will review next generation cloud services, including the Windstream-EMC Tier Storage solutions, and discuss how to increase efficiencies, improve service delivery and enhance corporate cloud solution development. Speaker Bios Michael Piccininni is Global Account Manager – Cloud SP at EMC Corporation. He has b...
There will be 150 billion connected devices by 2020. New digital businesses have already disrupted value chains across every industry. APIs are at the center of the digital business. You need to understand what assets you have that can be exposed digitally, what their digital value chain is, and how to create an effective business model around that value chain to compete in this economy. No enterprise can be complacent and not engage in the digital economy. Learn how to be the disruptor and not ...
SYS-CON Events announced today that EnterpriseDB (EDB), the leading worldwide provider of enterprise-class Postgres products and database compatibility solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. EDB is the largest provider of Postgres software and services that provides enterprise-class performance and scalability and the open source freedom to divert budget from more costly traditiona...
Fundamentally, SDN is still mostly about network plumbing. While plumbing may be useful to tinker with, what you can do with your plumbing is far more intriguing. A rigid interpretation of SDN confines it to Layers 2 and 3, and that's reasonable. But SDN opens opportunities for novel constructions in Layers 4 to 7 that solve real operational problems in data centers. "Data center," in fact, might become anachronistic - data is everywhere, constantly on the move, seemingly always overflowing. Net...
Mobile commerce traffic is surpassing desktop, yet less than 20% of sales in the U.S. are mobile commerce sales. In his session at 15th Cloud Expo, Dan Franklin, Segment Manager, Commerce, at Verizon Digital Media Services, defined mobile devices and discussed how next generation means simplification. It means taking your digital content and turning it into instantly gratifying experiences.
The most often asked question post-DevOps introduction is: “How do I get started?” There’s plenty of information on why DevOps is valid and important, but many managers still struggle with simple basics for how to initiate a DevOps program in their business. They struggle with issues related to current organizational inertia, the lack of experience on Continuous Integration/Delivery, understanding where DevOps will affect revenue and budget, etc. In their session at DevOps Summit, JP Morgentha...
While there are hundreds of public and private cloud hosting providers to choose from, not all clouds are created equal. If you’re seeking to host enterprise-level mission-critical applications, where Cloud Security is a primary concern, WHOA.com is setting new standards for cloud hosting, and has established itself as a major contender in the marketplace. We are constantly seeking ways to innovate and leverage state-of-the-art technologies. In his session at 16th Cloud Expo, Mike Rivera, Seni...