Welcome!

@CloudExpo Authors: Yeshim Deniz, Elizabeth White, William Schmarzo, Kevin Jackson, Liz McMillan

Related Topics: @CloudExpo

@CloudExpo: Blog Feed Post

Object Storage for Big Unstructured Data

There are two kinds of Big Data: Big Data (for analytics) and Big Unstructured Data

Big Data is Big, but it also causes a lot of confusion. Big Data is used for anything related storage these days, so people don’t know anymore what it exactly is. Is it Hadoop? Is it analytics? It doesn’t need to be that complicated though. There are two kinds of Big Data: Big Data (for analytics) and Big Unstructured Data.

Big Data for analytics is a paradigm that became popular in the previous decade. A lot of innovation was done for research projects. New technology enabled researchers in many different domains to capture data in a way they had never been able to do before. In agriculture, for example, ploughs would get sensors that would send little bits of information to a central system (over satellite). Every couple of feet these sensors would measure what’s in the ground (minerals for example), how humid the ground is etc. Based on that, large agriculture companies would then be able to make better decisions on where to grow which crop.

The problem was that traditional systems to store this massive amount of small data (relational databases) were no longer adequate to store this information. Systems like MapReduce and Hadoop were created as an alternative and would store these massive volumes of files as concatenated “Big” files. Big Data was born, Big Data for semi-structured data.

Today we are seeing a similar trend with unstructured data. Studies show that data storage requirements will grow with a factor 30 over the next decade. 80% of that data are large files: office documents, movies, music, pictures. Similar to how the databases in the previous decade, traditional storage – file systems – is not the best way to store this data. File systems will not scale sufficiently and actually become obsolete as applications will take over the role of the file system.

A nice example is what Google Picasa does for us: in the old days we would store pictures nicely organized in a file system (hopefully with some backups). One folder per year, one per month in each year, one per holiday or party. Today, we just dump all the pictures in one folder and Picasa will sort them for us based on date, location, face recognition (!) or other metadata. With an intelligent query, we can display the right pictures very fast, much faster than browsing the file system. We don’t even have to worry about backups as we can store copies in the cloud automatically.

The new paradigm that will help us store these massive amounts of unstructured data is Object Storage. Object Storage systems are uniformly scalable pools of storage that are accessible through a REST interface. Files – objects – are dumped into the pool and an identifier is kept to locate the object when it is needed. Applications that are designed to run on top of object storage will use these identifiers through the REST protocol. A good analogy is parking your car Valet vs. self park. When you self park you have to remember the lot, the floor, the isle etc (file system); with Valet you get a receipt when you give your keys and you will later use that receipt to get your car back.

So what is needed to build an object storage system? Basically just lots of disks, a REST API and a way to provide durability. This could be done with traditional systems like RAID but the problem is that RAID requires a huge amount of overhead to provide acceptable availability. The more data we store, the more painful it is to be needing 200% overhead as some systems do. The smarter way to provide durability for object storage is erasure encoding.

Erasure encoding stores objects as equations, which are spread over the entire storage pool: Data objects are split up in sub-blocks, from which equations are calculated. According to the availability policy, an overhead of equations is calculated and the equations are spread over as many disks are possible, also policy-defined. As a result, when a disk breaks, the system will always have sufficient equations to restore the original data block. If a disk is broken, the system can re-calculate equations as a background task to bring the number of available equations on a healthy level again. A pioneer of this technology is Amplidata, who use low power Atom processors in their hardware to reduce power costs. As the entire system, all storage nodes, can recalculate missing equations as a background task, Amplidata figured out it was not necessary to use the high-end nodes that RAID systems need (to speed up restores and avoid performance losses).

Apart from providing a more efficient and a more scalable way to store data, erasure coding based object storage can save up to 70% on the overall TCO thanks to reduced raw storage needs and reduced power needs (less hardware + low power devices save on power and cooling). Also, uniformly scalable storage systems with an automated healing mechanism drastically reduce the management effort and cost.

So what are the use cases for object storage? As data needs grow, object storage will become the storage paradigm of choice in more and more environments, but already today we see the need in a number of situations:

Building live archives
Object storage enables companies to re-activate their data. Currently, most companies see data more as a burden than anything else: the data will never be used again but needs to be archived for a whole lot of reasons. But this data actually has a lot of value. By using live archives, employees have faster access to older data and they can use those valuable resources. With traditional storage it would never be achievable to build disk based archives for this purpose as the overhead would make this too costly.

Online applications
Most of the data-intensive online – cloud – applications are built on public clouds such as Amazon S3, which are early implementations of Object Storage. The benefits for the application providers are plenty: a simple programming interface, low cost and fast time to market. As their data sets grow, those companies might move to private Object Storage implementations to reduce costs even more.

Media and entertainment
Traditionally, the M&E industry has been very much file-oriented but we’re seeing a growing interest in object storage to optimize efficiency and reduce costs, but also because this industry is already hitting the limits of their file systems.

These are just a few examples of Object Storage implementations for Big Unstructured Data. Object Storage was not built to replace any of the current storage architectures. Very much like NAS filers were designed in the 90ies because block storage (SAN was designed when databases were king) was not optimized for Unstructured Data, Object Storage will find it’s place next to those two for Big Unstructured Data.

More Stories By Tom Leyden

Tom Leyden is VP Product Marketing at Scality. Scality was founded in 2009 by a team of entrepreneurs and technologists. The idea wasn’t storage, per se. When the Scality team talked to the initial base of potential customers, the customers wanted a system that could “route” data to and from individual users in the most scalable, efficient way possible. And so began a non-traditional approach to building a storage system that no one had imagined before. No one thought an object store could have enough performance for all the files and attachments of millions of users. No one thought a system could remain up and running through software upgrades, hardware failures, capacity expansions, and even multiple hardware generations coexisting. And no one believed you could do all this and scale to petabytes of content and billions of objects in pure software.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
Artificial intelligence, machine learning, neural networks. We’re in the midst of a wave of excitement around AI such as hasn’t been seen for a few decades. But those previous periods of inflated expectations led to troughs of disappointment. Will this time be different? Most likely. Applications of AI such as predictive analytics are already decreasing costs and improving reliability of industrial machinery. Furthermore, the funding and research going into AI now comes from a wide range of com...
For organizations that have amassed large sums of software complexity, taking a microservices approach is the first step toward DevOps and continuous improvement / development. Integrating system-level analysis with microservices makes it easier to change and add functionality to applications at any time without the increase of risk. Before you start big transformation projects or a cloud migration, make sure these changes won’t take down your entire organization.
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
We build IoT infrastructure products - when you have to integrate different devices, different systems and cloud you have to build an application to do that but we eliminate the need to build an application. Our products can integrate any device, any system, any cloud regardless of protocol," explained Peter Jung, Chief Product Officer at Pulzze Systems, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA
21st International Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Me...
SYS-CON Events announced today that Ayehu will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara California. Ayehu provides IT Process Automation & Orchestration solutions for IT and Security professionals to identify and resolve critical incidents and enable rapid containment, eradication, and recovery from cyber security breaches. Ayehu provides customers greater control over IT infras...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
SYS-CON Events announced today that MobiDev, a client-oriented software development company, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex software systems for startups and enterprises. Since 2009 it has grown from a small group of passionate engineers and business...
SYS-CON Events announced today that GrapeUp, the leading provider of rapid product development at the speed of business, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market acr...
"We are a monitoring company. We work with Salesforce, BBC, and quite a few other big logos. We basically provide monitoring for them, structure for their cloud services and we fit into the DevOps world" explained David Gildeh, Co-founder and CEO of Outlyer, in this SYS-CON.tv interview at DevOps Summit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
SYS-CON Events announced today that Enzu will exhibit at SYS-CON's 21st Int\ernational Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive advantage. By offering a suite of proven hosting and management services, Enzu wants companies to focus on the core of their ...
DevOps at Cloud Expo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to w...
New competitors, disruptive technologies, and growing expectations are pushing every business to both adopt and deliver new digital services. This ‘Digital Transformation’ demands rapid delivery and continuous iteration of new competitive services via multiple channels, which in turn demands new service delivery techniques – including DevOps. In this power panel at @DevOpsSummit 20th Cloud Expo, moderated by DevOps Conference Co-Chair Andi Mann, panelists examined how DevOps helps to meet the de...
Cloud Expo, Inc. has announced today that Andi Mann and Aruna Ravichandran have been named Co-Chairs of @DevOpsSummit at Cloud Expo Silicon Valley which will take place Oct. 31-Nov. 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. "DevOps is at the intersection of technology and business-optimizing tools, organizations and processes to bring measurable improvements in productivity and profitability," said Aruna Ravichandran, vice president, DevOps product and solutions marketing...
Automation is enabling enterprises to design, deploy, and manage more complex, hybrid cloud environments. Yet the people who manage these environments must be trained in and understanding these environments better than ever before. A new era of analytics and cognitive computing is adding intelligence, but also more complexity, to these cloud environments. How smart is your cloud? How smart should it be? In this power panel at 20th Cloud Expo, moderated by Conference Chair Roger Strukhoff, pane...
SYS-CON Events announced today that Cloud Academy named "Bronze Sponsor" of 21st International Cloud Expo which will take place October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara, CA. Cloud Academy is the industry’s most innovative, vendor-neutral cloud technology training platform. Cloud Academy provides continuous learning solutions for individuals and enterprise teams for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and the most popular cloud com...
What's the role of an IT self-service portal when you get to continuous delivery and Infrastructure as Code? This general session showed how to create the continuous delivery culture and eight accelerators for leading the change. Don Demcsak is a DevOps and Cloud Native Modernization Principal for Dell EMC based out of New Jersey. He is a former, long time, Microsoft Most Valuable Professional, specializing in building and architecting Application Delivery Pipelines for hybrid legacy, and cloud ...
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
In his session at Cloud Expo, Alan Winters, an entertainment executive/TV producer turned serial entrepreneur, presented a success story of an entrepreneur who has both suffered through and benefited from offshore development across multiple businesses: The smart choice, or how to select the right offshore development partner Warning signs, or how to minimize chances of making the wrong choice Collaboration, or how to establish the most effective work processes Budget control, or how to ma...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.