@CloudExpo Authors: Yeshim Deniz, Elizabeth White, Liz McMillan, Mehdi Daoudi, Pat Romanski

@CloudExpo: Blog Feed Post

Amazon Delivers Cloud Archive Storage with Glacier

Glacier enables AWS customers to store their long-term retention data within Amazon’s existing data centres at a very low cost

At the end of August 2012, Amazon Web Services released their latest service offering: a long-term archive service called Glacier. As a complement to S3, their existing service for actively accessed data, Glacier provides long-term storage for “cold” data, meaning information that has to be retained for a long time but doesn’t require frequent access.

What Exactly is Glacier?

Many organisations need to retain data in archive format for extended periods of time. This may be for regulatory or compliance purposes or may simply be part of their normal business process. Good examples are medical, healthcare, financial or media (video and audio) data. For many IT departments, backup has typically provided a lazy way of archiving information: access to backups retained for up to 10 years provides a cheap and rudimentary archive service. However, backup isn’t archive (see my recent article on the subject), as an archive provides additional features around data management and security. Glacier enables AWS customers to store their long-term retention data within Amazon’s existing data centres at a very low cost, starting at $0.01/GB per month. The low cost is tempered by rather leisurely access times of 3 to 5 hours for data retrieval.

Within Glacier, data is stored in vaults. Up to 1000 vaults may be created per AWS region, with access to each vault controlled individually via Amazon’s IAM (Identity and Access Management) service. Within a vault, data is stored in archives, each of which consists of one or more files. If multiple files need to be stored together for consistency purposes, they can be stored as a single archive. An unlimited number of archives can be created, with a limit of 40TB on any single archive.
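
The vault and archive hierarchy can be pictured with a small in-memory model. This is only an illustrative sketch: the class names `Region` and `Vault`, the method names, and the reading of 40TB as decimal terabytes are all assumptions made for the example and are not part of any AWS SDK.

```python
from dataclasses import dataclass, field

MAX_VAULTS_PER_REGION = 1000        # limit quoted in the article
MAX_ARCHIVE_BYTES = 40 * 10**12     # 40TB, assuming decimal terabytes


@dataclass
class Vault:
    """Hypothetical stand-in for a Glacier vault: a named set of archives."""
    name: str
    archives: dict = field(default_factory=dict)   # archive id -> size in bytes

    def add_archive(self, archive_id: str, size_bytes: int) -> None:
        # The number of archives is uncapped; only the size of each is limited.
        if size_bytes > MAX_ARCHIVE_BYTES:
            raise ValueError("a single archive may not exceed 40TB")
        self.archives[archive_id] = size_bytes


@dataclass
class Region:
    """Hypothetical stand-in for the vaults held in one AWS region."""
    name: str
    vaults: dict = field(default_factory=dict)     # vault name -> Vault

    def create_vault(self, vault_name: str) -> Vault:
        # Re-creating an existing vault returns it; new vaults count against the limit.
        if vault_name not in self.vaults and len(self.vaults) >= MAX_VAULTS_PER_REGION:
            raise RuntimeError("vault limit reached for this region")
        return self.vaults.setdefault(vault_name, Vault(vault_name))
```

Grouping related files into one archive before upload corresponds to a single `add_archive` call in this model, which is why the per-archive size limit matters more than the archive count.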

Data uploaded to Glacier is stored using AES-256 encryption, managed by AWS.  Customers requiring their own encryption are advised to pre-encrypt their data before upload.

Amazon are claiming a data “durability” level of 99.999999999% per archive, although I’m not really sure how they define the term “durability” or exactly what that means in terms of data loss.

As mentioned earlier, data retrieval takes 3 to 5 hours per archive. Retrieval requests (or jobs, as they are known in Glacier) are queued and processed asynchronously, and the customer can be notified on completion via AWS SNS (Simple Notification Service). Once data is retrieved, it remains available to the customer for 24 hours. The long retrieval time implies that the majority of Glacier data is stored on tape, with retrieval resulting in a copy to disk for general access. Based on the costs, this also makes sense.
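
The retrieval lifecycle described above (request, wait, notification, 24-hour download window) can be sketched as a small state model. This is a simulation only and does not call AWS: `RetrievalJob`, `poll` and the status strings are hypothetical names, the 4-hour delay is an assumed midpoint of the quoted 3 to 5 hours, and a plain callback stands in for the SNS notification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETRIEVAL_DELAY = timedelta(hours=4)    # assumed midpoint of the 3 to 5 hour range
DOWNLOAD_WINDOW = timedelta(hours=24)   # retrieved data stays available for 24 hours


@dataclass
class RetrievalJob:
    """Hypothetical model of one retrieval job; not an AWS SDK type."""
    archive_id: str
    requested_at: datetime

    @property
    def ready_at(self) -> datetime:
        return self.requested_at + RETRIEVAL_DELAY

    @property
    def expires_at(self) -> datetime:
        return self.ready_at + DOWNLOAD_WINDOW

    def status(self, now: datetime) -> str:
        if now < self.ready_at:
            return "in progress"
        if now < self.expires_at:
            return "ready for download"
        return "expired"


def poll(jobs, now, notify):
    """Call notify(job) for every job whose data is downloadable at `now`.

    In the real service an SNS notification would play this role; a
    callback keeps the sketch self-contained.
    """
    for job in jobs:
        if job.status(now) == "ready for download":
            notify(job)
```

The practical point of the model is that a client has to plan around two deadlines per job: the data is not there for the first few hours, and it is gone again a day after it arrives.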

The Charging Model

Charging for Glacier is more complex than for the other AWS offerings and includes the following components:

  • $0.01/GB/month for storage of data
  • Data upload – no charge for data volume
  • Upload and retrieval requests – $0.05 per 1000 requests
  • Archive query commands (list vault contents, get job status, delete objects) – no charge
  • Data retrieval – 5% of archive per month for free, $0.011/GB upwards after that
  • Data out (moving data outside an AWS region) – $0.12 – $0.05/GB dependent on volume
  • Moving data to EC2 – no charge
  • Deletion of data less than 90 days old – $0.033/GB

It’s interesting that there is a charge for deleting new data, presumably to encourage users to use the service for the purpose for which it was intended. In addition, only 5% of the archive can be retrieved per month without incurring retrieval costs (although data out still incurs a cost). However, there are no costs for transferring data to EC2. This creates an ecosystem that encourages data to be kept in Glacier, using EC2 as the indexing and search or refresh mechanism.
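
To see how the components combine, here is a rough monthly estimate using the rates listed above. It is a sketch under simplifying assumptions: retrieval beyond the free 5% is charged at a flat $0.011/GB (the article says only "upwards"), tiered data-out charges are left out entirely, and the function and parameter names are invented for the example.

```python
STORAGE_PER_GB_MONTH = 0.01      # $/GB/month for stored data
REQUESTS_PER_1000 = 0.05         # $ per 1000 upload or retrieval requests
FREE_RETRIEVAL_FRACTION = 0.05   # 5% of stored data per month is free to retrieve
RETRIEVAL_PER_GB = 0.011         # $/GB beyond the allowance (assumed flat rate)
EARLY_DELETE_PER_GB = 0.033      # $/GB deleted before it is 90 days old


def monthly_cost(stored_gb, requests=0, retrieved_gb=0.0, early_deleted_gb=0.0):
    """Estimate one month's charges in dollars, excluding data-out fees."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    request_fee = requests / 1000 * REQUESTS_PER_1000
    free_gb = stored_gb * FREE_RETRIEVAL_FRACTION
    # Only the portion above the free allowance is charged.
    retrieval = max(0.0, retrieved_gb - free_gb) * RETRIEVAL_PER_GB
    early_delete = early_deleted_gb * EARLY_DELETE_PER_GB
    return round(storage + request_fee + retrieval + early_delete, 2)
```

For example, with 10,000GB stored, 2,000 requests and 1,000GB retrieved in the month, the free allowance covers 500GB and the estimate comes to about $105.60, of which $100 is storage. Retrieving 400GB instead would stay inside the allowance and add nothing.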

What’s Not Included

Glacier itself is simply a large storage vault for data. All objects are identified by 138-byte character keys. Data access is managed via REST-based APIs, which can also be called through the pre-built Java and .NET SDKs. There are no facilities within Glacier for providing some of the most fundamental parts of an archive, notably metadata and indexing capabilities. These need to be developed by the users themselves, and as yet I haven’t found anyone offering services that use Glacier as their storage platform. There are a few little quirks to bear in mind too. For instance, vaults are inventoried only on a daily basis, so the inventory could be inconsistent with any external index the user creates.
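
A user-maintained index of the kind described above could be as simple as a mapping from archive key to metadata, checked periodically against the vault inventory. The sketch below is one possible shape, not something Glacier provides: `search` and `reconcile` are hypothetical helpers, and the inventory is assumed to have already been fetched and reduced to a list of archive keys.

```python
def search(index, **criteria):
    """Return the archive keys whose metadata matches every given key/value."""
    return sorted(
        archive_id
        for archive_id, meta in index.items()
        if all(meta.get(k) == v for k, v in criteria.items())
    )


def reconcile(index, inventory_ids):
    """Compare a user-maintained index with the latest vault inventory.

    index: dict mapping archive key -> metadata dict (kept by the user)
    inventory_ids: iterable of archive keys from the vault inventory
    Returns (missing_from_inventory, missing_from_index) as sorted lists.
    """
    indexed = set(index)
    inventoried = set(inventory_ids)
    return sorted(indexed - inventoried), sorted(inventoried - indexed)
```

Because the inventory is only refreshed daily, an entry that appears in the index but not yet in the inventory may simply be a recent upload, so a real implementation would probably wait for a later inventory before treating it as lost.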

The Architect’s View

Amazon have provided a framework and storage repository that could be used by many organisations to store their data over the long term. This does not mean that tape is dead, far from it: Glacier itself is certainly using tape technology. What Amazon are providing is a data store against which 3rd party developers can create their own archive solutions, in a similar way to what has been done for S3 (think Nasuni or Jungledisk). There are already many other cloud archiving solutions available today (see the same recent article), and on its own Glacier doesn’t represent direct competition but rather provides another storage platform in which data can be stored. However, there are a few things to consider when using a Glacier-based service:

  • The indexing of data is purely based on any 3rd party vendor’s indexing system or needs to be managed by the end user
  • Taking data out of the archive to move elsewhere will incur a cost
  • Refreshing data within the archive will incur a cost

Glacier and the supporting services could therefore represent a significant and unexpected lock-in for customers.

Overall, Glacier does provide a framework against which developers can create new services for archive and that’s a good thing.  Cost will be a significant factor for many and the marketing-set price of $0.01/GB/month certainly sounds attractive.  Like the other AWS offerings, I’m sure Glacier will be very successful.

