Measuring Cloud Storage Performance: Blocks vs. Files

What are some good reasons to adopt cloud storage? Cost, durability and flexibility.

So let me talk about performance, instead.

Look at this graph:

[Figure: provider bandwidth measurements]

As part of our daily testing, we do routine performance measurements across a broad swath of cloud storage providers. It gives us a check to ensure that the various CloudArray subsystems are performing as they should, and gives us the data to make optimization decisions. In this particular test, we measure transfer rates at various buffer sizes. We “fill the pipe” by queueing up multiple streams of data simultaneously, initiating one transfer as soon as the previous one finishes, so that latency doesn’t skew the data.
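A test of this shape can be sketched in a few lines of Python. This is only an illustration of the method described above, not the actual CloudArray test harness: `upload` is a hypothetical callable that sends one buffer to the provider and returns when the transfer completes, and the stream and transfer counts are arbitrary defaults.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_bandwidth(upload, buffer_size, transfers=32, streams=4):
    """Return observed bytes/second for one buffer size.

    `upload` is a hypothetical callable (an assumption of this sketch) that
    sends one bytes payload and returns when the transfer has completed.
    """
    payload = bytes(buffer_size)
    start = time.perf_counter()
    # A fixed pool of `streams` workers keeps that many transfers in flight;
    # each worker starts its next transfer as soon as its previous one ends.
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # list() drains the iterator so any upload exception surfaces here.
        list(pool.map(upload, [payload] * transfers))
    elapsed = time.perf_counter() - start
    return buffer_size * transfers / elapsed


def sweep(upload, buffer_sizes, **kwargs):
    """Measure each buffer size in turn; returns {size: bytes_per_second}."""
    return {size: measure_bandwidth(upload, size, **kwargs) for size in buffer_sizes}


if __name__ == "__main__":
    def fake_upload(payload):
        # Stand-in for a provider call: fixed per-request overhead plus a
        # size-dependent cost. These numbers are made up for the demo.
        time.sleep(0.002 + len(payload) / 500e6)

    sizes = [32 * 1024, 256 * 1024, 1024 * 1024]
    for size, rate in sweep(fake_upload, sizes, transfers=8).items():
        print(f"{size // 1024:>5}K  {rate / 1e6:8.1f} MB/s")
```

Keeping several streams in flight is what hides per-request latency: while one transfer waits on a round trip, the others keep the link busy, so the number reported is closer to bandwidth than to latency.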

This particular provider reaches its peak bandwidth at a transfer size of 256K, and at 32K it is actually transferring at 50% of peak.

But look at this graph:

[Figure: Microsoft file size distribution data]

This data is from a Microsoft study published at FAST 2007. It describes the distribution of file sizes in a file system. Interestingly, even though the mean file size does creep up over the years of the study, the distribution doesn’t change much. What we can draw from it is that in a typical file system, roughly 80% of the files are smaller than 32K.

We can see that a naive system which just maps files on the file system directly to objects in the cloud is going to spend 80% of its file transfers at the bottom half of the bandwidth curve, achieving less than 50% of the peak available bandwidth. That’s ignoring latency, too: our hypothetical naive system is going to have to be streaming files out at a perfectly synchronized pace in order to achieve the theoretical maximum.

Our Provider A actually does pretty well with small transfers, rising to peak bandwidth relatively rapidly. What about a different provider?

[Figure: provider bandwidth measurements, two providers]

What abysmal performance, right? By 512K, Provider B’s bandwidth has risen to just under 20% of Provider A’s. Forget about issues with small writes: what reason would anybody have for picking Provider B? Could any cost or durability benefits be enough to justify performance penalties this big?

But let’s zoom out and take a bigger-picture view:

[Figure: provider bandwidth measurements, large I/O]

Oops. This graph tells an entirely different story. There’s a real performance benefit to using Provider B, assuming that you are transferring large chunks of data.

Every cloud storage provider has different characteristics, even if the APIs are similar. The role of a cloud storage gateway is to smooth over the differences and provide a predictable solution for storing data in the cloud. That’s what CloudArray does: it aggregates lots of small writes into large-block transfers, absorbs transient failures and network faults, and generally works to manage and optimize cloud storage utilization.
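To make the idea of aggregating small writes concrete, here is a minimal sketch. It is not CloudArray's implementation: the class name, the `send_block` callable, and the 1 MiB default block size are all assumptions chosen for illustration, and a real gateway would also have to track where each write landed, handle retries, and guarantee durability.

```python
class WriteAggregator:
    """Buffer small writes and hand them off as large blocks.

    `send_block` is a hypothetical callable (an assumption of this sketch)
    that uploads one bytes object to the storage provider.
    """

    def __init__(self, send_block, block_size=1024 * 1024):
        self.send_block = send_block
        self.block_size = block_size
        self._buffer = bytearray()

    def write(self, data):
        """Accept a write of any size; upload only when a full block is ready."""
        self._buffer.extend(data)
        while len(self._buffer) >= self.block_size:
            block = bytes(self._buffer[: self.block_size])
            del self._buffer[: self.block_size]
            self.send_block(block)

    def flush(self):
        """Send any remaining partial block, for example on a timer or at shutdown."""
        if self._buffer:
            self.send_block(bytes(self._buffer))
            self._buffer.clear()


if __name__ == "__main__":
    sent = []
    aggregator = WriteAggregator(sent.append, block_size=64 * 1024)
    for _ in range(100):
        aggregator.write(bytes(4096))  # one hundred 4K writes
    aggregator.flush()
    print(len(sent), "uploads instead of 100")
```

The block size is the tuning knob: it would be set near the transfer size at which a given provider reaches its peak bandwidth, which is exactly why it has to differ from one provider to the next.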

If a gateway vendor tells you that they won’t work with a particular cloud storage provider because its performance doesn’t meet their SLAs, then the simple fact is that their gateway isn’t doing its job. The more complicated fact is that naive file-to-object mappers will always be fundamentally flawed when dealing with real-world business data, because real-world business data is housed within file systems, and file systems are designed to talk to storage subsystems, and storage subsystems are designed to talk to disk controllers.

Anybody who’s worked in the enterprise storage array business can tell you about the small write problem in RAID: here it is again, written in the clouds. You would pay a penalty for writing small pieces of data to a RAID volume, were it not for the years of work that have been spent developing storage systems that smooth out performance without sacrificing reliability. Odds are that many users have never heard of the small write problem, much less tuned their software for it or tried to plan their file systems around optimizing their storage arrays.

But that’s exactly what they’ll need to be doing with their cloud storage, unless they use CloudArray. What’s our secret sauce? Actually, it’s no secret: we’re a block storage device, and we can tune and perfect our transfers to match your provider. That means we minimize time-to-durability and maximize the effectiveness of our cache. And that’s why we can make pictures like this:

We don’t make recommendations or publish our performance test results because ultimately, the choice of cloud storage provider should be a business decision, based on business factors like cost, durability, location, and a host of others. CloudArray’s architecture and capabilities make it possible for our customers to make those decisions for themselves, while being assured of getting the best performance.


– John Bates, CTO


Footnote/Mathematical aside: a careful reader will note that I discussed 80% of transfers, not transferred data. In fact, depending upon the total number of files and the distribution of sizes of the upper 20%, small files may be less than 1% of the total used capacity. And therefore, given the totally unrealistic model which disregards latency, a system with a low ratio of file count to total capacity and with a provider performance profile like A’s would pay only a minor small write penalty.

But that just serves to strengthen my point: why should any of this matter to you, the user? A system with a higher ratio, or with a provider skewed toward high bandwidth on large transfers, can wind up spending 60% of its time transferring 10% of its data. Why should it be up to you to calculate your file system distributions and match them to the right cloud storage?
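The arithmetic behind that claim is short enough to check. The sketch below uses the same unrealistic latency-free model as the footnote, and every number in it is illustrative rather than measured: if small files make up a share s of the data and move at bandwidth b_small, while the rest moves at b_large, then the share of time spent on small files is (s / b_small) / (s / b_small + (1 - s) / b_large).

```python
def time_share(data_share_small, bw_small, bw_large):
    """Fraction of total transfer time spent on the small-file portion.

    Bandwidths may be in any unit as long as both use the same one.
    Latency is ignored, as in the simplified model in the text.
    """
    t_small = data_share_small / bw_small
    t_large = (1.0 - data_share_small) / bw_large
    return t_small / (t_small + t_large)


if __name__ == "__main__":
    # Illustrative case 1: small files are 1% of the data and move at half
    # of peak bandwidth, roughly the Provider A shape described above.
    print(f"{time_share(0.01, 0.5, 1.0):.1%} of the time on 1% of the data")

    # Illustrative case 2: small files are 10% of the data and large
    # transfers run 13.5 times faster than small ones.
    print(f"{time_share(0.10, 1.0, 13.5):.1%} of the time on 10% of the data")
```

In the first case the penalty is minor, about 2% of the time. In the second, a 13.5-to-1 gap between large-transfer and small-transfer bandwidth is enough to put about 60% of the time into 10% of the data.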

More Stories By Nicos Vekiarides

Nicos Vekiarides is the Chief Executive Officer & Co-Founder of TwinStrata. He has spent over 20 years in enterprise data storage, both as a business manager and as an entrepreneur and founder in startup companies.

Prior to TwinStrata, he served as VP of Product Strategy and Technology at Incipient, Inc., where he helped deliver the industry's first storage virtualization solution embedded in a switch. Prior to Incipient, he was General Manager of the storage virtualization business at Hewlett-Packard. Vekiarides came to HP with the acquisition of StorageApps where he was the founding VP of Engineering. At StorageApps, he built a team that brought to market the industry's first storage virtualization appliance. Prior to StorageApps, he spent a number of years in the data storage industry working at Sun Microsystems and Encore Computer. At Encore, he architected and delivered Encore Computer's SP data replication products that were a key factor in the acquisition of Encore's storage division by Sun Microsystems.
