Storage in Cloud Is Not the Center of the Universe

The age of the super-scale single storage array is over

In a previous post, I touched on the need for APIs to manage storage in cloud environments. In this post, I’ll look at how the way storage is deployed in cloud environments has to change.

During the last 10 years, Storage Area Networks (SANs) have created a storage-centric view of the world, with storage at the centre and the “planets” – networking and servers – orbiting it, like some pre-Copernican view of the universe. Over time, SANs have grown ever bigger, with some organisations deploying huge Fibre Channel fabrics. As we’ve seen today, EMC continues to perpetuate that view with the release of the VMAX 40K, a 4PB monster of a storage array in the best traditions of the central SAN-based model.

However, the world has changed. Storage is no longer the centre of the IT universe, but merely a player within it. Just as it came as a shock to those in power in the 1500s when Copernicus proposed that the sun, not the Earth, was at the centre of the universe, so the same shift will come as a shock in IT and storage, especially in cloud environments.

A Bit of History



SANs evolved from a time before (x86) virtualisation, when everyone deployed physical servers. The storage in each server was isolated, and the server chassis was the limiting factor on expanding storage capacity. Copper SCSI cable limitations meant storage and server needed to be close together, so expanding the storage for a single server could mean re-racking and downtime. Storage Area Networks, with optical fibre as the interconnect, allowed storage to be centralised. Resources were now held centrally and so were sharable by all servers; they were not tied by physical distance, as optical fibre could be run for hundreds of metres; and they were scalable, as a storage array could be grown simply by adding more disks to the shared pool. It’s also worth remembering that the first storage arrays from the 1990s were built with much less reliable drive hardware than we have today. As a consequence, the arrays were over-engineered to provide the high level of availability that centralisation required.

Consolidation can go too far.  Placing all storage resources into one or a small number of arrays increases the impact of the following:

  • Change Control – upgrading microcode or making other physical changes has a wider impact and can be more difficult to get approved unless maintenance windows are well structured.
  • Failure – the failure of a single array can have huge consequences as arrays scale and support more servers.
  • Complexity – large arrays benefit from scale in both capacity and performance; however, larger arrays are more complex to manage (hence the introduction of auto-tiering technology), especially from a performance perspective.
  • Lifecycle – as arrays get bigger, the effort to migrate data on and off the array at the beginning and end of its lifecycle results in additional cost and wasted resources.
There is clearly a “sweet spot” in terms of array size, purely from the manageability angle.
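
The failure point in the list above can be made concrete with some rough arithmetic. The sketch below is illustrative only: it assumes servers are spread evenly across arrays and that each server depends on exactly one array. The function name and the figure of 400 servers are hypothetical.

```python
import math


def servers_hit_by_one_array_failure(total_servers: int, array_count: int) -> int:
    """Worst-case number of servers affected when a single array fails.

    Assumes an even spread of servers across arrays and one array per
    server; real estates are rarely this tidy.
    """
    if total_servers < 0 or array_count < 1:
        raise ValueError("need total_servers >= 0 and array_count >= 1")
    return math.ceil(total_servers / array_count)


# Hypothetical estate of 400 servers, split across more and more arrays.
for arrays in (1, 2, 4, 8):
    affected = servers_hit_by_one_array_failure(400, arrays)
    print(f"{arrays} array(s): up to {affected} servers affected by one failure")
```

The same reasoning applies to change control: the fewer arrays there are, the more servers each microcode upgrade touches.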

Virtualisation & Cloud



The shared model works well with physically separate servers. However, virtualisation has changed the server landscape: where before we had hundreds of physical servers in the data centre, now we see those numbers drop by a factor of 10 or 20 as virtualisation becomes mainstream. Consolidation ratios can be even higher in cloud environments, where greater consolidation levels are required. Previously, the server-to-storage ratio was a many-to-one relationship. Today we are seeing vendors push architectures that have, in some cases, a one-to-one relationship. Deploying a single storage array for every server may be a little extreme, but what we are seeing is a move away from a centralised model to one of scalable node-based storage, where storage can be added into an existing complex of arrays. In addition, data management intelligence is moving into the hypervisor. VMware now manages Storage vMotion requests and dynamic data placement with Storage DRS, offloading the “heavy lifting” to the array through VAAI. As a result, array-level technologies such as remote replication aren’t needed.
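
To put numbers on the consolidation ratios mentioned above, here is a minimal sketch. The estate of 300 physical servers and the function name are assumptions chosen for illustration; only the 10:1 and 20:1 ratios come from the text.

```python
import math


def hosts_after_consolidation(physical_servers: int, ratio: int) -> int:
    """Hosts needed once `ratio` former physical servers share one host."""
    if physical_servers < 0 or ratio < 1:
        raise ValueError("need physical_servers >= 0 and ratio >= 1")
    return math.ceil(physical_servers / ratio)


# Hypothetical estate of 300 physical servers.
for ratio in (10, 20):
    hosts = hosts_after_consolidation(300, ratio)
    print(f"{ratio}:1 consolidation -> {hosts} hosts")
```

With only a few dozen hosts left to attach, the many-to-one case for a large central array is much weaker than it was with hundreds of separate servers.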

What this means is that we’re seeing a move towards storage hardware being used as a pure IOPS generator. In cloud solutions, storage needs to be lean and cheap, whilst still being reliable. What it doesn’t need is lots of extras.

The Storage Architect Take

The age of the super-scale single storage array is over. Storage consolidation through SAN is no longer needed, and cloud deployments are better served by node-based scale-out storage solutions. Although most intelligence is moving to the hypervisor, the ability to move data seamlessly from one array to another is still a requirement. Four petabytes in a single array isn’t needed by 90% of organisations, and those that may need that level of capacity probably won’t deploy it in a single array. It’s time to move on.
