Bare Metal Blog: Mean Time Between Failures

MTBF has meaning well beyond storage

If you are new to the Bare Metal Blog series, find them all here

When assembling a model – any model, from a highly detailed functional replica of an engine to a mass-produced plastic model of an airplane – there are several places where things can go wrong. The final product is only as good as the model kit, the glue used, the tools used, and the skill of the craftsman. I’ve seen the exact same kit, assembled and painted by two different people, come out looking completely different, simply because of the array of variables and how they interact.

This is true of high-tech equipment also, and as with modeling, it is often overlooked. Interestingly, in my entire IT career, MTBF has only been a measure that meant a ton in two circumstances: when designing hardware and scoping the parts to go in it, and when talking about storage. In all other endeavors, MTBF, if mentioned, was a side note.

And yet it matters. It can matter a lot. Like most hardware companies (because we spec our own parts and monitor our own quality), we track MTBF two ways: computed from the sum of the parts with average environmental considerations, and measured from support cases involving hardware and RMAs. For us, knowing helps us improve quality. For customers, knowing helps gauge the bounds of useful life for the equipment being purchased. Of course, MTBF is a mean, not a guarantee, and it is entirely possible for a device to last much longer than its MTBF. It is tempting to read a mean as implying that roughly half of the devices out there will last longer, but it’s the mean, not the median, and most IT shops do not want to plan as if a device will last well beyond its MTBF value. MTBF can offer a bit of guidance when it is fairly calculated, and another tool in the evaluation toolbox never hurt an IT shop.
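To put rough numbers on the mean-versus-median point, here is a minimal sketch. It assumes the constant-failure-rate (exponential) model that is commonly used for MTBF math; the article does not say which model F5 uses, and the function names, the three-year MTBF, and the fleet figures are all hypothetical. The field estimate shown (operating hours divided by failures) is the textbook definition, not F5's documented procedure.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def survival_probability(t_hours, mtbf_hours):
    """Chance a unit is still running at time t (assumed exponential model)."""
    return math.exp(-t_hours / mtbf_hours)

def median_life(mtbf_hours):
    """Time by which half the units are expected to have failed (same assumption)."""
    return mtbf_hours * math.log(2)

def observed_mtbf(total_operating_hours, failures):
    """Field estimate: cumulative operating hours divided by the failure count."""
    if failures <= 0:
        raise ValueError("need at least one failure to estimate MTBF")
    return total_operating_hours / failures

if __name__ == "__main__":
    mtbf = 3 * HOURS_PER_YEAR  # hypothetical 3-year calculated MTBF
    share = survival_probability(mtbf, mtbf)
    print(f"Share of units outlasting the MTBF: {share:.1%}")
    print(f"Median life: {median_life(mtbf) / HOURS_PER_YEAR:.2f} years")
    # Hypothetical fleet: 1,000 units running for a year, 40 hardware RMAs
    field = observed_mtbf(1000 * HOURS_PER_YEAR, 40)
    print(f"Observed MTBF: {field / HOURS_PER_YEAR:.1f} years")
```

Under that assumption, only about 37% of units outlast the MTBF, and the median life is roughly 69% of it, which is one more reason not to plan as if gear will run well past the number on the spec sheet.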

As mentioned earlier in this series, F5 sets quality standards that suppliers must meet if they wish to continue supplying. This allows somewhat better control over MTBF than “lowest bidder” or similar procurement, simply because the standards set include the quality of parts used, which all rolls into the MTBF calculations – and, more importantly for most IT shops, the MTBF reality. While calculating MTBF involves a complex set of equations, you can generalize to “the MTBF of a device is as low as or lower than the MTBF of its weakest part”. That means supplier quality standards matter in a very real way. I had a RAID array fail on me once – several drives down all at the same time. The array vendor had to count that as a failure, since RAID no longer worked (thank heavens for backups!), but the failure was on the part of one of their suppliers. That’s how it is in the manufacturing world: whoever’s name is on the box gets the bad rep for quality, regardless of whose handiwork was slipshod. That is why F5’s non-stop quality monitoring program (devices are tested from before release until EOL is announced) matters a lot. It’s also why quality standards for parts suppliers matter more than getting the absolute cheapest part, as some manufacturers are wont to do.
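The “weakest part” generalization can be sketched with the simplest textbook model: a device that fails as soon as any one part fails, with each part assumed to have a constant failure rate. This is an illustration, not F5's actual calculation; the helper name and the part figures are hypothetical.

```python
def series_mtbf(part_mtbfs_hours):
    """MTBF of a device that fails when any single part fails.

    Assumes independent parts with constant failure rates, so the
    failure rates (1 / MTBF) simply add up.
    """
    part_mtbfs_hours = list(part_mtbfs_hours)
    if not part_mtbfs_hours:
        raise ValueError("need at least one part")
    total_failure_rate = sum(1.0 / m for m in part_mtbfs_hours)
    return 1.0 / total_failure_rate

# Hypothetical parts list, MTBF in hours
parts = {"power_supply": 200_000, "fan": 100_000, "mainboard": 400_000}
device = series_mtbf(parts.values())

print(f"Weakest part: {min(parts.values()):,} hours")
print(f"Whole device: {device:,.0f} hours")
```

In this model the device always comes out below its weakest part, and swapping in a cheaper, lower-MTBF part drags the whole number down with it.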

I will not replicate our entire knowledge base article here; if you have an ask.f5.com account, you can click here to read it. I’ll just summarize and pull bits out for the readers’ enjoyment.

F5 gear runs the gamut from entry level to massive blade systems. As such, MTBF varies from device to device. The worst calculated MTBF for an F5 device is over three years, and our quality team tells me that the calculated value is far lower than the real-life-experience value they get from watching returns and such. The best calculated MTBF is over 21 years. It’s a rare piece of computer gear that is used that long, but Lori and I have some pretty old F5 gear that’s still clipping along like it was new, so no surprises there. Most F5 devices fall somewhere in between.

Why the large variance in MTBFs if we control for quality? A valid question. The fact is that it is not all about the quality of parts. Airflow inside the device, number of redundant parts, number of removable parts… there are a zillion other things that go into MTBF, and they all tend to get better as the device gets physically larger. Entry level devices are small, restricting airflow and cutting down on available space for redundant power supplies, etc. The top end blade servers, by contrast, have room for all of that and, since cards are replaceable, tend to have fewer failures. You will find a similar spread with any other vendor that covers such a wide range of hardware. And all of those numbers are likely to beat out a COTS server running a software product.
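To see why room for redundant parts moves the number, here is a rough sketch of the classic no-repair case: identical units in parallel, each with an assumed constant failure rate, where the function survives until the last unit dies. The helper name and the power supply figure are hypothetical, and this is not how F5 computes its published values.

```python
def parallel_mttf(unit_mtbf_hours, n_units):
    """Mean time until all n identical redundant units have failed.

    Assumes independent units, constant failure rates, and no repair or
    replacement, which gives unit MTBF * (1 + 1/2 + ... + 1/n).
    """
    if n_units < 1:
        raise ValueError("need at least one unit")
    return unit_mtbf_hours * sum(1.0 / k for k in range(1, n_units + 1))

psu_mtbf = 200_000  # hypothetical single power supply MTBF, in hours
for n in (1, 2, 3):
    print(f"{n} supply/supplies: {parallel_mttf(psu_mtbf, n):,.0f} hours")
```

Even without repair, a second supply adds 50% in this model. When a failed unit can be swapped while the other carries the load, the gain is generally much larger, which fits the point about replaceable cards.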

So when looking at any electronic gear, ask about MTBF. Alone, it simply gives you insight into the priorities for the device you’re looking at. Combined with the MTBF numbers from several different devices (from the same manufacturer or from several), it gives you an idea of what you are buying in terms of quality. Of course, with a large chunk of any given appliance handled in software, MTBF is not as meaningful as it once was, but the hardware is still the underlying bedrock for that software to run on.

More Stories By Don MacVittie

Don MacVittie is a Technical Marketing Manager at F5 Networks. In this role, he supports outbound marketing, education, and evangelism efforts around development, storage, and IT management topics related to F5 solutions. His role includes authoring technical materials, participating in social and community-based forums, and providing guidance for the development of marketing resources. As an industry veteran, MacVittie has extensive programming experience along with project management, IT management, and systems/network administration expertise.

Prior to joining F5, MacVittie was a Senior Technology Editor at Network Computing, where he conducted product research and evaluated storage and server systems, as well as development and outsourcing solutions. He has authored numerous articles on a variety of topics aimed at IT professionals. MacVittie holds a B.S. in Computer Science from Northern Michigan University, and an M.S. in Computer Science from Nova Southeastern University.
