Welcome!

@CloudExpo Authors: Liz McMillan, Zakia Bouachraoui, Yeshim Deniz, Pat Romanski, Elizabeth White

Related Topics: FinTech Journal, Containers Expo Blog, Cloud Security, @DevOpsSummit

FinTech Journal: Article

Beyond DevOps: Security vs. Speed? | @DevOpsSummit #APM #DevOps

Several problems arise when the harm of software failure cannot be treated as an unbound variable

Fail fast, fail often. Yeah, but the first failure blew up the satellite. Well, this is just a photo-sharing app..not rocket science. Okay, but your photos are accessed by users who have passwords that they probably use for other things..and aren't some photos as important as satellites?

Several problems arise when the harm of software failure cannot be treated as an unbound variable. Here are some thoughts on two. I'll write more on two more (one cognitive, one computational) later.

Problem 1: Identity Persists Across Non-Obviously Coupled Systems (So the Stakes Are Higher Than Your Application)
Worse: security failures cascade well beyond physically contiguous realms (if root then everything) into physically decoupled systems via informational (shared passwords, mailboxes) or physical-but-accidental (power cut then reboot) channels. The brilliant and terrifying Have I been pwned? tool -- to say nothing of the astonishing air-gap-annihilating Stuxnet [pdf] surfaces the obvious but easy-to-forget truisms that simply not having data that should not be accessed by X on the same disk as data that can be accessed by X is not good enough, and that the danger posed by access to one application may be slim compared with the danger posed by access to something more serious via the identity compromised by an in-itself non-dangerous breach.

So even if 'fail fast' is okay for your application, it may not be okay for your users. The result: natural tension between the ideal of continuous delivery -- or even Agile more broadly, or even heavily iterative development in general -- and security.

And while one of the major insights of Agile is that the best refiner is the real world (as opposed to the limited imagination of the planners), one of the major embarrassments of InfoSec is that 95% of security breaches involve human error. For Agile, failure is falling until you can walk. For InfoSec, failure is letting the terrifying cat out of the poorly-designed bag. Post-breach, maybe you've started to salt your hashes (congrats, you're more cryptographically sophisticated than Julius Caesar) but your users' passwords are in the wild.

Problem 2: You Have Actual Human Enemies (So Something Smarter Than Chance Is Trying to Outsmart You)
On sheer randomness, the Internet is getting more dangerous (Akamai records crazy DDoS increases over the past year - 122% for application-level (OSI Layer 7) attacks alone??). But the really scary problem is that real, smart, often well-funded humans are trying to make your software do what you didn't design it to do. For most failures, the enemy is "imprecise requirements" or "poor algorithm design" or "inadequately scalable environment" (or even just 'blundering users'); for security failures, the enemy is malicious engineers.

This is the meatiest bit of the (otherwise slightly theatrical) Rugged Manifesto:

I recognize that my code will be attacked by talented and persistent adversaries who threaten our physical, economic and national security.

Yeah. So engineer.add(<malice, talent, persistence>), return ???? -- and multiply(????, world.get(amountEatenBySoftware) = ????!!!!!

If DevOps is a management practice, then a risk of ????!!!!! is pretty much unacceptable.


None of this, of course, means that Agile isn't an awesome idea. Nor am I suggesting that security can't be baked in to an iterative, continuously improving process - certainly it can, but on the face of it this seems to require a bit of finagling. And of course the proper way to address security will always be risk analysis, with a good lump of threat analysis included in any measure of technical debt.

I'd love to take some taxonomy of software errors (maybe regarding security in particular) and cross-tab cost per error type with cycle time (i.e. length of cycle during which each error that cost d dollars was introduced against cost d), normalizing by estimated technical debt accrued during each cycle (assuming somebody measured that at the time, which probably didn't happen). But maybe someone has done that (definitely seen lots of costs by error but not correlated with cycle time), and (since technical debt is kind of a guess anyway) maybe anecdotes are a better gauge of the security cost of "shift left" anyway.

Anyone have any experiences they'd like to share?

More Stories By John Esposito

John Esposito is Editor-in-Chief at DZone, having recently finished a doctoral program in Classics from the University of North Carolina. In a previous life he was a VBA and Force.com developer, DBA, and network administrator. John enjoys playing piano and looking at diagrams, and raises two cats with his wife, Sarah.

CloudEXPO Stories
With more than 30 Kubernetes solutions in the marketplace, it's tempting to think Kubernetes and the vendor ecosystem has solved the problem of operationalizing containers at scale or of automatically managing the elasticity of the underlying infrastructure that these solutions need to be truly scalable. Far from it. There are at least six major pain points that companies experience when they try to deploy and run Kubernetes in their complex environments. In this presentation, the speaker will detail these pain points and explain how cloud can address them.
The deluge of IoT sensor data collected from connected devices and the powerful AI required to make that data actionable are giving rise to a hybrid ecosystem in which cloud, on-prem and edge processes become interweaved. Attendees will learn how emerging composable infrastructure solutions deliver the adaptive architecture needed to manage this new data reality. Machine learning algorithms can better anticipate data storms and automate resources to support surges, including fully scalable GPU-centric compute for the most data-intensive applications. Hyperconverged systems already in place can be revitalized with vendor-agnostic, PCIe-deployed, disaggregated approach to composable, maximizing the value of previous investments.
When building large, cloud-based applications that operate at a high scale, it's important to maintain a high availability and resilience to failures. In order to do that, you must be tolerant of failures, even in light of failures in other areas of your application. "Fly two mistakes high" is an old adage in the radio control airplane hobby. It means, fly high enough so that if you make a mistake, you can continue flying with room to still make mistakes. In his session at 18th Cloud Expo, Lee Atchison, Principal Cloud Architect and Advocate at New Relic, discussed how this same philosophy can be applied to highly scaled applications, and can dramatically increase your resilience to failure.
Machine learning has taken residence at our cities' cores and now we can finally have "smart cities." Cities are a collection of buildings made to provide the structure and safety necessary for people to function, create and survive. Buildings are a pool of ever-changing performance data from large automated systems such as heating and cooling to the people that live and work within them. Through machine learning, buildings can optimize performance, reduce costs, and improve occupant comfort by sharing information within the building and with outside city infrastructure via real time shared cloud capabilities.
As Cybric's Chief Technology Officer, Mike D. Kail is responsible for the strategic vision and technical direction of the platform. Prior to founding Cybric, Mike was Yahoo's CIO and SVP of Infrastructure, where he led the IT and Data Center functions for the company. He has more than 24 years of IT Operations experience with a focus on highly-scalable architectures.