@CloudExpo: Article

Part 2 | Understand the Impact of IT on Business

It Takes More than Advanced Correlation

Part 2 of a two-part blog series looking at the journey enterprise IT departments take as they increasingly seek to understand how IT infrastructure performance relates to, and affects, application performance and business services.

Stage 4: Correlation
Through observation, Fred notices that even though his alarms, based on dynamic baselines, catch problems in his environment, they're also catching busy days, quiet days, and even slightly odd days. He starts to realize that looking at the levels of individual metrics is necessary but not sufficient. Fred also needs to look at how the metrics behave together: he needs statistical correlation.

With a real-time correlation package in place, Fred realizes that by using correlation in addition to baselines, he knows not only when the level of activity is unusual but also when the profile of activity is unusual. Now Fred can finally tell the difference between odd and alarming. The number of alerts he sees is more manageable as well, since he puts much more weight on what the correlation tells him. With the end in sight, Fred decides to send the alarms he's getting directly to the NOC. Finally, he's confident that he's produced the best possible outcome.
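The difference between "odd" and "alarming" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Netuitive's method: a z-score against a historical window stands in for the dynamic baseline, a Pearson correlation between two related metrics stands in for the correlation engine, and the thresholds, along with the rule that only a combined level-and-profile anomaly counts as alarming, are hypothetical choices.

```python
import statistics


def zscore(value, history):
    """Distance of `value` from the historical mean, in standard deviations."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return 0.0 if stdev == 0 else (value - mean) / stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series has no defined correlation; treat as none
    return cov / (vx * vy) ** 0.5


def classify(a_hist, b_hist, a_recent, b_recent,
             level_threshold=3.0, corr_drop=0.5):
    """Label a recent window of two related metrics as normal, odd, or alarming.

    Hypothetical policy: an unusual level OR an unusual profile is "odd";
    both together is "alarming".
    """
    # Level check: is the latest value of either metric far from its baseline?
    level_unusual = (abs(zscore(a_recent[-1], a_hist)) > level_threshold
                     or abs(zscore(b_recent[-1], b_hist)) > level_threshold)

    # Profile check: have the metrics stopped moving together as they usually do?
    expected_corr = pearson(a_hist, b_hist)
    recent_corr = pearson(a_recent, b_recent)
    profile_unusual = (expected_corr - recent_corr) > corr_drop

    if level_unusual and profile_unusual:
        return "alarming"
    if level_unusual or profile_unusual:
        return "odd"
    return "normal"
```

In this sketch, a busy day on which both metrics climb together is reported as odd, while the same climb with the two metrics moving in opposite directions is reported as alarming.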

Had I written this two years ago, this would be the end. I would have introduced Netuitive's real-time analytics and correlation capabilities as a key differentiator for our solution, and I'd be nearly as satisfied with myself as Fred. But we'd both be wrong.

Stage 5: Integrated Knowledge
Fred gets a drop-in visit from Audrey, who manages the Level 1 technicians at Acmecorp. Her job is to ensure that her team either diagnoses and resolves issues or escalates issues as quickly as possible to the right Level 2 specialty team. Fred can tell that Audrey has something on her mind.

"Fred, what does this mean?" She hands him a printout of an alarm from Fred's new monitoring system.

He examines the printout. "Oh, this is telling you that the queue lengths between the BuyNow checkout and the credit card processing service are unusually high...at the same time."

"I get that," Audrey interrupts. "My point is, what can my team do with this? If we have to run to you for the big picture every time an alarm rolls out, we're not exactly living up to our mission of a rapid response. Granted, there's useful stuff here, but we're not recognizing it quickly enough. Some we never figure out, and for others, we know why it's alarming, but it's not even something we need to worry about."

Fred sighs, "Okay, let me work on it." Audrey leaves the printout on the desk as Fred stares at the ceiling. It dawns on him that instead of writing a better run book, he's better off taking his knowledge of the application and building it into the monitoring system as much as possible. With an embedded sense of how to interpret the statistics, Audrey's team can have the immediate benefit of Fred's experience even when he's not available. If he builds it right, Audrey's team will only see issues they can either resolve quickly on their own or escalate to the right Level 2 team or to development.

Fred takes out his worn notebook titled "Incidents, Outages and Misfortunes" and begins consolidating its contents into a knowledge base of standard operating procedures specific to the BuyThis platform and its set of customer constraints. After discussions with Audrey, he decides to focus his attention on I/O contention issues that have caused four outages over the last two months. Through analysis of past incidents, they've noticed that a strong symptom of their I/O contention is a sudden rise in message queue lengths at the same time that the transaction rate on a particular set of databases falls below its expected level. Fred records the conditions that indicate the problem, as well as his incident avoidance plan: in this case, looking for rogue batch processing jobs on the database servers. When he's done, he has defined several entries in his new knowledge base. They give him an automated way to diagnose the problem, and they give Audrey's team a heads-up on what he considers the correct escalation procedure.
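One way to picture such a knowledge base entry is as a rule that pairs a symptom pattern with a diagnosis, an avoidance plan, and an escalation target. The Python sketch below is illustrative only: the metric names (`queue_length`, `queue_length_prev`, `db_tx_rate`, `db_tx_rate_expected`), the 2x and 70% thresholds, and the "DBA team" escalation target are hypothetical, not taken from Fred's actual entries or from any product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class KnowledgeEntry:
    name: str
    condition: Callable[[Metrics], bool]  # symptom pattern; all parts must hold
    diagnosis: str                        # what the pattern most likely means
    action: str                           # avoidance plan for Level 1
    escalate_to: str                      # Level 2 team if Level 1 can't resolve it


def io_contention(m: Metrics) -> bool:
    # Hypothetical thresholds: "rise suddenly" = queue length more than doubled
    # since the previous sample; "below the expected" = DB transaction rate
    # under 70% of its expected value. Both must be true at the same time.
    queue_spike = m["queue_length"] > 2 * m["queue_length_prev"]
    tx_low = m["db_tx_rate"] < 0.7 * m["db_tx_rate_expected"]
    return queue_spike and tx_low


KNOWLEDGE_BASE: List[KnowledgeEntry] = [
    KnowledgeEntry(
        name="BuyThis I/O contention",
        condition=io_contention,
        diagnosis="Likely I/O contention on the BuyThis database servers",
        action="Check the database servers for rogue batch processing jobs",
        escalate_to="DBA team",  # hypothetical escalation target
    ),
]


def match(metrics: Metrics,
          kb: List[KnowledgeEntry] = KNOWLEDGE_BASE) -> List[KnowledgeEntry]:
    """Return every knowledge base entry whose symptom pattern is present."""
    return [entry for entry in kb if entry.condition(metrics)]
```

Because each entry carries its own action and escalation target, a Level 1 technician who sees a match gets the recommended next step along with the alarm, rather than just the raw statistics.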

"This," Fred says to himself, "is definitely better." After conferring with Audrey to make sure she and her team understand the plan and agree to follow the recommendations from the knowledge base, Fred moves the new entries into production and prepares himself for the next step.

Stage 6: Acceptance and Maintenance
It doesn't take long for Fred to realize that getting buy-in from Audrey's team hasn't finished the job. Although the late-night calls from the Level 1 NOC have tapered off thanks to his knowledge base, he finds himself spending more time with the Level 2 team leads. Some believe the knowledge Fred has poured into the system is good but still needs tweaks for the particulars of their environment. Others, like the DBAs, feel it needs a thorough overhaul before they can make use of it.

Fred calls them together for a meeting, along with Audrey and the EVP in charge of the BuyThis platform, Kevin. "Guys," he begins. "I understand your concerns. You want more input into the kinds of knowledge we're embedding into the system. As I see it, I have a good set of knowledge I could use to get the ball rolling with this system. Not perfect, but a strong start, and we've already seen the benefits."

"However, we could do better, both in terms of tuning our existing knowledge base and expanding it. Luckily..." Fred pauses and looks around the table. "We've got the right set of experts here to begin the process."

With Kevin's blessing, Fred convenes a weekly session with the Level 1 and 2 department heads to proactively review the state of the current knowledge base, as well as to propose and vet possible enhancements based on recent performance. Though there is pushback at first, a combination of executive pressure and positive results brings even the reluctant around to Fred's way of thinking. Fred comes to realize that he would have been better off beginning the process with buy-in from the team leads, rather than starting off by mandate.

As the process of creating and refining knowledge base entries continues, Audrey's team becomes much more effective, spending its critical moments on issue avoidance rather than diagnosis. BuyThis enters a streak of solid performance. Slowdowns and downtime are a thing of the past, the impact of IT and application performance on the business is now measurable, and the teams are more integrated and efficient than ever before.

As for Fred, he's even begun to look towards Stage 7, automating the response to certain knowledge base entries. For now, though, he's happy to be sleeping much better.

More Stories By Marcus Jackson

Marcus is Director of Product Management at Netuitive. He is responsible for the direction of Netuitive's flagship product, including analytics and data visualization. He has over 20 years of experience in software engineering and performance management. Previously, he headed development for Netuitive and the IEEE Computer Society. Marcus holds a bachelor's degree in Computer Science from Harvard University.
