
@ThingsExpo: Blog Post

Spotting Anomalies When Things Are Calm

Watching for blind spots

Monitoring application performance both on the surface and in the currents below is a great way to build a performance baseline and develop fluency in how an application behaves. Ironically, the deep-dive tool sets in place today still may not provide all the insight you need to quickly resolve anomalous behavior.

Standing back on the shore waiting for an event to go by may not be the best approach for proactive monitoring. Synthetic (active) monitoring, which exercises an application with scripted transactions on a schedule instead of waiting for real user traffic, is needed to help reduce the blind spots for critical business applications.

For example, we just experienced a production issue on a fully instrumented critical business application that at first appeared nebulous.

During peak volume, the Service Desk was taking calls from users in seemingly random locations who said they couldn't log in. Users who were already on the system, however, were fine: even when they logged out, they could log in again and continue working. Other facts that came in made the issue more perplexing:

  • Real user monitoring (RUM) showed transaction volume and performance were normal.
  • Deep dive Java monitoring agents showed the same.
  • There were no glaring HTTP 500 errors and the backend database was fine.
  • Infrastructure monitoring was green in all tiers and resource consumption was within baseline.

What did we use to find the issue, then? It was our synthetic monitoring tool, which raised an alert on two externally facing applications.

Root cause? Our Internet provider's DNS resolution was not working properly, so any machine that needed a name resolved and did not already have it cached for the day couldn't get a login page. That would also explain the earlier puzzles: users who were already working had the name cached, and requests that never reached the application left nothing for the passive tools to measure.
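To make the idea concrete, here is a minimal sketch of the kind of check a synthetic monitor runs. It is not the tool used in the incident, which the article does not name. The function and class names (`probe_login_page`, `ProbeResult`, `resolve_host`, `fetch_status`) and the example URL are hypothetical. The probe resolves the hostname first and only then requests the page, so a DNS failure is reported as its own stage instead of being lumped in with application errors.

```python
import socket
import time
import urllib.request
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class ProbeResult:
    url: str
    ok: bool
    stage: str         # "dns" or "http" on failure, "done" on success
    detail: str
    elapsed_ms: float


def resolve_host(host: str, port: int) -> str:
    """Ask the system resolver for the host and return the first address."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]


def fetch_status(url: str, timeout: float) -> int:
    """Request the page and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def probe_login_page(url, resolve=resolve_host, fetch=fetch_status, timeout=5.0):
    """Run one synthetic check: name resolution first, then the page request.

    `resolve` and `fetch` are injectable so the probe can be exercised
    without a network (an assumption of this sketch, not of any product).
    """
    start = time.monotonic()

    def elapsed_ms() -> float:
        return (time.monotonic() - start) * 1000.0

    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)

    # Stage 1: DNS. socket.gaierror is a subclass of OSError.
    try:
        address = resolve(parts.hostname, port)
    except OSError as exc:
        return ProbeResult(url, False, "dns", str(exc), elapsed_ms())

    # Stage 2: HTTP. URLError, HTTPError and timeouts are OSError subclasses.
    try:
        status = fetch(url, timeout)
    except OSError as exc:
        return ProbeResult(url, False, "http", str(exc), elapsed_ms())

    if status != 200:
        return ProbeResult(url, False, "http", f"HTTP {status}", elapsed_ms())
    return ProbeResult(url, True, "done", f"{address} -> HTTP {status}", elapsed_ms())
```

A scheduler would call something like `probe_login_page("https://app.example.com/login")` every few minutes and alert when `ok` is false. For a check like this to catch the failure described above, it has to run from outside the corporate network, on a machine that does not already have the name cached; a probe that shares a warm cache with the users would stay green too.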

Image: Travis Miller/Flickr (Top)

More Stories By Larry Dragich

Larry Dragich is actively involved with industry leaders, sharing knowledge of Application Performance Management (APM) technologies, from best practices and technical workflows to resource allocation and approaches for implementation. He has been working in the APM space since 2006, where he built the Enterprise Systems Management team, which is now the focal point for IT performance monitoring and capacity planning activities.
