


Managing Cloud Applications

Cloud computing will change the processes and tools that IT organizations currently use

As enterprises evaluate if and how cloud computing fits into their core IT services, they must consider how they will manage cloud services as part of their day-to-day operations. This article examines how operational management of cloud computing differs from traditional methods, and presents techniques for addressing these needs.

Cloud computing will change the processes and tools that IT organizations currently use. In a traditional datacenter environment, IT organizations have complete control and visibility into their infrastructure. They install each piece of hardware and therefore have complete configuration control. All components in the network are accessible and can be monitored with the right tools. Most enterprises have invested heavily in complex tools in order to manage this environment so that they can identify service-affecting conditions, and analyze performance metrics so they may tune their systems to optimize performance.

For cloud computing services, the enterprise no longer has control of and visibility into the components of the service. Yet if the cloud is to replace a core service, how can the IT organization guarantee equivalent availability and performance service levels? In today's IT environment, problems that span an enterprise and its vendor are the most difficult to isolate and resolve. Cloud vendors are painting a future in which an enterprise will pick multiple cloud services from a market of cloud services, which means these problems will become more common and more complex. Yet enterprises will not deploy cloud services, even if they are less costly and more agile, if those services cannot provide an acceptable level of service. The relationship between cloud vendors and enterprises must evolve. Vendors must not only earn the trust of enterprises, but must provide mechanisms by which enterprises can verify that trust in a transparent manner. One step toward that goal is management tooling that can provide the in-depth views customers need and can prove that promised service levels are being met.

Let's look at how such a system might work, from a technical perspective. Most enterprise class management systems include the following basic features:

  • The ability to gather metric information from a variety of components, including:
    - Linux systems: CPU utilization, load, memory, swap, disk, processes, etc.
    - Windows: CPU, disk, memory, services, WMI metrics, etc.
    - Network: ping, TCP port response, latency, SNMP polls, etc.
    - Application: HTTP, web services, logs, etc.
  • The ability to generate alerts when metric thresholds are exceeded.
  • Automated notification on alerts.
  • Performance reports on metrics for system tuning.
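The alerting feature in the list above can be sketched in a few lines. This is a minimal, illustrative example; the metric names and threshold values are invented, not taken from any particular management product.

```python
# Threshold-based alerting over a set of gathered metrics.
# Metric names and limits are illustrative sample data.

THRESHOLDS = {
    "cpu_utilization": 90.0,   # percent
    "disk_used": 85.0,         # percent
    "swap_used": 50.0,         # percent
}

def evaluate_metrics(metrics):
    """Return an alert record for every metric that exceeds its threshold."""
    alerts = []
    for name, value in metrics.items():
        limit = THRESHOLDS.get(name)
        if limit is not None and value > limit:
            alerts.append({"metric": name, "value": value, "threshold": limit})
    return alerts

sample = {"cpu_utilization": 97.2, "disk_used": 40.0, "swap_used": 62.5}
alerts = evaluate_metrics(sample)
for a in alerts:
    print(f"ALERT: {a['metric']} = {a['value']} exceeds {a['threshold']}")
```

A real system would feed each alert record into its notification and reporting pipelines; the point here is only that alert generation is a simple comparison once metric gathering is in place.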

The process of metric gathering should be as open and as simple as possible, enabling many scripting options. Many current enterprise tools require specific and deep programming skills in order to expand monitoring. This limits the use of the tool, since most management systems are deployed by system administrators, not software developers. For managing cloud applications this is even more important, since interfacing to a specific cloud vendor requires writing to its API, which is typically a REST-type interface.

Enterprise operations typically have additional requirements. Ideally, a tool can integrate with the other tools the enterprise already uses. For example, monitoring tools may get their configuration information from a configuration database, and alerts may be fed into a problem ticketing system. The more automation capability a tool has, the better chance it has of fitting into current IT operational processes.

How might this system deal with the dynamic configuration of cloud systems? For many existing tools the provisioning process is a problem. Existing systems management tools are designed to follow a host name/IP address model, not a virtualized model. You need to define the IP/host of each managed system. However, cloud instances are typically dynamically defined. Take the case of Amazon EC2. The host name and IP address are assigned on instance startup. As shown in Figure 1, in order to use these tools, the instance must first be started, the provisioning parameters (IP/host) extracted from the Amazon API and implemented in the management system, and then the management system must be reloaded to implement the change. The exact mechanics vary depending on the management system and cloud vendor, but all rely on a tight dependency between cloud configuration and the monitoring system. This disconnect can cause a lag time in monitoring the true cloud configuration, or worse, an incomplete monitoring system.
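The extract-and-reload cycle described above can be sketched as follows. The `describe_instances()` stub stands in for a real vendor API call (such as Amazon EC2's DescribeInstances); the instance data is invented for illustration.

```python
# Sketch of the provisioning gap: the monitoring tool holds a static host
# list, so every instance launch requires extracting the new host from the
# cloud API and reloading the tool's configuration.

def describe_instances():
    # In practice this data would come from the cloud vendor's API;
    # here it is hard-coded sample data.
    return [
        {"instance_id": "i-0001", "public_dns": "ec2-1-2-3-4.example.com"},
        {"instance_id": "i-0002", "public_dns": "ec2-5-6-7-8.example.com"},
    ]

def rebuild_monitoring_config():
    """Regenerate the managed-host list from the cloud API."""
    hosts = [inst["public_dns"] for inst in describe_instances()]
    # A real tool would now rewrite its config file and reload; any
    # instance started after this point is unmonitored until the next run.
    return hosts

monitored_hosts = rebuild_monitoring_config()
print(monitored_hosts)
```

The lag between instance startup and the next configuration rebuild is exactly the monitoring gap the text describes.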

Another issue with existing management tools is limited visibility into the cloud infrastructure's operational data. Each cloud vendor has its own configuration definition and operational parameter fields. I call this data the cloud vendor's "metadata." In the case of Amazon EC2, the metadata includes the instance ID, Amazon image ID (AMI), security groups, location, public DNS name, and private DNS name. Existing tools are not designed to gather this metadata, yet when IT operations personnel are troubleshooting EC2 problems, it is difficult to understand the full scenario without it. Every cloud vendor has its own metadata, so the problem is compounded with each additional cloud vendor.

An alternative to using existing management tools is to rely on the vendor to provide the required visibility. However, today, most vendor tools and APIs provide limited visibility. Most infrastructure providers only show whether the instance is running or not. From the infrastructure cloud provider, this makes sense. They are responsible for the virtual server, not what a user might install on it. Amazon has recognized this issue and has tried to address it with their CloudWatch service. This is an optional service that allows the user to gather additional instance metrics like CPU utilization, disk read/write operations, and throughput from Amazon's APIs. However, Amazon only exposes the information - it is up to the user to use the data for alerts or reporting. Though there are some entry-level cloud tools that read API information for status, they do not provide the management features previously listed.

Cloud-specific tools are usually not designed for use by enterprise IT operations. Simple web browser-oriented interfaces are fine for monitoring a few development instances, but enterprises can require monitoring of hundreds of instances and thousands of metrics, which is beyond the capability of most web applications. For enterprise IT operations accustomed to in-depth monitoring and high-function capability, vendor services alone are inadequate.

The preferred approach is to combine the capability of high-function, enterprise management tools integrated with information from vendor APIs. This system is shown in Figure 2.

In this system, standard monitoring scripts can be deployed either under the control of agents on monitored instances, or as active checks from the management server. Supporting open source scripts, such as those from the Nagios plug-in project, will allow in-depth monitoring of many components, including those listed above. However, this basic monitoring information must be augmented with vendor API information. Vendor data may be queried from the agent, the management server, or both. This approach allows the management system to process vendor metadata combined with monitoring data. Views presented to operators and system administrators can then show much richer information.
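Nagios-style plug-ins, mentioned above, conventionally print a status line of the form "STATUS - message | perfdata" (and also signal status through their exit code, which is omitted here). A simplified, illustrative parser for that line format:

```python
# Simplified parser for the conventional Nagios plug-in output line,
# "STATUS - message | perfdata". Real plug-ins also convey status via
# their process exit code; this sketch handles only the text line.

def parse_plugin_output(line):
    """Split a 'STATUS - message | perfdata' line into its parts."""
    head, _, perfdata = line.partition("|")
    status, _, message = head.partition(" - ")
    return {
        "status": status.strip(),
        "message": message.strip(),
        "perfdata": perfdata.strip() or None,
    }

result = parse_plugin_output("CRITICAL - Memory utilization at 98% | mem=98%;90;95")
print(result["status"], result["message"])
```

Once parsed, the check result becomes a structured record the management server can augment with vendor metadata.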

Dynamic changes in the cloud must be recognized by the system immediately. One way to handle this is to remove the requirement that managed systems be pre-defined: events received from managed systems are processed as long as they are authenticated. This "event-based" model allows new instances to be managed as soon as they start.
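The event-based model can be sketched as below. The shared-secret check is a stand-in for real authentication (in practice this would be keys or certificates), and all identifiers are invented.

```python
# Event-based registration: instead of pre-defining hosts, the management
# server accepts any authenticated event and registers unknown senders on
# the fly. The shared secret is a placeholder for real authentication.

SHARED_SECRET = "example-token"   # illustrative only
registered = {}

def handle_event(event):
    """Register the sender if it is new, then record the event."""
    if event.get("auth") != SHARED_SECRET:
        return False                      # reject unauthenticated senders
    host = event["host"]
    if host not in registered:
        registered[host] = []             # a new instance becomes managed here
    registered[host].append(event["metric"])
    return True

# A freshly started instance sends its first event and is managed immediately;
# an unauthenticated sender is ignored.
ok = handle_event({"auth": "example-token", "host": "i-0new", "metric": ("cpu", 12.0)})
bad = handle_event({"auth": "wrong", "host": "i-0bad", "metric": ("cpu", 1.0)})
print(ok, bad, list(registered))
```

Note how the dependency on a pre-built host list disappears: the instance announces itself, so there is no provisioning lag.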

Let's look at an example. Suppose we are monitoring a set of application servers running in the Amazon EC2 cloud. A standard script used to check memory utilization on a Linux system might produce the output, "Memory utilization at 98%. Critical severity."

In a traditional system, this information is associated with the host that it is run on. The management information received after the execution of such a script would look like Table 1.

However, if we combine the cloud metadata (again using Amazon EC2), the event information would look like Table 2.

An operator looking at this event information would have more complete information on the source and impact of this memory alert. Greater value can be realized by correlating all event information by vendor metadata. For example, grouping instances by their location (i.e., which vendor datacenter the virtual instance is running in) might explain system behavior. Access to the metadata also gives the management system the opportunity to perform higher-level functional checks on the managed cloud application.
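The enrichment step can be sketched as a simple merge of the host-based event with EC2-style metadata, followed by correlation on a metadata field. All field values here are invented sample data.

```python
# Merge a basic host-based event with EC2-style metadata, then correlate
# events by a metadata field (location). All values are invented.

base_event = {
    "host": "ec2-1-2-3-4.example.com",
    "check": "memory",
    "message": "Memory utilization at 98%",
    "severity": "critical",
}

ec2_metadata = {
    "instance_id": "i-0a1b2c3d",
    "ami_id": "ami-12345678",
    "security_groups": ["app-servers"],
    "location": "us-east-1a",
    "public_dns": "ec2-1-2-3-4.example.com",
    "private_dns": "ip-10-0-0-4.ec2.internal",
}

enriched_event = {**base_event, **ec2_metadata}

# Correlation by metadata, e.g. grouping events by vendor datacenter:
events = [enriched_event]
by_location = {}
for ev in events:
    by_location.setdefault(ev["location"], []).append(ev)
print(sorted(by_location))
```

An operator can now pivot on any metadata field (AMI, security group, datacenter) instead of being limited to host name and IP address.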

The following application scenario is based on a real user. The application consists of many server instances running in the Amazon EC2 cloud, organized into groups of servers, each performing a different role. There may be up to 50 groups running in the application. A group's role is determined by a parameter passed in the "User Data" field of the Amazon EC2 metadata.

The management system accesses the metadata and customizes its active checks based on the role contained in the metadata. Since dynamic changes are handled, as the number of instances within each role group changes, the management system will adapt. Existing management tools were in place, so the cloud management system gathered some of its metric information from an existing tool rather than requiring re-instrumentation. This made the transition to managing the cloud easier for operations.

Since the application consists of many groups of instances, the status of a single instance is not as important as the status of the role's group. One type of higher-level check that was applied is to compute average (or optionally, maximum or minimum) values across the group. Alerts can then be generated based on group metrics rather than instance metrics. The operator has the ability to drill down from the group to the instance values.
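A group-level check of this kind might look like the sketch below. The role names, instance readings, and threshold are invented; the aggregation function is pluggable, as the text describes (average, maximum, or minimum).

```python
from statistics import mean

# Per-instance CPU readings keyed by role group (invented sample data).
readings = {
    "web": {"i-01": 40.0, "i-02": 95.0, "i-03": 45.0},
    "db":  {"i-10": 97.0, "i-11": 96.0},
}

GROUP_THRESHOLD = 90.0   # illustrative alert threshold, in percent

def group_status(role, aggregate=mean):
    """Alert on the aggregated group value, not on individual instances."""
    instances = readings[role]
    value = aggregate(instances.values())
    return {
        "role": role,
        "value": round(value, 1),
        "alert": value > GROUP_THRESHOLD,
        "drilldown": instances,   # per-instance values for operator drill-down
    }

web = group_status("web")   # one hot instance does not trip the group alert
db = group_status("db")     # the whole group is hot, so this one alerts
print(web["alert"], db["alert"])
```

Passing `max` or `min` as the `aggregate` argument gives the optional maximum/minimum variants, and the `drilldown` field preserves the instance-level view the operator needs.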

This example uses an infrastructure cloud provider's services. An equivalent scenario can be applied to platform service providers; the difference is that the monitoring metrics would test the platform provider's APIs rather than traditional measurements. Again, this implies tight integration between the management system and the cloud vendor's services.

The management system in this scenario can be used as a basis for establishing vendor trust because of the following advantages:

  • It interfaces tightly with the cloud, enabling vendor configuration data to be integrated into the management system
  • Its open monitoring capability enables deep monitoring beyond vendor APIs
  • Any monitoring information available from the vendor API can be incorporated
  • Any monitoring information from existing management tools can be incorporated
  • Higher-level management metrics can be created from lower-level measurements

These advantages also enable the system to serve as a basis for trust-enabling applications. For example, a billing report can be generated from the telemetry information gathered by the system. This report would be independent of the vendor, generated from the gathered metric data and the vendor's stated billing policy, and could be used as an independent audit of the vendor's bill. Another example would be to gather the security group information from the vendor's configuration and perform the TCP port checks defined by those groups. This verifies that the security policy stated by the vendor is actually enforced for this user's cloud configuration.
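The billing-audit idea can be sketched numerically: multiply the instance-hours the management system itself observed by the vendor's stated rate and compare against the vendor's bill. The rate, hours, and billed amount below are invented for illustration.

```python
# Independent billing audit: compare expected cost (from the management
# system's own telemetry and the vendor's published price) against the
# vendor's bill. All figures are invented sample data.

STATED_RATE = 0.10   # vendor's published price per instance-hour (example)

observed_hours = {"i-0001": 720, "i-0002": 310}   # from gathered telemetry

def audit_bill(vendor_total):
    """Return expected vs. billed cost and their discrepancy."""
    expected = round(sum(observed_hours.values()) * STATED_RATE, 2)
    return {
        "expected": expected,
        "billed": vendor_total,
        "discrepancy": round(vendor_total - expected, 2),
    }

report = audit_bill(vendor_total=109.00)
print(report)
```

A non-zero discrepancy does not prove an error, but it gives the enterprise an independent basis for questioning the bill, which is precisely the kind of verifiable trust discussed above.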

If we project this scenario into the future, where an enterprise uses multiple cloud vendors, the management system would look like the system shown in Figure 3.

This system would be a consolidation point for all cloud services, and would translate the heterogeneous cloud services into a common view, simplifying IT operations.

For cloud computing to fulfill its promise of enabling enterprise IT organizations to improve the services they provide to their users, traditional IT operational processes and tools must adapt to new ways of interacting with external vendor services. I've examined some of the issues that enterprises are encountering and have offered solutions. But it is clear that these issues are a barrier to adoption of cloud services, and the needs of enterprise operations must be addressed by the cloud community.

More Stories By Peter Loh

Peter Loh is CEO of Tap In Systems. He has been developing, marketing and selling network and systems management software for over 25 years. He has implemented IT management systems for large Fortune 500 companies, such as Bank of America, AT&T, Visa and American Express. He spent 10 years in technical and marketing positions at IBM, and has since been involved with a number of start-up companies. Most recently he was in charge of engineering for GroundWork Open Source, a company leveraging open source software to implement IT management solutions. Peter holds a BS in Electrical Engineering.
