Managing Cloud Applications

Cloud computing will change the processes and tools that IT organizations currently use

As enterprises evaluate if and how cloud computing fits into their core IT services, they must consider how they will manage cloud services as part of their day-to-day operations. This article examines how operational management of cloud computing differs from traditional methods, and examines techniques for addressing these needs.

Cloud computing will change the processes and tools that IT organizations currently use. In a traditional datacenter environment, IT organizations have complete control of, and visibility into, their infrastructure. They install each piece of hardware and therefore have complete configuration control. All components in the network are accessible and can be monitored with the right tools. Most enterprises have invested heavily in complex tools to manage this environment, so that they can identify service-affecting conditions and analyze performance metrics in order to tune their systems.

For cloud computing services, the enterprise no longer has control of, or visibility into, the components of the service. Yet if the cloud is to replace a core service, how can the IT organization guarantee equivalent availability and performance service levels? In today's IT environment, problems that fall between an enterprise and its vendor are the most difficult to isolate and resolve. Cloud vendors are painting a future in which an enterprise will pick multiple cloud services from a market of cloud services, which means these problems will become more common and more complex. Yet enterprises will not deploy services, even if they are less costly and more agile, if they cannot provide an acceptable level of service. The relationship between cloud vendors and enterprises must evolve. Vendors must not only earn the trust of enterprises, but must also provide mechanisms by which enterprises can verify that trust in a transparent manner. One step toward that goal is to have management tools that can provide the in-depth views that customers need and that can prove promised service levels are being met.

Let's look at how such a system might work, from a technical perspective. Most enterprise class management systems include the following basic features:

  • The ability to gather metric information from a variety of components, including:
    - Linux systems: CPU utilization, load, memory, swap, disk, processes, etc.
    - Windows: CPU, disk, memory, services, WMI metrics, etc.
    - Network: ping, TCP port response, latency, SNMP polls, etc.
    - Application: HTTP, web services, logs, etc.
  • The generation of alerts when metric thresholds are exceeded.
  • Automated notification on alerts.
  • Performance reports on metrics for system tuning.
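The alerting feature in the list above can be illustrated with a minimal sketch. The metric names, threshold values, and function names here are assumptions chosen for illustration; they are not taken from any particular management product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    warning: float   # percent at which a warning alert is raised
    critical: float  # percent at which a critical alert is raised


# Hypothetical thresholds, keyed by metric name.
THRESHOLDS = {
    "cpu_utilization": Threshold(warning=80, critical=95),
    "disk_used": Threshold(warning=85, critical=95),
}


def evaluate(metric: str, value: float, threshold: Threshold) -> Optional[str]:
    """Return an alert message if the value crosses a threshold, else None."""
    if value >= threshold.critical:
        return f"CRITICAL: {metric} at {value:.0f}%"
    if value >= threshold.warning:
        return f"WARNING: {metric} at {value:.0f}%"
    return None


def check_all(samples: dict) -> list:
    """Evaluate every sampled metric that has a configured threshold."""
    alerts = []
    for metric, value in samples.items():
        threshold = THRESHOLDS.get(metric)
        if threshold is None:
            continue  # no threshold configured for this metric
        message = evaluate(metric, value, threshold)
        if message:
            alerts.append(message)
    return alerts
```

In a real deployment the returned messages would be handed to the notification and reporting features rather than collected in a list.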

The process of metric gathering should be as open and simple as possible, with many scripting options available. Many current enterprise tools require specific and deep programming skills in order to expand monitoring. This limits the use of the tool, since most management systems are deployed by system administrators, not software developers. For managing cloud applications, this is even more important, since interfacing to specific cloud vendors will require writing to their APIs, which are typically REST-type interfaces.
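To show how small a scripted check can be, here is a sketch of a disk check written in the style of a Nagios plug-in: it produces one line of status text and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The default thresholds and function names are assumptions made for this example.

```python
import shutil
import sys

# Exit codes used by Nagios-style plug-ins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to an (exit code, status line) pair."""
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def check_disk(path="/"):
    """Measure disk usage for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return classify(100.0 * usage.used / usage.total)


def main():
    """Print the status line and return the exit code for the scheduler."""
    code, message = check_disk()
    print(message)
    return code
```

Saved as a script that ends with `sys.exit(main())`, this could be run by an agent or by the management server; a system administrator can adapt it without deeper programming skills.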

Enterprise operations typically have additional requirements. Ideally, a tool can integrate with other tools that the enterprise uses. For example, monitoring tools may get their configuration information from a configuration database. Alerts may be fed into a problem ticketing system. The more automation capability a tool has, the better chance it has of fitting into current IT operational processes.

How might this system deal with the dynamic configuration of cloud systems? For many existing tools the provisioning process is a problem. Existing systems management tools are designed to follow a host name/IP address model, not a virtualized model. You need to define the IP/host of each managed system. However, cloud instances are typically dynamically defined. Take the case of Amazon EC2. The host name and IP address are assigned on instance startup. As shown in Figure 1, in order to use these tools, the instance must first be started, the provisioning parameters (IP/host) extracted from the Amazon API and implemented in the management system, and then the management system must be reloaded to implement the change. The exact mechanics vary depending on the management system and cloud vendor, but all require the monitoring system's configuration to be kept in step with the cloud configuration. Any gap between the two can cause a lag in monitoring the true cloud configuration or, worse, an incomplete monitoring system.
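The synchronization step described above can be sketched as a comparison between what the cloud API reports and what the management system currently monitors. The function names and the dictionary layout (instance ID mapped to address) are assumptions for illustration; fetching the inventory from the vendor API and reloading the management system are left out.

```python
def plan_sync(cloud_instances, monitored_hosts):
    """Compare the cloud inventory with the monitored hosts.

    Both arguments map instance ID to address. Returns the entries to add
    or update, and the instance IDs to remove.
    """
    to_add = {
        instance_id: address
        for instance_id, address in cloud_instances.items()
        if monitored_hosts.get(instance_id) != address
    }
    to_remove = sorted(
        instance_id for instance_id in monitored_hosts
        if instance_id not in cloud_instances
    )
    return to_add, to_remove


def apply_sync(cloud_instances, monitored_hosts):
    """Return the updated host table and whether a reload is required."""
    to_add, to_remove = plan_sync(cloud_instances, monitored_hosts)
    updated = dict(monitored_hosts)
    updated.update(to_add)
    for instance_id in to_remove:
        del updated[instance_id]
    reload_needed = bool(to_add or to_remove)
    return updated, reload_needed
```

Until this comparison runs and the reload completes, the management system keeps monitoring the old configuration, which is the lag the paragraph above describes.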

Another issue with existing management tools is limited visibility into the cloud infrastructure's operational data. Each cloud vendor has its own configuration definition and operational parameter fields. I call this data the cloud vendor's "metadata." In the case of Amazon EC2, the metadata includes the instance ID, Amazon Machine Image (AMI) ID, security groups, location, public DNS name and private DNS name. Existing tools are not designed to gather this metadata. When IT operations personnel are troubleshooting EC2 problems, it is difficult to understand the entire scenario without this metadata. Every cloud vendor has its own metadata, so the problem is further complicated with each additional cloud vendor.

An alternative to using existing management tools is to rely on the vendor to provide the required visibility. However, today, most vendor tools and APIs provide limited visibility. Most infrastructure providers only show whether the instance is running or not. From the infrastructure cloud provider's perspective, this makes sense. They are responsible for the virtual server, not what a user might install on it. Amazon has recognized this issue and has tried to address it with its CloudWatch service. This is an optional service that allows the user to gather additional instance metrics like CPU utilization, disk read/write operations, and throughput from Amazon's APIs. However, Amazon only exposes the information; it is up to the user to use the data for alerts or reporting. Though there are some entry-level cloud tools that read API information for status, they do not provide the management features previously listed.

Cloud-specific tools are usually not designed for use by enterprise IT operations. Simple web browser-oriented interfaces are fine for monitoring a few development instances, but enterprises can require monitoring of hundreds of instances and thousands of metrics, which would be beyond the capability of most Web applications. For enterprise IT operations accustomed to in-depth monitoring and high-function capability, vendor services alone are inadequate.

The preferred approach is to combine the capability of high-function enterprise management tools with information from vendor APIs. This system is shown in Figure 2.

In this system, standard monitoring scripts can be deployed either under the control of agents on monitored instances, or as active checks from the management server. Supporting open source scripts, such as those from the Nagios plug-in project, will allow in-depth monitoring of many components, including those listed above. However, this basic monitoring information must be augmented with vendor API information. Vendor data may be queried from the agent, the management server, or both. This approach allows the management system to process vendor metadata combined with monitoring data. Views presented to operators and system administrators can then show much richer information.

Dynamic changes in the cloud must be immediately recognized by the system. One way to handle this is to drop the requirement that managed systems be pre-defined. Events received from managed systems can be processed as long as they are authenticated. This "event-based" model allows new instances to be managed as soon as they start.
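A minimal sketch of such an event-based receiver follows. It assumes that each instance signs its events with a shared secret using HMAC-SHA256; the article does not specify an authentication scheme, so the scheme, the key, and the class and field names are all assumptions made for this example.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret distributed to instances at launch.
SHARED_KEY = b"example-shared-secret"


def sign(event, key=SHARED_KEY):
    """Compute an HMAC-SHA256 signature over a canonical JSON encoding."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


class EventReceiver:
    """Accepts events from instances that were never pre-defined."""

    def __init__(self, key=SHARED_KEY):
        self.key = key
        self.instances = {}  # instance ID -> list of accepted events

    def receive(self, event, signature):
        """Store the event if its signature is valid; return True on success."""
        expected = sign(event, self.key)
        if not hmac.compare_digest(expected, signature):
            return False  # unauthenticated events are dropped
        # First valid event from an unknown instance registers it.
        self.instances.setdefault(event["instance_id"], []).append(event)
        return True
```

With this design no reload is needed when an instance starts: the first authenticated event it sends is enough to bring it under management.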

Let's look at an example. Suppose we are monitoring a set of application servers running in the Amazon EC2 cloud. A standard script used to get memory utilization from the Linux system would result in the output, "Free memory at 98%. Critical severity."

In a traditional system, this information is associated with the host that it is run on. The management information received after the execution of such a script would look like Table 1.

However, if we combine the cloud metadata, the event information would be (using Amazon EC2) (see Table 2).
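The combination of monitoring data and vendor metadata can be sketched as follows. The field list follows the EC2 metadata named earlier in this article; the class name, the `ec2_` key prefix, and all sample values are placeholders invented for this example and are not the contents of Table 1 or Table 2.

```python
from dataclasses import asdict, dataclass


@dataclass
class Ec2Metadata:
    instance_id: str
    ami_id: str
    security_groups: list
    location: str
    public_dns: str
    private_dns: str


def enrich(event, metadata):
    """Return a copy of the event with the vendor metadata attached."""
    enriched = dict(event)
    for name, value in asdict(metadata).items():
        enriched[f"ec2_{name}"] = value
    return enriched


# Placeholder values for illustration only.
host_event = {
    "host": "appserver-1",
    "message": "Free memory at 98%. Critical severity.",
}
metadata = Ec2Metadata(
    instance_id="i-example",
    ami_id="ami-example",
    security_groups=["appservers"],
    location="example-zone",
    public_dns="public.example.com",
    private_dns="private.example.internal",
)
cloud_event = enrich(host_event, metadata)
```

Because the metadata travels with the event, an operator or a correlation rule can filter on fields such as the location or the security group without a separate lookup.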

An operator looking at this event information would have more complete information on the source and impact of this memory alert. Greater value can be realized by correlating all event information by vendor metadata. For example, grouping instances by their location (that is, the vendor datacenter in which the virtual instance is running) might explain system behavior. Access to the metadata also gives the management system the opportunity to perform higher-level functional checks on the managed cloud application.

The following application scenario is based on a real user. The application consists of many server instances running in the Amazon EC2 cloud, organized into groups of servers that each perform a different role. There may be up to 50 groups running in the application. The group role is determined by a parameter passed in the "User Data" field of the Amazon EC2 metadata.

The management system accesses the metadata and customizes its active checks based on the role contained in the metadata. Since dynamic changes are handled, as the number of instances within each role group changes, the management system will adapt. Existing management tools were in place, so the cloud management system gathered some of its metric information from an existing tool rather than requiring re-instrumentation. This made the transition to managing the cloud easier for operations.
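Role-driven check selection might look like the sketch below. The article does not give the format of the "User Data" parameter, so the `role=<name>` line format, the role names, and the check names are all assumptions for illustration.

```python
# Hypothetical mapping from role to the active checks run for that role.
ROLE_CHECKS = {
    "web": ["http", "cpu"],
    "database": ["tcp_port", "disk", "memory"],
}
# Checks applied to every instance, whatever its role.
DEFAULT_CHECKS = ["ping"]


def parse_role(user_data):
    """Extract the role from user data made of key=value lines."""
    for line in user_data.splitlines():
        key, separator, value = line.partition("=")
        if separator and key.strip() == "role":
            return value.strip()
    return None


def checks_for(user_data):
    """Return the checks to run for an instance with this user data."""
    role = parse_role(user_data)
    return DEFAULT_CHECKS + ROLE_CHECKS.get(role, [])
```

Since the check list is derived from the metadata at the time an instance is seen, adding or removing instances in a role group needs no change to the monitoring configuration.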

Since the application consists of many groups of instances, the status of a single instance is not as important as the status of the role's group. One type of higher-level check that was applied is to compute average (or optionally, maximum or minimum) values across the group. Alerts can then be generated based on group metrics rather than instance metrics. The operator has the ability to drill down from the group to the instance values.
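The group-level check can be sketched in a few lines. The sample layout (role, instance ID, value) and the function names are assumptions for this example.

```python
from collections import defaultdict
from statistics import mean

# Supported ways of combining instance values into a group value.
AGGREGATORS = {"avg": mean, "max": max, "min": min}


def group_metric(samples, how="avg"):
    """Aggregate (role, instance_id, value) samples into one value per role."""
    by_role = defaultdict(list)
    for role, _instance_id, value in samples:
        by_role[role].append(value)
    aggregate = AGGREGATORS[how]
    return {role: aggregate(values) for role, values in by_role.items()}


def group_alerts(samples, threshold, how="avg"):
    """Return the roles whose aggregated value meets or exceeds the threshold."""
    grouped = group_metric(samples, how)
    return sorted(role for role, value in grouped.items() if value >= threshold)
```

Keeping the per-instance samples alongside the aggregated values is what makes the drill-down from group to instance possible.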

This example uses infrastructure cloud provider services. An equivalent scenario can be applied to platform service providers. The difference would be that the monitoring metrics would test the platform provider's APIs rather than take traditional measurements. Again, this implies a tight integration between the management system and the cloud vendor's services.

The management system in this scenario can be used as a basis for establishing vendor trust because of the following advantages:

  • A tight interface with the cloud enables vendor configuration data to be integrated into the management system
  • Open monitoring capability enables deep monitoring beyond vendor APIs
  • Any monitoring information available from the vendor API can be incorporated
  • Any monitoring information from existing management tools can be incorporated
  • The ability to create higher-level management metrics based on lower-level measurements

These advantages also enable the system to be a basis for trust-enabling applications. For example, a billing report can be generated based on the telemetry information gathered by the system. This report would be independent of the vendor, generated from the gathered metric data and the vendor's stated billing policy, and could be used as an independent audit of the vendor's bill. Another example would be to gather the security group information from the vendor's configuration and perform the TCP port checks that those groups define. This verifies that the security policy stated by the vendor is enforced for this user's cloud configuration.
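The security check can be sketched as a comparison between the ports a security group allows and the ports that actually accept connections. The function names are assumptions, and retrieving the allowed ports from the vendor's API is left out; the probe is passed in as a parameter so the audit logic can be exercised without network access.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def audit_ports(host, allowed_ports, ports_to_probe, probe=tcp_port_open):
    """Return the probed ports that are open but not allowed by policy."""
    violations = []
    for port in ports_to_probe:
        if probe(host, port) and port not in allowed_ports:
            violations.append(port)
    return violations
```

An empty result supports the vendor's claim that the stated policy is enforced, at least for the ports that were probed; a non-empty result is evidence to raise with the vendor.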

If we project this scenario to a future in which an enterprise uses multiple cloud vendors, the management system would look like the one shown in Figure 3.

This system would be a consolidation point for all cloud services, and would translate the heterogeneous cloud services into a common view, simplifying IT operations.
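One way to picture the translation into a common view is a set of per-vendor adapters that map each vendor's metadata onto shared field names. Everything in this sketch is hypothetical: the raw key names, the second vendor ("examplecloud"), and the common schema are invented for illustration.

```python
def from_ec2(raw):
    """Map EC2-style metadata (assumed key names) to the common schema."""
    return {
        "vendor": "amazon-ec2",
        "id": raw["instance-id"],
        "location": raw["location"],
        "address": raw["public-dns-name"],
    }


def from_examplecloud(raw):
    """Map a second, hypothetical vendor's metadata to the common schema."""
    return {
        "vendor": "examplecloud",
        "id": raw["server_uuid"],
        "location": raw["region"],
        "address": raw["hostname"],
    }


# Registry of adapters, keyed by vendor name.
ADAPTERS = {"amazon-ec2": from_ec2, "examplecloud": from_examplecloud}


def normalize(vendor, raw):
    """Translate vendor-specific metadata into the common view."""
    return ADAPTERS[vendor](raw)
```

Supporting an additional vendor then means writing one more adapter, while the views, alerts, and reports built on the common schema stay unchanged.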

For cloud computing to fulfill its promise of enabling enterprise IT organizations to improve the service they provide to their users, traditional IT operational processes and tools must adapt to new ways of interacting with external vendor services. I've examined some of the issues that enterprises are encountering, and have offered solutions. But it is clear that these issues are a barrier to adoption of cloud services, and the needs of enterprise operations must be addressed by the cloud community.

More Stories By Peter Loh

Peter Loh is CEO of Tap In Systems. He has been developing, marketing and selling network and systems management software for over 25 years. He has implemented IT management systems for large Fortune 500 companies, such as Bank of America, ATT, Visa and American Express. He spent 10 years in technical and marketing positions at IBM, and has since been involved with a number of start up companies. Most recently he was in charge of engineering for GroundWork Open Source, a company leveraging open source software to implement IT management solutions. Peter holds a BS in Electrical Engineering.
