Welcome!

@CloudExpo Authors: SmartBear Blog, Elizabeth White, Liz McMillan, Pat Romanski, Ed Featherston

Related Topics: @CloudExpo, IoT User Interface

@CloudExpo: Article

Integrated Cloud-based Load Testing and Performance Management

New Integration from Keynote and dynaTrace

Load Testing has traditionally been done In-House with load-testing tools using machines in your test center to generate HTTP traffic against the application needing to be tested for high volume transactions. With agile development practices, shorter release cycles and higher number of users that will ultimately access a web application from more places around the world, in-house testing reached its limits. Maintaining a load-testing infrastructure that supports 10 or 100 thousands of users becomes costly. With rapidly changing applications, updating test scripts is also becoming a bigger challenge all the time, binding lots of test resources to just the task of maintaining test scripts. When running tests more frequently, analyzing test results becomes a task that consumes performance architects or engineers with analyzing graphs and log files in order to figure out what problems were just uncovered by the recent test.

Let’s summarize these problems/requirements:

  • It is important to run bigger loads than ever before as our apps are accessed by more users around the globe
  • Besides just running the load we want to know how end-user performance is perceived from different locations around the globe
  • It is costly to own and maintain a test environment large enough to support these loads
  • It is time consuming to constantly adapt test scripts to reflect the changes of every product iteration
  • It takes experienced performance engineers or architects too long to analyze the test results and identify the root cause of problems

Cloud-Based Load-Testing with integrated Application Performance Management
Cloud based Load Testing solves many of these problems by providing high-volume tests from around the globe at specific times at a manageable cost. But it comes with some requirements on the tested application, and services like this must meet certain requirements in order to solve the discussed problems:

  • The Application Under Test (AUT) must be accessible from the internet as the generated transactions are generated from machines around the globe and not within the local test environment. Companies usually use part of their production system at “off-hours” to host the version of the application to be tested. This allows running large scale tests without having to have a replicate of the production environment for testing
  • The load testing service must provide an easy way to create and update scripts to adapt to changes within a product’s iterations. Otherwise too much time and effort is put into setting up tests.
  • The service must integrate with performance management software that runs on the tested application. This allows correlating data shown in load testing reports (Response Times, Transaction Rates, Bandwidth Usage …) with data captured in the application infrastructure (Transaction Times, CPU, Memory, Exceptions, …)

Proof of Concept: Load Testing with Keynote integrated with Application Performance Management from dynaTrace
Together with Keynote’s Load Testing Consultants we set up the following environment showcasing the benefits of an integrated solution of Cloud-Based Load Testing and Application Performance Management.

Step 1: Deploying the application
We deployed a 4 tier (2 Java and 2 .NET Runtimes) eCommerce Travel Portal on a hosted virtual infrastructure so that it is accessible by the Cloud-based Load Testing Service. We also installed and configured dynaTrace to manage this multi-tier heterogeneous application in order to identify problems once we put load on the system.

Application Dependency between the 4 tier heterogenous application

Application Dependency between the 4 tier heterogeneous application

Step 2: Test Scripts and Keynote/dynaTrace Integration
Keynote modeled several use-case scenarios based on the testing requirements we had on our application. We ended up with use cases such as executing a specific search, accessing the last-minute offers page or purchasing a trip. dynaTrace provides an integration interface for Load-Testing and Monitoring Services that allows us to link every executed synthetic request with the transaction that dynaTrace traces on the application server when these requests are handled by the application.

Step 3: Running a test
We decided to run a test starting with increasing load to figure out where the breaking point of our application is. We started with a load of 3000 sessions per hour running for 15 minutes and increasing this load every 15 minutes to 6k, 9, and 12k sessions/hour. It turned out our application broke much faster than we anticipated :-)

Step 4: Analyzing the Load Testing Report
When I log into the Keynote Load Testing Portal I start by looking at the load testing report that shows me the executed sessions, response times, page views and errors:

Keynote Load Testing Report showing an application problem when increasing the load to 6k sessions per hour

Keynote Load Testing Report showing an application problem when increasing the load to 6k sessions per hour

It is easy to see that – once we went from Phase 1 (3k sessions) to Phase 2 (6k sessions) – our application’s response times go through the roof causing most of the simulated users to experience timeouts. A click on the Page Error graph shows that these errors are mainly timeouts or connection errors. The question now is: Is this problem an application problem or is it related to the infrastructure? Without having insight to the application these results could be interpreted in multiple different ways, e.g: our hosting company doesn’t provide enough bandwidth. That is the point when Application Performance Management helps answering these uncertainties.

Step 5: Looking at application performance data
I’ve created two dashboards that I use to analyze application performance while or after running a load test. The first one is an Infrastructure Dashboard where I display CPU and Memory Utilization of all 4 Application Runtimes that are involved:

The Dashboard shows us high memory und GC activity on our GoSpaceBackend which also leads to very high CPU Utilization

The Dashboard shows us high memory und GC activity on our GoSpaceBackend which also leads to very high CPU Utilization

The red measure in the JVM Memory Usage graph indicates GC Collection time. The red in the CPU Usage indicates the max CPU Usage of that JVM. The conclusion is therefore easy. High memory usage leads to high GC activity which maxes out our CPU.

The next Dashboard gives me insight into the application itself – with all the involved application layers and the individual transactions that dynaTrace analyzed coming from the Keynote Load Test:

Transaction Response Time on the Application Server and Breakdown into Application Layers proofs that the slow response times are application related

Transaction Response Time on the Application Server and Breakdown into Application Layers proves that the slow response times are application-related

On the left of the Dashboard I placed a transaction overview of the individual use cases Keynote executed during the load test. It is easy to spot that once the load got ramped up to 6k sessions we saw a dramatic increase in response time on our application server. That means that our first question is answered: it is not an infrastructure problem with our web hosting but an application-specific problem. With the knowledge we already have by looking at the memory and CPU measures we can already guess that this is the main contributor. The performance breakdown on the bottom right also highlights which application layers were contributing the most to the application transaction response time. A double click on that graph gives us a close-up on this data:

Our Persistence Layer, EJBs and JMS are the main contributors of the application performance

Our Persistence Layer, EJBs and JMS are the main contributors of the application performance

Step 6: Drilling deeper into the problem
dynaTrace captured every single request that was executed while running the load test. Its PurePath technology is the enabler of the dashboards we looked at earlier. The next step is to identify what is really going on in the application and where is the main impact of the increased load. The next dashboard I created gives me a better overview of the application architecture, showing me which methods are called most often and how well they execute. I am also interested in database activity as well as individual web requests that were slow:

Detailed Overview of where my application hotspots are including application layers, methods, web requests and database statements

Detailed Overview of where my application hot spots are including application layers, methods, web requests and database statements

The dashboard again shows us that the primary application layer impacted is our persistence layer. It is also very interesting that the slowest URL is a web service hosted by our back-end application server and that we have a very high number of database statements coming from only a few web requests. This information is really valuable for the application architects who need insight into application dynamics under heavy load.

Step 7: Show me the root cause of these slow-running transactions
Not only can we get an overview of which requests were slow and how many methods or database statements were executed. We can now look into individual transactions, and also compare transactions to see where the difference is between a slow-running and a fast-running transaction. dynaTrace allows me to drill down to those 718 transactions that executed the slow running web service and I can inspect each individually:

All transactions (PurePaths) available for analysis. Selecting the slowest shows me where this web service got called and where time was spent

All transactions (PurePaths) available for analysis. Selecting the slowest shows me where this web service got called and where time was spent

Looking at the duration, CPU duration and Suspension Duration (Garbage Collection) really highlights the problem that we have. Suspension Time is really high with those transactions impacting the overall execution time.

I can also pick one that ran very slow and one that ran fast, and let dynaTrace compare these two transactions for me and highlight the differences:

Comparison shows the structural and timing difference between two transactions making it easy to spot the actual differences

Comparison shows the structural and timing difference between two transactions making it easy to spot the actual differences

Not only do I see how Garbage Collection impacts execution time of individual methods and the overall transaction. It also shows me how different the same transaction executes in case of an error (such as thrown abort exception) – which brings me to one additional dashboard I like to look it. This one includes exceptions, logging messages and an overview on the Garbage Collection runs on individual methods:

Exceptions including full stack traces, log messages with the context of where the were logged and an overview of suspended methods

Exceptions including full stack traces, log messages with the context of where they were logged and an overview of suspended methods

Step 8: Hand off the data
Looking at this data was easy as I simply look at these dashboards after the load test is finished. The dashboards already helped identifying several hot spots, e.g: high memory consumption by the back-end web services causing high GC, too many SQL statements per request, many hidden exceptions that never made it to a proper log message, …

dynaTrace makes this captured data available to the engineering team in order to resolve these problems. They can either access the data by directly accessing the dynaTrace environment used to capture this information. Another way is to export individual PurePaths or maybe all of them into a dynaTrace Session file which can be exchanged via email, Instant Messenger or attached to a bug ticket.

Proved the Concept: Cloud Based Load Testing with APM is ready for Agile Development

The problems/requirements listed in the beginning of this blog are solved/met with the integrated solution from Keynote and dynaTrace:

  • Keynote runs large scale load tests by driving load from many different locations around the globe
  • The global-distributed load generation allows us to identify local content delivery problems (slow network connections, wrongly configured CDNs, …)
  • The costs are under control as you only pay for the load test but don’t pay for maintaining your own load-testing infrastructure that would sit idle most of the time
  • Keynote makes it easy to create scripts and offers services to do the scripting for you
  • dynaTrace automatically highlights the problems identified during the load test. High-level analysis through dashboards doesn’t require highly skilled performance architects. The fine-grained data captured, however, gives the performance engineers and software architects actionable data without digging through log files or manually correlating a multitude of different performance metrics
  • No change to your application is required to use this integration

Related reading:

  1. End-to-End Monitoring and Load Testing with Keynote and dynaTrace We’ve learned from recent studies that performance has a direct...
  2. VS2010 Load Testing for Distributed and Heterogeneous Applications powered by dynaTrace Visual Studio 2010 is almost here – Microsoft just released...
  3. Performance Analysis in Load Testing Collection diagnostics information in Load Testing is a challenging task....
  4. From Cloud Monitoring to Effective Cloud Management – Webinar with IntraLinks on July 15th 2010 I am hosting a Webinar with IntraLinks this Wednesday. The...
  5. Elevating Web- and Load-Testing with MicroFocus SilkPerformer Diagnostics powered by dynaTrace MicroFocus and dynaTrace recently announced “SilkPerformer Assurance” and with that...

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi

@CloudExpo Stories
The best-practices for building IoT applications with Go Code that attendees can use to build their own IoT applications. In his session at @ThingsExpo, Indraneel Mitra, Senior Solutions Architect & Technology Evangelist at Cognizant, provided valuable information and resources for both novice and experienced developers on how to get started with IoT and Golang in a day. He also provided information on how to use Intel Arduino Kit, Go Robotics API and AWS IoT stack to build an application tha...
Cloud analytics is dramatically altering business intelligence. Some businesses will capitalize on these promising new technologies and gain key insights that’ll help them gain competitive advantage. And others won’t. Whether you’re a business leader, an IT manager, or an analyst, we want to help you and the people you need to influence with a free copy of “Cloud Analytics for Dummies,” the essential guide to this explosive new space for business intelligence.
SYS-CON Events announced today that LeaseWeb USA, a cloud Infrastructure-as-a-Service (IaaS) provider, will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. LeaseWeb is one of the world's largest hosting brands. The company helps customers define, develop and deploy IT infrastructure tailored to their exact business needs, by combining various kinds cloud solutions.
Aspose.Total for .NET is the most complete package of all file format APIs for .NET as offered by Aspose. It empowers developers to create, edit, render, print and convert between a wide range of popular document formats within any .NET, C#, ASP.NET and VB.NET applications. Aspose compiles all .NET APIs on a daily basis to ensure that it contains the most up to date versions of each of Aspose .NET APIs. If a new .NET API or a new version of existing APIs is released during the subscription peri...
Whether your IoT service is connecting cars, homes, appliances, wearable, cameras or other devices, one question hangs in the balance – how do you actually make money from this service? The ability to turn your IoT service into profit requires the ability to create a monetization strategy that is flexible, scalable and working for you in real-time. It must be a transparent, smoothly implemented strategy that all stakeholders – from customers to the board – will be able to understand and comprehe...
Adding public cloud resources to an existing application can be a daunting process. The tools that you currently use to manage the software and hardware outside the cloud aren’t always the best tools to efficiently grow into the cloud. All of the major configuration management tools have cloud orchestration plugins that can be leveraged, but there are also cloud-native tools that can dramatically improve the efficiency of managing your application lifecycle. In his session at 18th Cloud Expo, ...
The cloud market growth today is largely in public clouds. While there is a lot of spend in IT departments in virtualization, these aren’t yet translating into a true “cloud” experience within the enterprise. What is stopping the growth of the “private cloud” market? In his general session at 18th Cloud Expo, Nara Rajagopalan, CEO of Accelerite, explored the challenges in deploying, managing, and getting adoption for a private cloud within an enterprise. What are the key differences between wh...
Ixia (Nasdaq: XXIA) has announced that NoviFlow Inc.has deployed IxNetwork® to validate the company’s designs and accelerate the delivery of its proven, reliable products. Based in Montréal, NoviFlow Inc. supports network carriers, hyperscale data center operators, and enterprises seeking greater network control and flexibility, network scalability, and the capacity to handle extremely large numbers of flows, while maintaining maximum network performance. To meet these requirements, NoviFlow in...
SaaS companies can greatly expand revenue potential by pushing beyond their own borders. The challenge is how to do this without degrading service quality. In his session at 18th Cloud Expo, Adam Rogers, Managing Director at Anexia, discussed how IaaS providers with a global presence and both virtual and dedicated infrastructure can help companies expand their service footprint with low “go-to-market” costs.
SYS-CON Events announced today that 910Telecom will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Housed in the classic Denver Gas & Electric Building, 910 15th St., 910Telecom is a carrier-neutral telecom hotel located in the heart of Denver. Adjacent to CenturyLink, AT&T, and Denver Main, 910Telecom offers connectivity to all major carriers, Internet service providers, Internet backbones and ...
Enterprise networks are complex. Moreover, they were designed and deployed to meet a specific set of business requirements at a specific point in time. But, the adoption of cloud services, new business applications and intensifying security policies, among other factors, require IT organizations to continuously deploy configuration changes. Therefore, enterprises are looking for better ways to automate the management of their networks while still leveraging existing capabilities, optimizing perf...
SYS-CON Events announced today that Venafi, the Immune System for the Internet™ and the leading provider of Next Generation Trust Protection, will exhibit at @DevOpsSummit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Venafi is the Immune System for the Internet™ that protects the foundation of all cybersecurity – cryptographic keys and digital certificates – so they can’t be misused by bad guys in attacks...
Redis is not only the fastest database, but it is the most popular among the new wave of databases running in containers. Redis speeds up just about every data interaction between your users or operational systems. In his session at 19th Cloud Expo, Dave Nielsen, Developer Advocate, Redis Labs, will share the functions and data structures used to solve everyday use cases that are driving Redis' popularity.
SYS-CON Events announced today the Kubernetes and Google Container Engine Workshop, being held November 3, 2016, in conjunction with @DevOpsSummit at 19th Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA. This workshop led by Sebastian Scheele introduces participants to Kubernetes and Google Container Engine (GKE). Through a combination of instructor-led presentations, demonstrations, and hands-on labs, students learn the key concepts and practices for deploying and maintainin...
Ovum, a leading technology analyst firm, has published an in-depth report, Ovum Decision Matrix: Selecting a DevOps Release Management Solution, 2016–17. The report focuses on the automation aspects of DevOps, Release Management and compares solutions from the leading vendors.
"Avere Systems is a hybrid cloud solution provider. We have customers that want to use cloud storage and we have customers that want to take advantage of cloud compute," explained Rebecca Thompson, VP of Marketing at Avere Systems, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
"We formed Formation several years ago to really address the need for bring complete modernization and software-defined storage to the more classic private cloud marketplace," stated Mark Lewis, Chairman and CEO of Formation Data Systems, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.
When it comes to cloud computing, the ability to turn massive amounts of compute cores on and off on demand sounds attractive to IT staff, who need to manage peaks and valleys in user activity. With cloud bursting, the majority of the data can stay on premises while tapping into compute from public cloud providers, reducing risk and minimizing need to move large files. In his session at 18th Cloud Expo, Scott Jeschonek, Director of Product Management at Avere Systems, discussed the IT and busin...
Large scale deployments present unique planning challenges, system commissioning hurdles between IT and OT and demand careful system hand-off orchestration. In his session at @ThingsExpo, Jeff Smith, Senior Director and a founding member of Incenergy, will discuss some of the key tactics to ensure delivery success based on his experience of the last two years deploying Industrial IoT systems across four continents.
There will be new vendors providing applications, middleware, and connected devices to support the thriving IoT ecosystem. This essentially means that electronic device manufacturers will also be in the software business. Many will be new to building embedded software or robust software. This creates an increased importance on software quality, particularly within the Industrial Internet of Things where business-critical applications are becoming dependent on products controlled by software. Qua...