Welcome!

@CloudExpo Authors: Fouad Khalil, David H Deans, PagerDuty Blog, Pat Romanski, Elizabeth White

Related Topics: @CloudExpo, IoT User Interface

@CloudExpo: Article

Integrated Cloud-based Load Testing and Performance Management

New Integration from Keynote and dynaTrace

Load Testing has traditionally been done In-House with load-testing tools using machines in your test center to generate HTTP traffic against the application needing to be tested for high volume transactions. With agile development practices, shorter release cycles and higher number of users that will ultimately access a web application from more places around the world, in-house testing reached its limits. Maintaining a load-testing infrastructure that supports 10 or 100 thousands of users becomes costly. With rapidly changing applications, updating test scripts is also becoming a bigger challenge all the time, binding lots of test resources to just the task of maintaining test scripts. When running tests more frequently, analyzing test results becomes a task that consumes performance architects or engineers with analyzing graphs and log files in order to figure out what problems were just uncovered by the recent test.

Let’s summarize these problems/requirements:

  • It is important to run bigger loads than ever before as our apps are accessed by more users around the globe
  • Besides just running the load we want to know how end-user performance is perceived from different locations around the globe
  • It is costly to own and maintain a test environment large enough to support these loads
  • It is time consuming to constantly adapt test scripts to reflect the changes of every product iteration
  • It takes experienced performance engineers or architects too long to analyze the test results and identify the root cause of problems

Cloud-Based Load-Testing with integrated Application Performance Management
Cloud based Load Testing solves many of these problems by providing high-volume tests from around the globe at specific times at a manageable cost. But it comes with some requirements on the tested application, and services like this must meet certain requirements in order to solve the discussed problems:

  • The Application Under Test (AUT) must be accessible from the internet as the generated transactions are generated from machines around the globe and not within the local test environment. Companies usually use part of their production system at “off-hours” to host the version of the application to be tested. This allows running large scale tests without having to have a replicate of the production environment for testing
  • The load testing service must provide an easy way to create and update scripts to adapt to changes within a product’s iterations. Otherwise too much time and effort is put into setting up tests.
  • The service must integrate with performance management software that runs on the tested application. This allows correlating data shown in load testing reports (Response Times, Transaction Rates, Bandwidth Usage …) with data captured in the application infrastructure (Transaction Times, CPU, Memory, Exceptions, …)

Proof of Concept: Load Testing with Keynote integrated with Application Performance Management from dynaTrace
Together with Keynote’s Load Testing Consultants we set up the following environment showcasing the benefits of an integrated solution of Cloud-Based Load Testing and Application Performance Management.

Step 1: Deploying the application
We deployed a 4 tier (2 Java and 2 .NET Runtimes) eCommerce Travel Portal on a hosted virtual infrastructure so that it is accessible by the Cloud-based Load Testing Service. We also installed and configured dynaTrace to manage this multi-tier heterogeneous application in order to identify problems once we put load on the system.

Application Dependency between the 4 tier heterogenous application

Application Dependency between the 4 tier heterogeneous application

Step 2: Test Scripts and Keynote/dynaTrace Integration
Keynote modeled several use-case scenarios based on the testing requirements we had on our application. We ended up with use cases such as executing a specific search, accessing the last-minute offers page or purchasing a trip. dynaTrace provides an integration interface for Load-Testing and Monitoring Services that allows us to link every executed synthetic request with the transaction that dynaTrace traces on the application server when these requests are handled by the application.

Step 3: Running a test
We decided to run a test starting with increasing load to figure out where the breaking point of our application is. We started with a load of 3000 sessions per hour running for 15 minutes and increasing this load every 15 minutes to 6k, 9, and 12k sessions/hour. It turned out our application broke much faster than we anticipated :-)

Step 4: Analyzing the Load Testing Report
When I log into the Keynote Load Testing Portal I start by looking at the load testing report that shows me the executed sessions, response times, page views and errors:

Keynote Load Testing Report showing an application problem when increasing the load to 6k sessions per hour

Keynote Load Testing Report showing an application problem when increasing the load to 6k sessions per hour

It is easy to see that – once we went from Phase 1 (3k sessions) to Phase 2 (6k sessions) – our application’s response times go through the roof causing most of the simulated users to experience timeouts. A click on the Page Error graph shows that these errors are mainly timeouts or connection errors. The question now is: Is this problem an application problem or is it related to the infrastructure? Without having insight to the application these results could be interpreted in multiple different ways, e.g: our hosting company doesn’t provide enough bandwidth. That is the point when Application Performance Management helps answering these uncertainties.

Step 5: Looking at application performance data
I’ve created two dashboards that I use to analyze application performance while or after running a load test. The first one is an Infrastructure Dashboard where I display CPU and Memory Utilization of all 4 Application Runtimes that are involved:

The Dashboard shows us high memory und GC activity on our GoSpaceBackend which also leads to very high CPU Utilization

The Dashboard shows us high memory und GC activity on our GoSpaceBackend which also leads to very high CPU Utilization

The red measure in the JVM Memory Usage graph indicates GC Collection time. The red in the CPU Usage indicates the max CPU Usage of that JVM. The conclusion is therefore easy. High memory usage leads to high GC activity which maxes out our CPU.

The next Dashboard gives me insight into the application itself – with all the involved application layers and the individual transactions that dynaTrace analyzed coming from the Keynote Load Test:

Transaction Response Time on the Application Server and Breakdown into Application Layers proofs that the slow response times are application related

Transaction Response Time on the Application Server and Breakdown into Application Layers proves that the slow response times are application-related

On the left of the Dashboard I placed a transaction overview of the individual use cases Keynote executed during the load test. It is easy to spot that once the load got ramped up to 6k sessions we saw a dramatic increase in response time on our application server. That means that our first question is answered: it is not an infrastructure problem with our web hosting but an application-specific problem. With the knowledge we already have by looking at the memory and CPU measures we can already guess that this is the main contributor. The performance breakdown on the bottom right also highlights which application layers were contributing the most to the application transaction response time. A double click on that graph gives us a close-up on this data:

Our Persistence Layer, EJBs and JMS are the main contributors of the application performance

Our Persistence Layer, EJBs and JMS are the main contributors of the application performance

Step 6: Drilling deeper into the problem
dynaTrace captured every single request that was executed while running the load test. Its PurePath technology is the enabler of the dashboards we looked at earlier. The next step is to identify what is really going on in the application and where is the main impact of the increased load. The next dashboard I created gives me a better overview of the application architecture, showing me which methods are called most often and how well they execute. I am also interested in database activity as well as individual web requests that were slow:

Detailed Overview of where my application hotspots are including application layers, methods, web requests and database statements

Detailed Overview of where my application hot spots are including application layers, methods, web requests and database statements

The dashboard again shows us that the primary application layer impacted is our persistence layer. It is also very interesting that the slowest URL is a web service hosted by our back-end application server and that we have a very high number of database statements coming from only a few web requests. This information is really valuable for the application architects who need insight into application dynamics under heavy load.

Step 7: Show me the root cause of these slow-running transactions
Not only can we get an overview of which requests were slow and how many methods or database statements were executed. We can now look into individual transactions, and also compare transactions to see where the difference is between a slow-running and a fast-running transaction. dynaTrace allows me to drill down to those 718 transactions that executed the slow running web service and I can inspect each individually:

All transactions (PurePaths) available for analysis. Selecting the slowest shows me where this web service got called and where time was spent

All transactions (PurePaths) available for analysis. Selecting the slowest shows me where this web service got called and where time was spent

Looking at the duration, CPU duration and Suspension Duration (Garbage Collection) really highlights the problem that we have. Suspension Time is really high with those transactions impacting the overall execution time.

I can also pick one that ran very slow and one that ran fast, and let dynaTrace compare these two transactions for me and highlight the differences:

Comparison shows the structural and timing difference between two transactions making it easy to spot the actual differences

Comparison shows the structural and timing difference between two transactions making it easy to spot the actual differences

Not only do I see how Garbage Collection impacts execution time of individual methods and the overall transaction. It also shows me how different the same transaction executes in case of an error (such as thrown abort exception) – which brings me to one additional dashboard I like to look it. This one includes exceptions, logging messages and an overview on the Garbage Collection runs on individual methods:

Exceptions including full stack traces, log messages with the context of where the were logged and an overview of suspended methods

Exceptions including full stack traces, log messages with the context of where they were logged and an overview of suspended methods

Step 8: Hand off the data
Looking at this data was easy as I simply look at these dashboards after the load test is finished. The dashboards already helped identifying several hot spots, e.g: high memory consumption by the back-end web services causing high GC, too many SQL statements per request, many hidden exceptions that never made it to a proper log message, …

dynaTrace makes this captured data available to the engineering team in order to resolve these problems. They can either access the data by directly accessing the dynaTrace environment used to capture this information. Another way is to export individual PurePaths or maybe all of them into a dynaTrace Session file which can be exchanged via email, Instant Messenger or attached to a bug ticket.

Proved the Concept: Cloud Based Load Testing with APM is ready for Agile Development

The problems/requirements listed in the beginning of this blog are solved/met with the integrated solution from Keynote and dynaTrace:

  • Keynote runs large scale load tests by driving load from many different locations around the globe
  • The global-distributed load generation allows us to identify local content delivery problems (slow network connections, wrongly configured CDNs, …)
  • The costs are under control as you only pay for the load test but don’t pay for maintaining your own load-testing infrastructure that would sit idle most of the time
  • Keynote makes it easy to create scripts and offers services to do the scripting for you
  • dynaTrace automatically highlights the problems identified during the load test. High-level analysis through dashboards doesn’t require highly skilled performance architects. The fine-grained data captured, however, gives the performance engineers and software architects actionable data without digging through log files or manually correlating a multitude of different performance metrics
  • No change to your application is required to use this integration

Related reading:

  1. End-to-End Monitoring and Load Testing with Keynote and dynaTrace We’ve learned from recent studies that performance has a direct...
  2. VS2010 Load Testing for Distributed and Heterogeneous Applications powered by dynaTrace Visual Studio 2010 is almost here – Microsoft just released...
  3. Performance Analysis in Load Testing Collection diagnostics information in Load Testing is a challenging task....
  4. From Cloud Monitoring to Effective Cloud Management – Webinar with IntraLinks on July 15th 2010 I am hosting a Webinar with IntraLinks this Wednesday. The...
  5. Elevating Web- and Load-Testing with MicroFocus SilkPerformer Diagnostics powered by dynaTrace MicroFocus and dynaTrace recently announced “SilkPerformer Assurance” and with that...

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi

@CloudExpo Stories
DevOps at Cloud Expo – being held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA – announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises – and delivering real results. Am...
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - comp...
DevOps at Cloud Expo, taking place Nov 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long dev...
As the world moves toward more DevOps and Microservices, application deployment to the cloud ought to become a lot simpler. The Microservices architecture, which is the basis of many new age distributed systems such as OpenStack, NetFlix and so on, is at the heart of Cloud Foundry - a complete developer-oriented Platform as a Service (PaaS) that is IaaS agnostic and supports vCloud, OpenStack and AWS. Serverless computing is revolutionizing computing. In his session at 19th Cloud Expo, Raghav...
Almost two-thirds of companies either have or soon will have IoT as the backbone of their business in 2016. However, IoT is far more complex than most firms expected. How can you not get trapped in the pitfalls? In his session at @ThingsExpo, Tony Shan, a renowned visionary and thought leader, will introduce a holistic method of IoTification, which is the process of IoTifying the existing technology and business models to adopt and leverage IoT. He will drill down to the components in this fra...
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at Cloud Expo, Ed Featherston, a director and senior enterprise architect at Collaborative Consulting, will discuss the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
There is growing need for data-driven applications and the need for digital platforms to build these apps. In his session at 19th Cloud Expo, Muddu Sudhakar, VP and GM of Security & IoT at Splunk, will cover different PaaS solutions and Big Data platforms that are available to build applications. In addition, AI and machine learning are creating new requirements that developers need in the building of next-gen apps. The next-generation digital platforms have some of the past platform needs a...
19th Cloud Expo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterpri...
Enterprises have forever faced challenges surrounding the sharing of their intellectual property. Emerging cloud adoption has made it more compelling for enterprises to digitize their content, making them available over a wide variety of devices across the Internet. In his session at 19th Cloud Expo, Santosh Ahuja, Director of Architecture at Impiger Technologies, will introduce various mechanisms provided by cloud service providers today to manage and share digital content in a secure manner....
Fact: storage performance problems have only gotten more complicated, as applications not only have become largely virtualized, but also have moved to cloud-based infrastructures. Storage performance in virtualized environments isn’t just about IOPS anymore. Instead, you need to guarantee performance for individual VMs, helping applications maintain performance as the number of VMs continues to go up in real time. In his session at Cloud Expo, Dhiraj Sehgal, Product and Marketing at Tintri, wil...
SYS-CON Events announced today that Hitrons Solutions will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Hitrons Solutions Inc. is distributor in the North American market for unique products and services of small and medium-size businesses, including cloud services and solutions, SEO marketing platforms, and mobile applications.
SYS-CON Events announced today that eCube Systems, a leading provider of middleware modernization, integration, and management solutions, will exhibit at @DevOpsSummit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. eCube Systems offers a family of middleware evolution products and services that maximize return on technology investment by leveraging existing technical equity to meet evolving business needs. ...
SYS-CON Events announced today Telecom Reseller has been named “Media Sponsor” of SYS-CON's 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
Pulzze Systems was happy to participate in such a premier event and thankful to be receiving the winning investment and global network support from G-Startup Worldwide. It is an exciting time for Pulzze to showcase the effectiveness of innovative technologies and enable them to make the world smarter and better. The reputable contest is held to identify promising startups around the globe that are assured to change the world through their innovative products and disruptive technologies. There w...
SYS-CON Events announced today that Isomorphic Software will exhibit at DevOps Summit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Isomorphic Software provides the SmartClient HTML5/AJAX platform, the most advanced technology for building rich, cutting-edge enterprise web applications for desktop and mobile. SmartClient combines the productivity and performance of traditional desktop software with the simp...
With so much going on in this space you could be forgiven for thinking you were always working with yesterday’s technologies. So much change, so quickly. What do you do if you have to build a solution from the ground up that is expected to live in the field for at least 5-10 years? This is the challenge we faced when we looked to refresh our existing 10-year-old custom hardware stack to measure the fullness of trash cans and compactors.
Extreme Computing is the ability to leverage highly performant infrastructure and software to accelerate Big Data, machine learning, HPC, and Enterprise applications. High IOPS Storage, low-latency networks, in-memory databases, GPUs and other parallel accelerators are being used to achieve faster results and help businesses make better decisions. In his session at 18th Cloud Expo, Michael O'Neill, Strategic Business Development at NVIDIA, focused on some of the unique ways extreme computing is...
The emerging Internet of Everything creates tremendous new opportunities for customer engagement and business model innovation. However, enterprises must overcome a number of critical challenges to bring these new solutions to market. In his session at @ThingsExpo, Michael Martin, CTO/CIO at nfrastructure, outlined these key challenges and recommended approaches for overcoming them to achieve speed and agility in the design, development and implementation of Internet of Everything solutions wi...
Cloud computing is being adopted in one form or another by 94% of enterprises today. Tens of billions of new devices are being connected to The Internet of Things. And Big Data is driving this bus. An exponential increase is expected in the amount of information being processed, managed, analyzed, and acted upon by enterprise IT. This amazing is not part of some distant future - it is happening today. One report shows a 650% increase in enterprise data by 2020. Other estimates are even higher....
With over 720 million Internet users and 40–50% CAGR, the Chinese Cloud Computing market has been booming. When talking about cloud computing, what are the Chinese users of cloud thinking about? What is the most powerful force that can push them to make the buying decision? How to tap into them? In his session at 18th Cloud Expo, Yu Hao, CEO and co-founder of SpeedyCloud, answered these questions and discussed the results of SpeedyCloud’s survey.