
Best Practices for Load Server Calibration | @CloudExpo #Cloud

Load server calibration: what it is and why you need to do it

Enough Is Enough!  Or Is It? Best Practices for Load Server Calibration
By Dan Boutin

Here’s a question I get asked all the time by customers and prospects alike: I use “that tool out west” and I can usually get 50 vUsers per load server for my current scripts/tests. How many vUsers can SOASTA get per load server?

My typical response: Do you currently use a load server calibration process?

The dead silence and blank stare I get in return usually means the person thinks they need a dedicated calibration lab, or has no idea why you’d want to calibrate a load server, much less have an iterative process for doing it.

Here’s one reason why it’s important, and why I get asked this all the time:

I know of several large financial institutions that do all of their performance testing inside the firewall, and thus they own all of their own infrastructure, including dedicated servers that are used solely for load generation. With some of the large load generation requirements (which can be in the hundreds of thousands of vUsers), you can imagine that even a small bump in optimization of a load server could potentially save a company quite a bit of infrastructure costs in the load server hardware alone. Which is why, as part of our best practices, SOASTA advocates calibration of load servers when using CloudTest.

So, let’s walk through the process.

Load server calibration: What it is and why you need to do it
Load server calibration is an iterative process to accurately determine the appropriate number of virtual users to run from each load server.

Why calibrate? Two reasons:

  1. To identify the maximum number of users a load server can handle (per test clip/script), so compute resources are used as efficiently as possible. If you are running very large tests where the number of load servers might be limited, you need to get as many users as possible onto each load server. (Refer to my intro example as well.)
  2. To eliminate the load server as a potential bottleneck in the test. If the load server is overloaded, the first thing you will see in the results is an increase in average response time. If you don’t notice the load server is overloaded, you might incorrectly believe the target application is slowing down when in reality the bottleneck is the load server itself.

So, how do you know when a load server is properly calibrated? Simple. Here are a few things to look for:

  • CPU — For cloud providers where instances run on shared hardware, CPU should peak around 65-75% utilization, with average utilization around 65%. For bare-metal load servers, CPU utilization shouldn’t go beyond 90%.
  • Heap usage — Java heap usage and garbage collection need to be “healthy” on the load servers. Heap usage should not trend upward as the test executes.
  • Network interface — No more than 90% utilization (i.e. roughly 900-950 Mbit/second on a gigabit interface).
  • Errors — No unusual errors (timeouts, odd SSL failures, etc.) are seen during the test.
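
These rules of thumb are easy to turn into a repeatable check if you export the load server’s monitoring samples. Below is a minimal sketch in Python; the exported sample fields (cpu_pct, heap_mb, nic_mbps, errors) and the export itself are assumptions for illustration, not a CloudTest API.

```python
# Hedged sketch: check exported monitor samples against the
# calibration rules of thumb above. Field names are illustrative.

def calibration_issues(samples, shared_hw=True, nic_capacity_mbps=1000):
    """Return a list of human-readable calibration problems."""
    if not samples:
        return ["no monitoring samples to check"]
    issues = []

    cpu_limit = 75 if shared_hw else 90          # shared vs. bare metal
    peak_cpu = max(s["cpu_pct"] for s in samples)
    if peak_cpu > cpu_limit:
        issues.append(f"CPU peaked at {peak_cpu:.0f}% (limit {cpu_limit}%)")

    # Heap should be roughly flat overall: compare first and last thirds.
    third = max(1, len(samples) // 3)
    early = sum(s["heap_mb"] for s in samples[:third]) / third
    late = sum(s["heap_mb"] for s in samples[-third:]) / third
    if late > early * 1.5:
        issues.append(f"heap trending up ({early:.0f} MB -> {late:.0f} MB)")

    peak_nic = max(s["nic_mbps"] for s in samples)
    if peak_nic > 0.9 * nic_capacity_mbps:
        issues.append(f"NIC at {peak_nic:.0f} Mbit/s (>90% of capacity)")

    if any(s["errors"] > 0 for s in samples):
        issues.append("errors observed during the test")

    return issues  # an empty list suggests the server is well calibrated
```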

When should you perform a calibration test?

The calibration process is highly recommended for all tests, but especially in these situations:

  • Whenever a large test is being executed (i.e. over 10,000 users). This ensures that server resources are being used in the most efficient manner possible.
  • When your analysis of previous test results leads you to believe that the load servers might be getting in the way of the load test. As an example, maybe you’ve noticed the load servers spiking up to high levels of CPU usage in the CloudTest Monitor Dashboard. Maybe the target application doesn’t appear to be slowing down — despite metrics on the SOASTA dashboards that indicate an increase in average response time.

The calibration process explained
Depending on how many locations you plan to run the test from, how many test clips you have, and how long the testing session will last, the calibration process can take a varying amount of time. If done correctly, however, the output is a fairly precise number of virtual users to apply to each clip, on each instance size, on each cloud provider.

The number of virtual users that a given load server will support is determined by ramping up a test to a specified number of users over a 10-minute period. This relatively slow ramp (in most cases) will help identify when the key metrics on the load servers start to deviate from the norm.
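
Because the ramp is linear, you can translate “the metrics went sideways at minute seven” directly into a virtual user count. A quick sketch of that arithmetic, using the 10-minute ramp from this walkthrough and the 1,000-user target introduced below:

```python
def vusers_at(elapsed_s, total_vusers=1000, ramp_s=600):
    """Virtual users active at a given point in a linear ramp."""
    return min(total_vusers, round(total_vusers * elapsed_s / ramp_s))

# Example: the load server metrics started deviating 7 minutes in.
print(vusers_at(7 * 60))  # -> 700 virtual users
```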

What you need to know before you get started:

  • How many test clips will run during the test session?
  • How many hours will the test session last?
  • What cloud providers will be used for the test, if any?
  • Which instance sizes with each cloud provider will be used?

Make sure the target application can sustain the number of users you plan to drive against it. As an example, let’s use 1,000 users, as that’s likely more than the load generator can push (given an average test case and a load server comparable to an AWS Large instance). Lower levels will also work, but if you can get clearance for 1,000, that ensures you have headroom to start high and see issues as you ramp up.

Step 1:  Test clip setup
The first step in the calibration process is making sure you have created all the test clips that will run during the test session. Ensure that all think times are appropriate to the application (meaning the test clip takes the correct amount of time to complete). Then make sure that all memory optimizations are complete in the test clip.

The two most important things are scopes and clearing the responses in scripts. Scopes should be set as ‘private’ unless local/public is needed for scripts. The clearResponse function should be used in any scripts (like validations or extractions) that access the response of a message.

Step 2:  Test environment setup

  1. Start a grid with 1 result server and 1 load server in the location you want to calibrate (for example, Rackspace OpenStack London).
  2. Normally load is generated from large-size instances, but if you have a specific reason to generate load from another instance size (and it needs to be calibrated separately), you will need to run the calibration process below with a separate grid that only has that instance.
  3. Once the grid starts, verify that monitoring of the grid started as well; this monitoring data provides critical information for the calibration. The grid UI will confirm monitoring is started. You can also confirm it by opening the CloudTest monitor dashboard: if you see the load server and result server listed and capturing data, monitoring started correctly.

Where to look during the calibration test
There are five main areas to watch during the calibration test:

  • Test results. Keep an eye on two main metrics: average response time and errors. If the average response time starts to rise (which it likely will as you ramp up to 1,000 users), start looking at the monitoring data. Try to determine if the increase in response time is due to the load increasing or because the load servers are getting in the way. Also, watch the errors. If you start to see “weird” errors (HTTP timeout errors, connection timeout errors, odd SSL errors, etc.), that might be an indication the load servers are overloaded.
  • Default monitoring dashboard. This is the monitoring dashboard that shows monitoring data over time. It is accessed from SOASTA Central by clicking on ‘Monitors’, finding the load server you just started, and clicking the ‘View Analytics’ link. A better option is to create your own monitoring dashboard showing these same metrics: because the monitoring is linked to the composition when you start it, your own dashboard will show metrics correlated with the test results. This will be your primary monitoring dashboard during the test.
  • Linux ‘top’. This command gives you deeper information than the default CPU chart provides. Specifically, on EC2, it shows how much CPU is being stolen by the hypervisor. EC2 typically steals some CPU (%st), so the maximum CPU utilization you might see in the SOASTA dashboard is ~75%; the CPU is effectively maxed out if top shows ~25% st. Before the test starts, SSH into the load server (assuming the load server is Linux-based) and type ‘top’ at the command line. Keep this window up throughout the calibration test (a small script that reads the same counters appears after this list).
  • CloudTest monitor dashboard. This is the dashboard mentioned above. It is less useful in this test, as there is only a single load server. It is most useful during large tests where many load servers and result servers are active, since you can see all the metrics on a single dashboard.
  • Monitoring combined charts. This dashboard is very useful since it shows the different monitoring metrics against Virtual Users and Send Rate. The only thing to be aware of is that you need to add the monitoring to the composition before the test starts by going to Composition Properties -> Monitoring -> Enable Server Monitoring and checking the monitor of the grid whose data you want to save.
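
As promised above, here is a hedged sketch that reads the same steal information top shows, directly from /proc/stat on a Linux load server. The /proc/stat field layout is standard Linux; nothing here is CloudTest-specific.

```python
import time

def cpu_steal_pct(interval=1.0):
    """Percent of CPU time stolen by the hypervisor (Linux only).

    Reads the aggregate 'cpu' line of /proc/stat twice and compares
    deltas. Fields after the label: user nice system idle iowait
    irq softirq steal ...
    """
    def read():
        with open("/proc/stat") as f:
            fields = f.readline().split()
        return [int(x) for x in fields[1:9]]  # user .. steal

    before = read()
    time.sleep(interval)
    after = read()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0

print(f"%st over the last second: {cpu_steal_pct():.1f}")
```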

Step 3: Create and execute the test composition
Given that the goal is to identify as closely as possible where the load server starts to get overloaded, the longer the ramp-up time, the better. However, you have to be realistic too. You don’t have days to do this. So, in this example, a 10-minute ramp to 1,000 users seems to be adequate.

Set up the test composition as follows:

  1. On Track 1, put the test clip being calibrated
  2. Set the track to use a ‘dedicated load server’
  3. Set composition to “load” mode
  4. Set 1 load server
  5. Set 1,000 virtual users on that load server
  6. Set ramp-up time on the track to 10 minutes
  7. Set the track with Renew Parallel Repeats
  8. Go to Composition Properties -> Monitoring and check the checkbox for the ‘Test Servers’ monitor (note that this will change on each stop/start of the grid, so confirm this setting if you restart the grid)

Once the test composition is set up correctly, load and play the composition to start the calibration test.

How to know when the load server is overloaded: five metrics to watch
Identifying when a server is overloaded is as much an art as it is a science. The metrics discussed below will lead you in the right direction, but how conservative you choose to be has an impact as well. Whenever you identify a point in the test where you believe the load server is overloaded, note the number of virtual users, then let the test run a few minutes more; it’s possible the metric was only temporarily out of line.

Another thing to keep in mind is that the load server might have been overloaded before the test reached the virtual user level you noted as a possible limit. Monitoring might have detected it after the fact.

Here are the five key metrics to watch:

1. CPU (default monitoring dashboard and top)
If %id in top is close to 0, or the CPU metric in the Default Monitoring dashboard is essentially flatlining around 75%-80%, note the number of virtual users the test has reached. The server is likely overloaded at this point.

2. Heap usage/garbage collection
This metric is just as important to watch as overall CPU usage; between the two, you can normally figure out when the server is overloaded. The primary metric to watch is the “JVM Heap Usage” widget on the Default Monitoring dashboard. Make sure it goes down (i.e. garbage collection) after periods of increase. If it keeps going up and doesn’t come down much, Java can’t garbage-collect appropriately, which also means the CPU is probably overloaded as it continually tries to collect. Be patient, though: major garbage collections can occur after longer periods of time. CPU utilization will tell you whether you have any chance of a decent garbage collection; if CPU is high and garbage collection isn’t happening, you have probably overloaded the load server.

In a healthy garbage collection pattern, heap usage increases and each collection returns it to a proper level; the overall trend is effectively flat. In an unhealthy pattern, some minor garbage collection occurs but no major collection ever does, and heap usage keeps climbing. That pattern indicates the load server is overloaded.
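
One way to tell the two patterns apart programmatically is to look at the troughs of the heap sawtooth: after each collection, usage should return to roughly the same baseline. Here is a hedged sketch that applies that idea to a series of heap samples; the fixed sampling interval and the 1.25x tolerance are my illustrative assumptions.

```python
def gc_looks_healthy(heap_mb, tolerance=1.25):
    """True if post-GC troughs stay near the initial baseline.

    heap_mb is a list of heap-usage samples taken at a fixed
    interval. A trough is any sample lower than both neighbors
    (a collection just returned memory). If later troughs keep
    rising past the first one, collections are not reclaiming
    memory and the load server is likely overloaded.
    """
    troughs = [
        heap_mb[i]
        for i in range(1, len(heap_mb) - 1)
        if heap_mb[i] < heap_mb[i - 1] and heap_mb[i] < heap_mb[i + 1]
    ]
    if len(troughs) < 2:
        return True  # not enough collections observed to judge
    return troughs[-1] <= troughs[0] * tolerance
```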


3. Network interface
Watch the Default Monitoring dashboard. As the load increases, the amount of bandwidth used will increase. If the bandwidth used approaches 950 Mbits/second, the network interface is becoming a bottleneck in the test and the load server is overloaded. In practice, this is rarely the bottleneck on the load servers.
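
On a Linux load server you can also watch interface throughput straight from the kernel counters instead of the dashboard. A minimal sketch; the interface name is an assumption, and the /sys counters are standard Linux:

```python
import time

def nic_mbps(iface="eth0", interval=1.0):
    """Combined rx+tx throughput in Mbit/s over a short window."""
    def read_bytes():
        total = 0
        for direction in ("rx_bytes", "tx_bytes"):
            with open(f"/sys/class/net/{iface}/statistics/{direction}") as f:
                total += int(f.read())
        return total

    before = read_bytes()
    time.sleep(interval)
    after = read_bytes()
    return (after - before) * 8 / interval / 1_000_000

# Flag the 90% rule of thumb on a gigabit interface.
if nic_mbps() > 900:
    print("network interface is becoming a bottleneck")
```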

4. Disk IO
Watch the Default Monitoring dashboard. There are two disk-related metrics that will tell you if heavy disk use is occurring. This is very unlikely to be a bottleneck with the SOASTA load servers since most processing on the servers is done in memory.

5. Test results
Watch error rates and average response time. If any of the metrics above are starting to become a bottleneck, likely the average response time will increase and/or error rates will climb rapidly, in a hockey stick-like fashion.
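
If you want something more objective than eyeballing the chart for that hockey stick, you can compare recent response times against the early-ramp baseline. A rough sketch; the window sizes and the 2x multiplier are judgment calls of mine, not SOASTA guidance:

```python
def hockey_stick(resp_times_ms, baseline_n=30, window_n=10, factor=2.0):
    """True once the recent average response time exceeds the
    early-ramp baseline average by the given factor."""
    if len(resp_times_ms) < baseline_n + window_n:
        return False  # not enough data yet
    baseline = sum(resp_times_ms[:baseline_n]) / baseline_n
    recent = sum(resp_times_ms[-window_n:]) / window_n
    return recent > factor * baseline
```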

Important: Memory usage can be a misleading metric
Do not rely on this metric to determine if the load server is overloaded. Java will take up all the memory — but that doesn’t mean there is a memory limitation. JVM heap usage is the metric that should be used instead.

What to do when the load server becomes overloaded

When you determine the load server is overloaded, stop the test composition and change the number of virtual users to the level at which you identified the bottleneck. Here’s how:

  1. Let’s say you identified a bottleneck at 600 virtual users. Then set the composition virtual users to 600 users.
  2. Reset the ramp-time to 10 minutes.
  3. Go through the same process as described above.
  4. If you find that over time 600 users is still too high, then lower the number of virtual users on the load server and restart the test.
  5. If the test runs without any bottleneck, then you’ve pretty much confirmed the number of virtual users you can run on the load server.
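
The narrowing process above is mechanical enough to express as a loop. Here is a sketch of the logic only; run_calibration is a hypothetical stand-in for “set the composition to N users, ramp over 10 minutes, and watch the metrics,” not a CloudTest call:

```python
def find_vuser_limit(run_calibration, start=1000, step=100, floor=100):
    """Walk the virtual user target down until a full ramp passes.

    run_calibration(n) should return True if the load server stayed
    healthy while ramping to n users (per the metrics above).
    """
    target = start
    while target >= floor:
        if run_calibration(target):
            return target  # confirmed: the server handles this level
        target -= step     # overloaded: back off and retry
    return None            # even the floor overloaded the server
```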

What to do when you think the virtual user threshold has been identified

Once you find the number of virtual users that seems to work with the test clip, you need to run that test for as long as the test session is scheduled to last. If it is two hours, then run that same test for two hours. This will help flush out any long-term problems with the test. If the JVM heap usage continues to go up (toward the 6GB limit on EC2), it is possible the test might die when it approaches 6GB heap usage.

It isn’t always possible to run this longer test. If you have lots of test clips or a limited amount of time before the test, you might not be able to run the test clip for hours. If that’s the case, don’t fret. Take a look at the results you’ve gathered to this point. Do you think that the number of virtual users you’ve identified as your limit is aggressive or conservative? If you think you are on the edge and being too aggressive with the number of virtual users, then cut that number back a bit.

Again, this process is a combination of art and science. Go with your gut.

Next steps and considerations
Once you’ve completed this process for a single test clip, you will need to repeat this process for each cloud vendor and instance size you plan to run this test clip on. Additionally, you will need to repeat this process for each test clip.

One important thing to note about cloud providers is that not every instance is the same. Take Amazon EC2, for example: not every “m1.large” instance is identical. Some have different processor types (Intel vs. AMD); some have different processor speeds. You don’t control whether you get a faster or a slower m1.large, so it is possible you calibrated your test on one of the faster instances. When you run the larger test, some of your load might then come from slower load servers, and you won’t know in advance whether you randomly get fast or slow ones. Again, this is a time when you have to make a call about how conservative you want to be. Maybe you subtract 100 users from the total number of virtual users to account for this. Or, if you don’t think it will make a big difference, just let it go.

One way to do a simple validation of your virtual user number is to stop the grid and restart it. You will get another random server. If you repeat the calibration test 2-3 times with different load servers and get the same result, you can be confident you have the right number of virtual users.

What NOT to do: An alternative way to get more virtual users out of a given load server is by artificially extending the test clip. In essence, this means increasing the think times during the test. This is not a recommended approach as it artificially decreases the number of HTTP requests being sent to the target application and won’t properly simulate the expected number of virtual users.

Advanced calibration tips
If this is a large test that you are going to do often, and therefore you would like a more precise calibration, you can calibrate with the same variations of load, but do so by increasing the number of load generators in use at each step, rather than by changing the load on a single load generator. Keep each single load server to a low number of users (at a point where you are pretty sure that it is not at capacity). You then track the same response time, errors, and other metrics as you did using the single load generator.

By comparing your two curves, you can get an idea of where the true capacity limits are. For example, if the curves degrade but are identical, then that suggests that the target under test is degrading rapidly, and load generator capacity is hardly even a factor.

On the other hand, if you see degradation in the first step (one load generator), but the second step (multiple load generators) shows no degradation (the charts show a nice linear increase in the factors as load increases), then that suggests the target site is not degrading at all, and all of the degradation you saw in the first step was due to load generator capacity being reached.

You may see something in-between, because often both sides are degrading at different rates as the load increases. In that case, you need to make a judgment call by looking at the two curves.

Let me illustrate
Here’s how that might look in this case, assuming we’re testing a very robust site and, based on the observations so far, that we can get a fairly large number of users per load generator. (In this example we’ll start with a lower VU number and work our way up.)

Step one: Run tests on a single load generator, with test runs at 100, 200, 300 … 2000 users, or until you see CPU and memory limitations come into play, or until significant degradation occurs. Plot a graph of the factors like response time, error rate, and throughput. If you see no degradation in the graph (everything is linear and the error rate is constant) until a CPU or memory limit is reached, then you are done and can stop — you have found the capacity point and need go no further. Otherwise, go to step two.

Step two: Run the same tests with a constant 100 users per load generator, with 1, 2, 3, 4… 20 load generators, or until you reach the stopping point of step one. Plot a graph of the factors like response time, error rate, and throughput.

Step three: Compare the two graphs and reach a conclusion as to the capacity of load generator vs. the capacity of the site under test. If you can, make a judgment of where the capacity of the load generator is reached. (Note that if the site under test is itself degrading rapidly, you might not be able to reach a conclusion about the load generator capacity.)
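
One way to frame the comparison in step three: at each total load level you now have two response-time readings, one from the single-generator runs and one from the multi-generator runs. Where the curves diverge is where generator capacity starts to matter. A hedged sketch of that comparison, with made-up example numbers:

```python
def generator_capacity_point(single_gen, multi_gen, tolerance=1.2):
    """Estimate where load generator capacity distorts results.

    Both arguments map total virtual users -> avg response time (ms),
    measured at the same load levels. Returns the first level where
    the single-generator curve exceeds the multi-generator curve by
    more than the tolerance, or None if the curves track each other
    (any degradation is then the site itself, not the generator).
    """
    for load in sorted(single_gen):
        if load in multi_gen and single_gen[load] > tolerance * multi_gen[load]:
            return load
    return None

# Hypothetical data: divergence appears at 1,200 total users.
single = {400: 210, 800: 230, 1200: 480, 1600: 900}
multi  = {400: 205, 800: 225, 1200: 260, 1600: 310}
print(generator_capacity_point(single, multi))  # -> 1200
```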

Final takeaways
Whether you are driving load from the cloud or from your own internal load generators, calibration of your load generators is considered a SOASTA best practice. Certainly cost savings is one main reason — either for cloud infrastructure costs (pennies) or internal hardware costs (potentially hundreds of thousands of dollars).

(SOASTA CloudTest can drive load from either source, as well as drive load from internal and cloud-based load generators at the same time — in the same test.  But I’ll explain that in an upcoming post.)

Calibration is also key to ensuring that you have the most accurate performance test baseline point for each test run. After all, our goal is to test the website or application, not the load server! Still have questions? Leave a comment below or ping me on Twitter.
