Test Data Management and the Cloud By @EFeatherston

The goal of any good test data management process is to provide consistent, repeatable test data across systems and environments

Test Data Management and the Cloud - Keeping All the Plates Spinning

I was recently in Boston at Faneuil Hall Marketplace, and with the long-awaited warm weather, all the street entertainers were out in full force. Singers, musicians, and a variety of juggling acts filled the street, with crowds surrounding them. One act in particular struck a chord with me - the classic spinning plates. We've all seen it at various times in our lives. The entertainer started spinning plates, balanced precariously on top of wooden sticks. More and more plates started spinning, with the entertainer frantically running back and forth as one started to slow down and almost fall, but, just in time, he was able to get it spinning and balanced again. Then the entertainer reached his limit: he added one more plate, and as he tried to keep them all up, one lone plate down on the end started to wobble, the stick tilting, and before the entertainer could reach it, the plate went crashing to the ground, taking several of the other plates with it.

Those who have been responsible for test data management in a large, complex, integrated environment can probably relate to the spinning plates challenge. Any Quality Assurance tester or developer who has chased down a Severity 1 blocking bug, only to find the issue was not the code but a flaw in the test data, can also relate. Identifying, configuring, deploying, and maintaining a valid set of test data remains one of the technology challenges that is the bane of many a technologist. How does the cloud impact this? Does it make it better, worse, or more of the same?

Why is test data management so hard?
The goal of any good test data management process is to provide consistent, repeatable test data across your systems and environments, whether it be development, QA, or performance. Ideally, it would be wonderful to have reusable test data sets to leverage across all environments. This would provide consistency, as well as resource and time savings. Sounds basic enough, so what makes it so hard?

There are multiple challenges:

  • Avoiding data collisions: For complex systems that integrate with other systems, test environments and systems tend to be shared due to cost and resource constraints. Other applications are testing against the same system you are integrating with. Coordinating data sets so that no other application under test accidentally uses or overwrites your data can be challenging, and it must be addressed. There is nothing worse than chasing what appears to be a bug that actually turns out to be someone else overwriting your test data.
  • Enforcing privacy rules: A common and useful practice is to mine and extract test data from production systems. The key consideration here is any privacy and compliance rules (such as HIPAA). This may require the masking of test data. Masking itself may then introduce other challenges. A simple example: part of your test data is a customer's name and address, which you need to mask. What if your system does address validation, to ensure all addresses are valid? You could easily create an address that now fails basic validation.
  • Ensuring relational integrity across systems: If you are integrating data sets across multiple systems, you need to maintain the relational integrity of your data across those systems. The masking mentioned above can complicate that process and must be factored in.
  • Resetting the data set to a clean starting point: You need to understand every change your testing made across all the integrated systems so that those changes can be backed out and the data returned to a known starting point. Changes propagated across environments can be a key source of unintended consequences in a test environment.
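
To make the masking and relational integrity points concrete, here is a minimal sketch of one possible approach, deterministic masking, written in Python. It is an illustration under stated assumptions, not a description of any particular tool: the key, the function names, and the pool of addresses are all hypothetical, and the pool is assumed to have been checked against your system's address validation beforehand. Because the same input always produces the same masked value, a customer masked in one system still matches the same customer masked in another.

```python
import hashlib
import hmac

# Hypothetical secret used only for masking; assumed to be stored
# separately from the test data so masked values cannot be recomputed.
MASKING_KEY = b"example-masking-key"

# Hypothetical placeholder pool. In a real setup these would be addresses
# already known to pass the system's address validation.
VALID_ADDRESSES = [
    "1 Test Street, Testville, TS 00001",
    "2 Sample Avenue, Testville, TS 00002",
    "3 Example Road, Testville, TS 00003",
]


def _digest(value: str) -> bytes:
    """Keyed hash of a normalized value (case and extra spaces ignored)."""
    normalized = " ".join(value.split()).lower()
    return hmac.new(MASKING_KEY, normalized.encode("utf-8"), hashlib.sha256).digest()


def mask_name(name: str) -> str:
    """Replace a real name with a stable pseudonym."""
    return "Customer-" + _digest(name).hex()[:10]


def mask_address(address: str) -> str:
    """Replace a real address with one drawn from the pre-validated pool."""
    index = int.from_bytes(_digest(address)[:4], "big") % len(VALID_ADDRESSES)
    return VALID_ADDRESSES[index]


# Two hypothetical extracts of the same customer from different systems.
billing_row = {"name": "Jane Doe", "address": "12 Harbor Lane"}
shipping_row = {"name": "jane  doe", "address": "12 harbor lane"}

masked_billing = {"name": mask_name(billing_row["name"]),
                  "address": mask_address(billing_row["address"])}
masked_shipping = {"name": mask_name(shipping_row["name"]),
                   "address": mask_address(shipping_row["address"])}

# The masked rows still join on name, and both addresses come from the pool.
assert masked_billing["name"] == masked_shipping["name"]
assert masked_billing["address"] == masked_shipping["address"]
```

The trade-off in this sketch is that a small address pool maps many customers to the same address, which may or may not matter for your tests. Whether a keyed hash is acceptable under your privacy and compliance rules is a question for whoever owns those rules, not something the code can settle.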

How does the cloud impact all this?
All of the previous challenges discussed still exist when you move to the cloud environment. One of my favorite mantras is 'no technology negates the need for good design and planning.' Cloud doesn't provide any magic; it's just a tool. It can help in standing up standard repeatable test environments, but the data setup process is still subject to the challenges already discussed.

Additionally, going to the cloud may introduce other challenges that must be considered:

  • SaaS solutions: In SaaS environments, you may not have direct access to the database layer. You are constrained to the mechanisms provided by the SaaS vendors for the extraction and the import of your user data, content, and configuration information. Your test data management process needs to take this into account.
  • Network bandwidth: If part or all of your environments reside in the cloud, you need to take into account the network when doing data loads, especially if you are dealing with large volumes of data either for performance testing or analytics. Bandwidth is usually well thought out for daily operational traffic, but frequently forgotten for initial and test data loads.
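
The bandwidth point is easy to check with back-of-the-envelope arithmetic before a test cycle starts. The sketch below, in Python, estimates how long a data load would take over a given link. The function name and the 0.8 efficiency factor are assumptions for illustration; real throughput depends on protocol overhead, latency, and whatever else is sharing the link.

```python
def estimate_transfer_hours(size_gb: float, link_mbps: float,
                            efficiency: float = 0.8) -> float:
    """Rough time in hours to move size_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal bandwidth that the
    data load actually gets; 0.8 is an illustrative guess, not a measurement.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, 0 < efficiency <= 1")
    total_bits = size_gb * 1e9 * 8                  # decimal gigabytes to bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


# Example: a 500 GB performance-test data set over a 100 Mbit/s link.
hours = estimate_transfer_hours(500, 100)
print(f"Estimated load time: {hours:.1f} hours")
```

With these assumed numbers the estimate comes out to roughly 14 hours, which is the kind of figure that needs to be in the test schedule rather than discovered on the day of the load.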

Keeping all those plates spinning is no easy task
Test data management has always been a challenge. Going to the cloud does not make it any easier. In fact, it adds more plates you need to keep spinning in order to ensure successful testing of your applications. As technologists, it's important to be sure we know which plates we need, and keep them close so we can keep them spinning. With good design and planning, there is no reason to think the test data management plates are going to come crashing to the ground.

This post is brought to you by The CIO Agenda.

KPMG LLP is a Delaware limited liability partnership and is the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative ("KPMG International"), a Swiss entity. The KPMG name, logo and "cutting through complexity" are registered trademarks or trademarks of KPMG International. The views and opinions expressed herein are those of the authors and do not necessarily represent the views and opinions of KPMG LLP.

More Stories By Ed Featherston

Ed Featherston is VP, Principal Architect at Cloud Technology Partners. He brings 35 years of technology experience in designing, building, and implementing large complex solutions. He has significant expertise in systems integration, Internet/intranet, and cloud technologies. He has delivered projects in various industries, including financial services, pharmacy, government and retail.
