Test Data Management: From Dysfunction to Function

By Jessica Paddock
Courtesy of App Developer Magazine

For many organizations, managing test data is one of the most difficult components of software development. With thousands (and sometimes millions) of data points available, teams struggle to identify the appropriate data for testing. I have worked with multi-billion-dollar corporations that manage to keep their software systems operable despite this problem, but in doing so, they expend huge amounts of unnecessary time and human resources—and are forced to remediate numerous defects in production.

Many software professionals view test data management (TDM) as an issue of management tools and automation. Although these are important elements of TDM, in my experience the problems with test data begin at a much more fundamental level.

Obtaining the right data for test cases, at the right time, without straining testing teams, isn’t a difficult process, but it requires a logical, properly structured methodology. Frequently, company stakeholders think they cannot afford the time and effort to develop a functional methodology, when in reality, they are wasting much more of both by not doing that very thing.

Testers Aren’t Data Experts

One of the biggest problems I have noticed is that companies often have testers handling the role of TDM, in addition to their testing activities. Some testers have the skills to query a database and extract data, but they are not usually familiar enough with relational databases to understand what the background data model looks like and ensure they have identified the best data for testing. They spend enormous amounts of time—often 50% of their working hours, or more—extracting data that isn’t going to produce constructive tests.

I have even seen testers ignore the fact that the data they are gathering is defective. This happens, not because the testers don’t care about quality, but because they simply do not have the time and skill set to produce a better result. That’s understandable, because they shouldn’t be doing TDM in the first place. Until organizations have a separate, competent TDM team—either in-house or outsourced—and can put testers back to work doing their jobs, the problem won’t go away and quality will suffer.

Alternatively, companies may have designated TDM staff, but those staff are relegated to the position of “order fillers,” and testers don’t communicate effectively what data they need. I have seen testers send vague email messages to the TDM team rather than providing a properly structured, useful data request.

The TDM team cannot produce what it isn’t asked to provide. The result is that either the test goes forward with the wrong data (with predictably inaccurate, poor-quality results), or more rounds of requests ensue, wasting everyone’s time and consuming the testing budget unnecessarily. At this level, tools such as JIRA, which can track data requests, are helpful but still cannot resolve the underlying problem.
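
To make the contrast concrete, here is a minimal sketch, in Python, of what a structured data request might capture: the test case it supports, the entity, the conditions the data must satisfy, the volume, the environment, and the delivery date. The field names and values are hypothetical illustrations, not a standard template.

from dataclasses import dataclass
from datetime import date
from typing import List

# A hypothetical structured test data request; the field names are
# illustrative assumptions, not a standard template.
@dataclass
class TestDataRequest:
    test_case_id: str       # the test case the data supports
    entity: str             # business entity, e.g. "Customer"
    conditions: List[str]   # constraints the data must satisfy
    volume: int             # how many records are needed
    environment: str        # target test environment
    needed_by: date         # when the TDM team must deliver
    masking_required: bool = True   # whether sensitive fields must be masked

# A vague email ("send me some customer data") becomes an explicit request
# the TDM team can fill without further rounds of clarification:
request = TestDataRequest(
    test_case_id="TC-1042",
    entity="Customer",
    conditions=[
        "account opened before 2015",
        "at least one declined payment in the last 90 days",
    ],
    volume=50,
    environment="QA2",
    needed_by=date(2025, 6, 1),
)

Compared with a free-form email, every field here is either filled in or visibly missing, which is what makes the request actionable.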

Garbage In, Garbage Out

The root of the issue is that many companies have never taken the time to create and structure a system for ensuring good results. In addition to not having well-designed request forms and processes, they also don’t have properly developed data models and data dictionaries, all of which are fundamental to accurate data gathering.

When I consult with an organization that wants to develop a mature, functional TDM methodology, the first things I ask for are the data models, data dictionaries, and test cases for all the areas they want to improve. It is disturbing how frequently I cannot obtain that information.

The employees working with me say, “We cannot do this; it takes forever.” But these elements are the building blocks of TDM. They are the equivalent of the installation manuals that come with a product, yet for many software teams, they are either non-existent or improperly developed.

TDM is straightforward and simple when all its components—the data models, the data dictionary, and the data request documents—are developed according to best practices. When companies satisfy these requirements—and when they have designated team members creating the data using accurate data requests and working within a properly structured framework—the result is good data, delivered on a timely basis. Without all of these components in place, the result is longer testing cycles, defects in production, and an overall reduction in quality.
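
As a small illustration of what “properly developed” means in practice, here is a hypothetical sketch of a data dictionary expressed in Python, with a check that flags rows violating it. The table, columns, and rules are assumptions made up for this example; the point is that once the dictionary exists, both testers and the TDM team can verify data against it mechanically.

# A minimal, hypothetical data dictionary: one entry per column, recording
# the type, whether nulls are allowed, any foreign-key reference, and any
# restricted value set. Table and column names are made up for illustration.
DATA_DICTIONARY = {
    "orders.order_id":    {"type": "int",  "nullable": False, "references": None},
    "orders.customer_id": {"type": "int",  "nullable": False, "references": "customers.customer_id"},
    "orders.status":      {"type": "str",  "nullable": False, "allowed": ["OPEN", "SHIPPED", "CANCELLED"]},
    "orders.ship_date":   {"type": "date", "nullable": True,  "references": None},
}

def validate_row(table: str, row: dict) -> list:
    # Return a list of dictionary violations found in one row of a table.
    problems = []
    for column, value in row.items():
        entry = DATA_DICTIONARY.get(f"{table}.{column}")
        if entry is None:
            problems.append(f"{column}: not documented in the data dictionary")
            continue
        if value is None and not entry["nullable"]:
            problems.append(f"{column}: null not allowed")
        allowed = entry.get("allowed")
        if allowed is not None and value is not None and value not in allowed:
            problems.append(f"{column}: '{value}' not in {allowed}")
    return problems

# Example: one undocumented column and one invalid status value are flagged.
print(validate_row("orders", {"order_id": 1, "status": "LOST", "notes": "??"}))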

Then, when testing cycles become longer or defects continue to escalate, company decision makers assume the testers aren’t doing their jobs, or that testing itself is a waste of time and money. In the worst-case scenarios, the result is less effort and budget expended on testing.

This is a disastrous viewpoint that dooms software to failure and, of course, is completely inaccurate. TDM follows the old adage commonly applied to computer code: garbage in, garbage out. It’s that simple.

Making a Choice

Fixing the problem isn’t easy, fast, or cheap, but the alternative is to continue with defective software and waste even more time and money eliminating problems in production. (Another option is to intentionally allow software to underperform, but this course is a death sentence in a world where user expectations are irrationally high.)

I worked with one company that over an eight-week period couldn’t produce the requirements we needed to help them develop a TDM methodology. At the end of the eight weeks, we put the project on hold. A year later, when the pain of poor data, failed tests, and buggy software became excruciating, they finally turned the requirements gathering over to us.

We found that data was stored in more than 200 flat files, with no data model and no data dictionary. Team members had also removed all the relationships from the tables. We needed to talk to subject matter experts who were largely unavailable or didn’t know the answers to our questions about the data and the relationships. The process of collecting and organizing the necessary information—which the firm should have had already—took months and cost the organization a lot of money.
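
To give a sense of the reconstruction work, the sketch below shows one small piece of it: checking whether an assumed foreign-key relationship between two flat files actually holds. The file and column names are hypothetical, not the client’s data; multiplied across more than 200 files, checks like this are what the missing data model forced us to do by hand.

import csv

def load_column(path: str, column: str) -> set:
    # Read one column of a CSV flat file into a set of values.
    with open(path, newline="") as f:
        return {row[column] for row in csv.DictReader(f)}

def orphaned_keys(child_path: str, child_col: str,
                  parent_path: str, parent_col: str) -> set:
    # Child values with no matching parent value: rows that would violate
    # the undocumented foreign-key relationship between the two files.
    return load_column(child_path, child_col) - load_column(parent_path, parent_col)

# Hypothetical file and column names; the real engagement had 200+ such files.
missing = orphaned_keys("orders.csv", "customer_id", "customers.csv", "customer_id")
if missing:
    print(f"{len(missing)} order rows reference customers that do not exist")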

This is the point where you are probably thinking, “Okay, Jessica, give us the silver bullet.” I wish I had one, but unfortunately, a magical fix doesn’t exist. If your organization is in this predicament, there is only one way to resolve the problem, and it is an absolute requirement.

Focus, Not Hocus Pocus

As with so many software-related projects, executive management must fully support the effort and make it an immutable corporate priority. Employees must be mandated to gather and provide the necessary information (if a third-party consultant is helping to create the structure), to organize and store data in an appropriate manner, and to develop data models and a data dictionary.

Someone must take the time to create a functional data request form and testers must use it as designed. The organization must have dedicated TDM staff, either in-house or outsourced, providing the right data to testers who spend their days actually testing.

The improvement process is incremental, but over time, once organizations begin following a functional TDM methodology, the situation improves. Teams have access to data as often as they need or want it, which speeds testing and reduces time to market overall. DevOps or operations personnel expend far less time and effort eradicating defects in production. The cost of software development drops dramatically and quality soars. Making productive use of other improvements, from TDM tools to an agile methodology, becomes feasible. Doesn’t that sound better?
