by Jessica Paddock
Courtesy of App Developer Magazine

In today’s threat-laden environment, where production data is one of the chief targets of hackers, organizations developing software must expend both time and resources securing their production data. One of the simplest ways to ensure security for software testing activities is through the use of targeted, advanced data-handling solutions that can synthesize and virtualize production data.

Such a project is not without cost and effort, of course, but the result is well worth the investment. Furthermore, organizations can extend the value of such an endeavor by using it as the cornerstone of a comprehensive test data management (TDM) improvement project.

I recently worked on a project where we did exactly that. Initially, we were hired to assist with defect reduction, but our solution also enabled the organization to achieve security for its QA data. After we selected the tools and completed a proof of concept, in less than four weeks we were fully productive, creating data free of personally identifiable information (PII) for globally dispersed teams. Defects soon fell by 95%, without ever exposing a single bit of PII to anyone.

One of the most fundamental problems on this project, and the target of the overall solution, was how data had been organized, extracted, and provided to teams for testing. Here, I’ll offer solution insights from that phase of the project to inspire business leaders to initiate their own TDM improvement efforts. In a separate sidebar, “Steps to Success with TDM,” I will offer more general suggestions for testers and test data engineers that will help organizations strengthen their TDM programs as a whole.

Ensuring Functional Data Design and Efficient, Secure Access

In this particular company, TDM engineers were using Microsoft Excel spreadsheets, rather than databases, because they didn’t have the background to develop proper relational databases. The result was a mishmash of spreadsheets with macros and pivot tables to maintain the relationships. Using “flat file” solutions like these is time-consuming, ineffectual, and certainly not secure.

To resolve this issue, we first worked with knowledgeable client personnel who helped us determine the proper relationships. Next, we used Grid Tools Data Maker (which is now a component of CA Test Data Manager) to create the data based on the business and technical requirements. In doing so, we were able to build true relational databases that would provide meaningful data for the team.

CA Test Data Manager and similar products use the data’s structure—tables, file types, data attributes, etc.—but do not require access to real production data for the synthesis. This keeps the original dataset and its PII untouched.
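To make the idea concrete, here is a minimal sketch of structure-driven synthesis. It is not how CA Test Data Manager works internally; it simply assumes a hypothetical two-table schema (customer and purchase), invented name lists, and Python's built-in sqlite3 module, and it fills a relational database with rows that never touch production data.

```python
import random
import sqlite3

# Invented value pools (assumption): nothing here is copied from real records.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Casey"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak", "Haddad"]


def build_synthetic_db(num_customers=5, orders_per_customer=2, seed=42):
    """Create an in-memory relational database filled with invented rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")

    # Hypothetical schema: only the structure is needed, not the real data.
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " email TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE purchase ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " amount_cents INTEGER NOT NULL)"
    )

    for cid in range(1, num_customers + 1):
        name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        # The .test top-level domain is reserved, so no address is deliverable.
        email = f"user{cid}@example.test"
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (cid, name, email))
        for _ in range(orders_per_customer):
            # The foreign key keeps every purchase tied to an existing customer.
            conn.execute(
                "INSERT INTO purchase (customer_id, amount_cents) VALUES (?, ?)",
                (cid, rng.randint(100, 50_000)),
            )

    conn.commit()
    return conn
```

The relationships that spreadsheets had to maintain with macros and pivot tables are expressed here as a foreign key, which the database enforces on its own.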

Finally, we virtualized the data using the Delphix Virtualization Engine, which compresses data and delivers compatible datasets on demand for testing (and other) needs. Within minutes, virtual data can be provisioned to any point in time. Virtual copies run on any target host server where physical environments run, yet they occupy less than one tenth of the storage space of the physical databases they represent. Administrators can update, manage, and configure policies for virtual copies through the central interface, ensuring that the right data is available to appropriate teams as needed.

Virtualizing data is also one of the most important steps to enhancing data security. Disseminating copies of production data across disparate teams and locations for testing and other purposes significantly increases the risk of unauthorized data access or outright theft. With synthesized, virtualized data, these risks are largely eliminated. (Organizations still must secure their core production datasets from unauthorized viewing, copying, and sharing, not only by testers but also by developers and other personnel. That process is beyond the scope of this article.)

How Everyone Wins

Professional tools, paired with well-designed processes and procedures, are the surest way for software professionals to obtain valid, functional data for testing. For software organizations, those tools have also become one of the primary defenses against the data breaches that can cripple a firm.

When organizational leadership makes a commitment to implement a TDM program that incorporates automated data synthesis and data virtualization, defects plummet, development and testing costs drop, and data security is enhanced. The result is a stronger company, a better product, and greater satisfaction at both the software team and user level.

Steps to Success with TDM

Testers

1. Know and understand your data requirements. There is no shortcut, but the rewards are completely worth the effort. If you cannot figure this out, get outside help to identify test requirements.

2. Once you know your data requirements, submit clear, complete and accurate test data requests. The data you receive is only as good as the requirements you provide. This may seem hard and time-consuming at first, but over time it will come naturally. Again, the improvements are more than worth the effort.

3. Develop and use a template for data requests, which will make #2 easier and ensure consistency over time. Include all the questions needed to ensure the desired end result. The template can vary per company, project, and team.
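As one illustration of step 3, a request template can be as simple as a small data structure that flags unanswered questions before the request is submitted. The sketch below is only an assumption about what such a template might contain; the class name DataRequest and every field in it are hypothetical and should be adapted to your company, project, and team.

```python
from dataclasses import dataclass, field, fields

# Hypothetical: fields a requester may leave empty without blocking the request.
OPTIONAL_FIELDS = {"special_conditions"}


@dataclass
class DataRequest:
    """Illustrative test data request template (all field names are assumptions)."""

    requester: str = ""
    project: str = ""
    target_environment: str = ""
    needed_by: str = ""  # for example, an ISO date such as "2024-07-01"
    entities: list = field(default_factory=list)  # tables or objects needed
    row_count: int = 0
    special_conditions: list = field(default_factory=list)  # edge cases, if any

    def missing_fields(self):
        """Return the names of required fields that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name not in OPTIONAL_FIELDS and not getattr(self, f.name)
        ]


# Usage: an incomplete request reports what is still needed.
draft = DataRequest(requester="qa-team", project="billing")
print(draft.missing_fields())
```

Checking completeness up front keeps the back-and-forth between testers and test data engineers short, which is the point of using a template in the first place.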

Test Data Engineers

1. Use a tool to create the necessary data. Production data typically provides only 10-20% of the data you need for testing, so the data you create must supply what production lacks.

2. Schedule the delivery of your data for the timeframe that best suits your needs. Database virtualization is a great way to do this.

3. Provide a copy of the data created, so testers can use it for their requests. This not only improves test quality; it dramatically reduces the time testers spend seeking data.

4. Make the process repeatable. The code that creates data can be written to handle any number of future executions. Properly written, automated code makes data creation for subsequent project sprints much simpler.
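The repeatability described in step 4 can be sketched in a few lines. This example assumes a hypothetical generate_accounts function and invented account fields; it derives its random seed from a sprint label, so rerunning it for the same sprint should reproduce the same dataset while a new sprint gets fresh data.

```python
import hashlib
import random


def generate_accounts(sprint, count, seed="tdm-demo"):
    """Return `count` synthetic account rows for a sprint, deterministically."""
    # Derive the RNG seed from the base seed plus the sprint label, so each
    # sprint gets its own dataset that is stable across reruns.
    digest = hashlib.sha256(f"{seed}:{sprint}".encode()).hexdigest()
    rng = random.Random(int(digest, 16))
    return [
        {
            "account_id": f"{sprint}-{i:04d}",
            "balance_cents": rng.randint(0, 1_000_000),
            "status": rng.choice(["active", "suspended", "closed"]),
        }
        for i in range(1, count + 1)
    ]


# Usage: two runs for the same sprint yield identical rows.
first_run = generate_accounts("sprint-12", 5)
second_run = generate_accounts("sprint-12", 5)
print(first_run == second_run)
```

Because the output depends only on the inputs, the same script can be scheduled for every sprint, and a failing test can be rerun later against exactly the data it originally saw.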