Schrödinger's Data Quality

In 1935, Austrian physicist Erwin Schrödinger described a now-famous thought experiment in which:

“A cat, a flask containing poison, a tiny bit of radioactive substance and a Geiger counter are placed into a sealed box for one hour. If the Geiger counter doesn't detect radiation, then nothing happens and the cat lives. However if radiation is detected, then the flask is shattered, releasing the poison which kills the cat. According to the Copenhagen interpretation of quantum mechanics, until the box is opened, the cat is simultaneously alive and dead. Yet, once you open the box, the cat will either be alive or dead, not a mixture of alive and dead.”

This was only a thought experiment. Therefore, no actual cat was harmed.

This paradox of quantum physics, known as Schrödinger's Cat, poses the question:

“When does a quantum system stop existing as a mixture of states and become one or the other?”

Unfortunately, data quality projects are not thought experiments. They are complex, time-consuming and expensive enterprise initiatives. Typically, a data quality tool is purchased, expert consultants are hired to supplement staffing, production data is copied to a development server and the project begins. Until it is completed and the new system goes live, the project is a potential success or failure. Yet, once the new system starts being used, the project will become either a success or a failure.

This paradox, which I refer to as Schrödinger's Data Quality, poses the question:

“When does a data quality project stop existing as potential success or failure and become one or the other?”

Data quality projects should begin with the parallel and complementary efforts of drafting the business requirements while also performing a data quality assessment, which can help you:

Verify data matches the metadata that describes it
Identify potential missing, invalid and default values
Prepare meaningful questions for subject matter experts
Understand how data is being used
Prioritize critical data errors
Evaluate potential ROI of data quality improvements
Define data quality standards
Reveal undocumented business rules
Review and refine the business requirements
Provide realistic estimates for development, testing and implementation
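As a rough illustration of the first two items above, here is a minimal Python sketch that profiles a single column for missing, default and invalid values. Everything in it is an assumption made for the example: the function names (profile_column, is_iso_date), the list of default placeholders and the ISO date format standing in for "the metadata that describes it." A real assessment would take these rules from your own metadata and subject matter experts.

```python
from datetime import datetime

# Hypothetical placeholder values that often hide missing data.
# These are assumptions for illustration, not a standard list.
DEFAULT_VALUES = {"N/A", "UNKNOWN", "9999-12-31"}


def is_iso_date(value):
    """Return True if value is a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (ValueError, TypeError):
        return False


def profile_column(values, is_valid):
    """Count missing, default, invalid and valid values in one column."""
    counts = {"missing": 0, "default": 0, "invalid": 0, "valid": 0}
    for value in values:
        if value is None or (isinstance(value, str) and value.strip() == ""):
            counts["missing"] += 1
        elif value in DEFAULT_VALUES:
            counts["default"] += 1
        elif not is_valid(value):
            counts["invalid"] += 1
        else:
            counts["valid"] += 1
    return counts


# Made-up sample values for a column documented as an ISO date.
birth_dates = ["1980-02-30", "1975-06-15", "", None,
               "9999-12-31", "N/A", "06/15/1975"]
profile = profile_column(birth_dates, is_iso_date)
print(profile)
```

Counts like these do not fix anything by themselves, but they can give you concrete questions to bring to subject matter experts, such as whether "9999-12-31" is a legitimate value or a stand-in for an unknown date.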

Therefore, the data quality assessment helps align perception with reality and gets the project off to a good start by providing a clear direction and a working definition of success.

However, a common mistake is to view the data quality assessment as a one-time event that ends when development begins.

Projects should perform iterative data quality assessments throughout the entire development lifecycle, which can help you:

Gain a data-centric view of the project's overall progress
Build data quality monitoring functionality into the new system
Promote data-driven development
Enable more effective unit testing
Perform impact analysis on requested enhancements (i.e., scope creep)
Record regression cases for testing modifications
Identify data exceptions that require suspension for manual review and correction
Facilitate early feedback from the user community
Correct problems that could undermine user acceptance
Increase user confidence that the new system will meet their needs
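One way to make such iterative assessments concrete is to compare the results of each assessment against an earlier baseline and flag anything that has gotten worse. The Python sketch below assumes each assessment produces simple counts per category (missing, default, invalid, valid). The function names, the category names and the one percentage point tolerance are all hypothetical choices for illustration, not a prescribed method.

```python
def rates(profile):
    """Convert raw counts per category into fractions of the total."""
    total = sum(profile.values())
    return {name: (count / total if total else 0.0)
            for name, count in profile.items()}


def regressions(baseline, current, tolerance=0.01):
    """Return problem categories whose rate rose by more than tolerance."""
    base, curr = rates(baseline), rates(current)
    worse = {}
    for category in ("missing", "default", "invalid"):
        delta = curr.get(category, 0.0) - base.get(category, 0.0)
        if delta > tolerance:
            worse[category] = round(delta, 4)
    return worse


# Made-up counts from two assessments of the same column.
first_assessment = {"missing": 2, "default": 2, "invalid": 2, "valid": 94}
latest_assessment = {"missing": 2, "default": 2, "invalid": 6, "valid": 90}
print(regressions(first_assessment, latest_assessment))
```

A check like this could run after each development iteration, so that a jump in invalid values is noticed while it is still cheap to investigate rather than at user acceptance testing.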

If you wait until the end of the project to learn whether you have succeeded or failed, then you are treating data quality like a game of chance.

And to paraphrase Albert Einstein:

“Do not play dice with data quality.”

OCDQ Blog
