Wednesday, May 20, 2009

Schrödinger's Data Quality

In 1935, Austrian physicist Erwin Schrödinger described a now-famous thought experiment where:

  “A cat, a flask containing poison, a tiny bit of radioactive substance and a Geiger counter are placed into a sealed box for one hour.  If the Geiger counter doesn't detect radiation, then nothing happens and the cat lives.  However, if radiation is detected, then the flask is shattered, releasing the poison, which kills the cat.  According to the Copenhagen interpretation of quantum mechanics, until the box is opened, the cat is simultaneously alive and dead.  Yet, once you open the box, the cat will either be alive or dead, not a mixture of alive and dead.” 

This was only a thought experiment.  Therefore, no actual cat was harmed. 

This paradox of quantum physics, known as Schrödinger's Cat, poses the question:

  “When does a quantum system stop existing as a mixture of states and become one or the other?”

 

Unfortunately, data quality projects are not thought experiments.  They are complex, time-consuming and expensive enterprise initiatives.  Typically, a data quality tool is purchased, expert consultants are hired to supplement staffing, production data is copied to a development server and the project begins.  Until it is completed and the new system goes live, the project is a potential success or failure.  Yet, once the new system starts being used, the project will become either a success or a failure.

This paradox, which I refer to as Schrödinger's Data Quality, poses the question:

  “When does a data quality project stop existing as a potential success or failure and become one or the other?”

 

Data quality projects should begin with the parallel and complementary efforts of drafting the business requirements while also performing a data quality assessment, which can help you:

  • Verify data matches the metadata that describes it
  • Identify potential missing, invalid and default values
  • Prepare meaningful questions for subject matter experts
  • Understand how data is being used
  • Prioritize critical data errors
  • Evaluate potential ROI of data quality improvements
  • Define data quality standards
  • Reveal undocumented business rules
  • Review and refine the business requirements
  • Provide realistic estimates for development, testing and implementation

Therefore, the data quality assessment assists with aligning perception with reality and gets the project off to a good start by providing a clear direction and a working definition of success.
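For readers who want something concrete, two items from the list above (verifying data against its metadata and identifying missing, invalid and default values) can be illustrated with a minimal column-profiling sketch in Python. It assumes records arrive as a list of dictionaries and that the metadata can be expressed as a simple validity rule; the function name `profile_column`, the postal code rule and the placeholder values are hypothetical, and in practice this is usually the job of a data profiling tool.

```python
from typing import Any, Callable, Iterable


def profile_column(
    records: Iterable[dict[str, Any]],
    column: str,
    is_valid: Callable[[Any], bool],
    default_values: set[Any],
) -> dict[str, int]:
    """Count missing, default, invalid and valid values in one column.

    Each value lands in exactly one bucket, checked in this order:
    missing (absent, None or blank string), default placeholder,
    invalid (fails the metadata rule), otherwise valid.
    """
    counts = {"missing": 0, "default": 0, "invalid": 0, "valid": 0}
    for record in records:
        value = record.get(column)
        if value is None or (isinstance(value, str) and not value.strip()):
            counts["missing"] += 1
        elif value in default_values:
            counts["default"] += 1
        elif not is_valid(value):
            counts["invalid"] += 1
        else:
            counts["valid"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample: a postal code column that the metadata says
    # should hold five-digit strings, with "00000"/"99999" as placeholders.
    rows = [
        {"postal_code": "02134"},
        {"postal_code": ""},
        {"postal_code": "00000"},
        {"postal_code": "ABCDE"},
        {},
    ]
    print(profile_column(
        rows,
        "postal_code",
        lambda v: isinstance(v, str) and v.isdigit() and len(v) == 5,
        {"00000", "99999"},
    ))
    # prints {'missing': 2, 'default': 1, 'invalid': 1, 'valid': 1}
```

Counts like these turn a vague sense that "the data looks bad" into specific questions for subject matter experts, such as what a placeholder value like `"00000"` is actually being used to mean.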

 

However, a common mistake is to view the data quality assessment as a one-time event that ends when development begins. 

 

Projects should perform iterative data quality assessments throughout the entire development lifecycle, which can help you:

  • Gain a data-centric view of the project's overall progress
  • Build data quality monitoring functionality into the new system
  • Promote data-driven development
  • Enable more effective unit testing
  • Perform impact analysis on requested enhancements (i.e., scope creep)
  • Record regression cases for testing modifications
  • Identify data exceptions that require suspension for manual review and correction
  • Facilitate early feedback from the user community
  • Correct problems that could undermine user acceptance
  • Increase user confidence that the new system will meet their needs
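One way to keep the assessment going after development begins is to turn its findings into rules that run against every new extract. The sketch below is a minimal illustration under stated assumptions: records are dictionaries, each rule is a hypothetical `Rule` with a name and a pass/fail check, and any record that fails a rule is suspended for manual review instead of being loaded. The names `Rule` and `run_assessment` and the two sample rules (the email check is deliberately crude) are illustrative only and do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """A named data quality check; `check` returns True when a record passes."""
    name: str
    check: Callable[[Record], bool]


def run_assessment(
    records: list[Record], rules: list[Rule]
) -> tuple[list[Record], list[tuple[Record, list[str]]], dict[str, int]]:
    """Apply every rule to every record.

    Returns the records that passed all rules, the suspended records paired
    with the names of the rules they failed, and a failure count per rule.
    """
    accepted: list[Record] = []
    suspended: list[tuple[Record, list[str]]] = []
    failures = {rule.name: 0 for rule in rules}
    for record in records:
        failed = [rule.name for rule in rules if not rule.check(record)]
        for name in failed:
            failures[name] += 1
        if failed:
            suspended.append((record, failed))  # hold for manual review
        else:
            accepted.append(record)
    return accepted, suspended, failures


if __name__ == "__main__":
    rules = [
        Rule("customer_id present", lambda r: bool(r.get("customer_id"))),
        Rule("email contains @", lambda r: "@" in str(r.get("email", ""))),
    ]
    batch = [
        {"customer_id": "C1", "email": "a@example.com"},
        {"customer_id": "", "email": "b@example.com"},
        {"customer_id": "C3", "email": "not-an-email"},
    ]
    accepted, suspended, failures = run_assessment(batch, rules)
    print(len(accepted), len(suspended), failures)
    # prints 1 2 {'customer_id present': 1, 'email contains @': 1}
```

Because the same rules run on each iteration, the per-rule failure counts can be compared from one build to the next as a rough, data-centric view of progress, and suspended records that users confirm as genuine errors can be kept as regression cases.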

 

If you wait until the end of the project to learn whether you have succeeded or failed, then you are treating data quality like a game of chance.

And to paraphrase Albert Einstein:

  “Do not play dice with data quality.”



Reader Comments (3)

Another great Einstein quote:

We can't solve problems by using the same kind of thinking we used when we created them.

which reminds me of another fabulous quote by Jerry Weinberg:

If you use the same recipe, you get the same bread.

May 20, 2009 | Unregistered Commenter Karen Lopez

Over on the SmartData Collective, Henrik Liliendahl Sørensen commented:

One competitor to the Copenhagen Interpretation is the Many Worlds Interpretation.

In the Many Worlds Interpretation, both alive and dead states of the cat persist, but are decoherent from each other. In other words, when the box is opened, that part of the universe containing the observer and cat is split into two separate universes, one containing an observer looking at a box with a dead cat, one containing an observer looking at a box with a live cat.

The goal of data quality improvement is often set as: “Fit for purpose”. This definition allows for many worlds to exist, since many purposes may exist.

But when striving for optimal data quality in multi-purpose environments we always end up trying to make a perfect picture of the real world with our data.

As every enterprise is then trying to accurately hold the same data as everyone else, in a smaller or larger part of its datasets, it seems that enterprises may take a shortcut by dividing their data assets into two pots:

• Data you are sharing with everyone else - or everyone in your industry (external reference data)

• Data you are holding for your particular purposes and don’t want to share with anyone else


And I responded:


Thanks for contributing your knowledge and insight on not only data quality but quantum physics, in which I have always found great analogies.

I agree with your excellent use of the Many Worlds Interpretation since the perception of the observer (user) plays a critical role in both quantum physics and data quality.

To paraphrase Gary Zukav from The Dancing Wu Li Masters:

Without perception, the data continues to generate an endless profusion of purposes. The effect of perception is immediate and dramatic. Only the user knows what causes a particular purpose to actualize and the rest to vanish. What we perceive to be data quality is actually our re-presentation of data quality fit for our purpose.

Perhaps, we should start using the term Quantum Quality?

May 21, 2009 | Registered Commenter Jim Harris

Over on the SmartData Collective, Darryl Parker commented:

"Wouldn't taking an agile, or phased, approach to DB development allow the development team to answer the question more frequently?"

And I responded:

Yes, I definitely agree that, in order to be successful, data quality projects must always be understood as an iterative process.

I blogged about this in my post: The Data Quality Goldilocks Zone.

The only nuanced point I would make on your comment is that the questions are not answered by the development team.  The development team asks the questions, and the answers must come from the business team: the users closest to the data, who know what it really means from a day-to-day business perspective.

Building data quality monitoring functionality into the new system during the first iteration is necessary to facilitate this data-driven and user-driven capability throughout the entire agile software development lifecycle.

May 24, 2009 | Registered Commenter Jim Harris
