The General Theory of Data Quality

In one of the famous 1905 Annus Mirabilis Papers On the Electrodynamics of Moving Bodies, Albert Einstein published what would later become known as his Special Theory of Relativity.

This theory introduced the concept that space and time are interrelated entities forming a single continuum and that the passage of time can be a variable that could change for each specific observer.

One of the many brilliant insights of special relativity was that it could explain why different observers can make validly different observations – it was a scientifically justifiable matter of perspective. 

As Einstein's Padawan Obi-Wan Kenobi would later explain in his remarkable 1983 “paper” on The Return of the Jedi:

“You're going to find that many of the truths we cling to depend greatly on our own point of view.”

Although the Special Theory of Relativity could explain the different perspectives of different observers, it could not explain the shared perspective of all observers.  Special relativity ignored a foundational force in classical physics – gravity.  So in 1916, Einstein used the force to incorporate a new perspective on gravity into what he called his General Theory of Relativity.


The Data-Information Continuum

In my popular post The Data-Information Continuum, I explained that data and information are also interrelated entities forming a single continuum.  I used the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g. customers, vendors, suppliers).

I explained that although a common definition for data quality is fitness for the purpose of use, the common challenge is that data has multiple uses – each with its own fitness requirements.  Viewing each intended use as the information that is derived from data, I defined information as data in use or data in action

I went on to the explain that data's quality must be objectively measured separate from its many uses and that information's quality can only be subjectively measured according to its specific use.


The Special Theory of Data Quality

The majority of data quality initiatives are reactive projects launched in the aftermath of an event when poor data quality negatively impacted decision-critical information. 

Many of these projects end in failure.  Some fail because of lofty expectations or unmanaged scope creep.  Most fail because they are based on the flawed perspective that data quality problems can be permanently “fixed” by a one-time project as opposed to needing a sustained program.

Whenever an organization approaches data quality as a one-time project and not as a sustained program, they are accepting what I refer to as the Special Theory of Data Quality.

However, similar to the accuracy of special relativity for solving a narrowly defined problem, sometimes applications of the Special Theory of Data Quality can yield successful results – from a certain point of view. 

Tactical initiatives will often have a necessarily narrow focus.  Reactive data quality projects are sometimes driven by a business triage for the most critical data problems requiring near-term prioritization that simply can't wait for the effects that would be caused by implementing a proactive strategic initiative (i.e. one that may have prevented the problems from happening).

One of the worst things that can happen to an organization is a successful data quality project – because it is almost always an implementation of information quality customized to the needs of the tactical initiative that provided its funding. 

Ultimately, this misperceived success simply delays an actual failure when one of the following happens:

  1. When the project is over, the team returns to their previous activities only to be forced into triage once again when the next inevitable crisis occurs where poor data quality negatively impacts decision-critical information.
  2. When either a new project (or later phase of the same project) attempts to enforce the information quality standards throughout the organization as if they were enterprise data quality standards.


The General Theory of Data Quality

True data quality standards are enterprise-wide standards providing an objective data foundation.  True information quality standards must always be customized to meet the subjective needs of a specific business process and/or initiative.

Both aspects of this shared perspective of quality must be incorporated into a single sustained program that enforces a consistent enterprise understanding of data, but that also provides the information necessary to support day-to-day operations.

Whenever an organization approaches data quality as a sustained program and not as a one-time project, they are accepting what I refer to as the General Theory of Data Quality.

Data governance provides the framework for crossing the special to general theoretical threshold necessary to evolve data quality from a project to a sustained program.  However, in this post, I want to remain focused on which theory an organization accepts because if you don't accept the General Theory of Data Quality, you likely also don't accept the crucial role that data governance plays in a data quality initiative – and in all fairness, data governance obviously involves much more than just data quality.


Theory vs. Practice

Even though I am an advocate for the General Theory of Data Quality, I also realize that no one works at a company called Perfect, Incorporated.  I would be lying if I said that I had not worked on more projects than programs, implemented more reactive data cleansing than proactive defect prevention, or that I have never championed a “single version of the truth.”

Therefore, my career has more often exemplified the Special Theory of Data Quality.  Or perhaps my career has exemplified what could be referred to as the General Practice of Data Quality?

What theory of data quality does your organization accept?  Which one do you personally accept? 

More importantly, what does your organization actually practice when it comes to data quality?


Related Posts

The Data-Information Continuum

Hyperactive Data Quality (Second Edition)

Hyperactive Data Quality (First Edition)

Data Governance and Data Quality

Schrödinger's Data Quality