Jim Harris

My name is Jim Harris.  I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.

Thursday, May 5, 2011

The Dichotomy Paradox, Data Quality and Zero Defects

As Joseph Mazur explains in Zeno’s Paradox, the ancient Greek philosopher Zeno constructed a series of logical paradoxes to prove that motion is impossible, which today remain on the cutting edge of our investigations into the fabric of space and time.

One of the paradoxes is known as the Dichotomy:

“A moving object will never reach any given point, because however near it may be, it must always first accomplish a halfway stage, and then the halfway stage of what is left and so on, and this series has no end.  Therefore, the object can never reach the end of any given distance.”

Of course, this paradox sounds silly.  After all, a given point like the finish line in a race is reachable in real life, since people win races all the time.  However, in theory, the mathematics is maddeningly sound, since it creates an infinite series of steps between the starting point and the finish line, and an infinite number of steps creates a journey that can never end.

Furthermore, this theoretical race cannot even begin.  Before we can complete the first step, we must complete half of it, and before that, half of that half, and so on, so the recursive nature of this paradox means we would never reach the point of completing the first step.  Hence, the paradoxical conclusion is that any travel over any finite distance can neither be completed nor begun, and so all motion must be an illusion.  Some of the greatest minds in history (from Galileo to Einstein to Stephen Hawking) have tackled the Dichotomy Paradox, but without being able to disprove it.
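The halving at the heart of the Dichotomy can be sketched in a few lines of Python.  This is only an illustration of the arithmetic, not something from Mazur's book: the function name remaining_after is made up for this example, and the total distance is assumed to be one unit.

```python
from fractions import Fraction


def remaining_after(stages: int) -> Fraction:
    """Return the fraction of the total distance still left to travel
    after completing the given number of halfway stages."""
    remaining = Fraction(1)  # assumption: the whole journey is 1 unit
    for _ in range(stages):
        remaining /= 2  # each stage covers half of whatever is left
    return remaining


if __name__ == "__main__":
    for stages in (1, 2, 10, 50):
        print(stages, remaining_after(stages))
```

However many stages you ask for, the result is a smaller fraction but never zero (after 10 stages, 1/1024 of the distance is still left), which is the intuition the paradox leans on.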

 

Data Quality and Zero Defects

The given point that many enterprise initiatives attempt to reach with data quality is a score of 100% on a metric such as data accuracy.  Leaving aside (in this post) the fact that any data quality metric without a tangible business context provides no business value, 100% data quality (aka Zero Defects) is an unreachable destination, no matter how close you get or how long you try to reach it.

Zero Defects is a laudable goal, but its theory and practice come from manufacturing quality.  I have always been of the opinion, unpopular among some of my peers, that manufacturing quality and data quality are very different disciplines.  Although there is much to be learned from studying the theories of manufacturing quality, I believe that brute forcing those theories onto data quality is impractical and fundamentally flawed (and I’ve even said so in verse: To Our Data Perfectionists).

The given point that enterprise initiatives should actually be attempting to reach is data-driven solutions for business problems.

Advocates of Zero Defects argue that, in theory, defect-free data should be fit to serve as the basis for every possible business use, enabling a data-driven solution for any business problem.  However, in practice, business uses for data, as well as business itself, are always evolving.  Therefore, business problems are dynamic problems that do not have, nor do they require, perfect solutions.

Although the Dichotomy Paradox proves motion is theoretically impossible, our physical motion practically proves otherwise.  Has your data quality practice become motionless by trying to prove that Zero Defects is more than just theoretically possible?

 

Related Posts

The Role Of Data Quality Monitoring In Data Governance

The Asymptote of Data Quality

To Our Data Perfectionists

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

Data Quality and the Cupertino Effect

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Thaler’s Apples and Data Quality Oranges

Data In, Decision Out

The Data-Decision Symphony

Data Quality and The Middle Way

Missed It By That Much

The Data Quality Goldilocks Zone

You Can’t Always Get the Data You Want

How active is your data quality practice?


Reader Comments (11)

Jim, may I beg for a clarification?

“Advocates of Zero Defects argue that, in theory, defect-free data should be fit to serve as the basis for every possible business use, enabling a data-driven solution for any business problem.”

Yes, I’m one of those advocates.

“...in practice, business uses for data, as well as business itself, are always evolving.”

I agree.

Why the “However ...” between those two statements?

Both theoretically and practically I know that complete and accurate (rather than “zero-defect”) data is good for any (business) use, now and in the future. In what way do you think that evolving uses would negate this?

Much obliged, as ever!

May 5, 2011 | Unregistered Commenter Graham Rhind

Thanks for your great comment, Graham. I will apologize in advance for the long reply. Best Regards, Jim.

(Please Note: most of this reply can be found in my article: The Role Of Data Quality Monitoring In Data Governance)

The belief that complete and accurate data is good for any use, both now and in the future, rests on the assumption that it is possible to define data quality independent of use.

Most commonly, this perspective is referred to as the real-world alignment definition of data quality.

Whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality.

The creation and maintenance of these abstract descriptions shapes the organization’s perception of the real world. However, these abstract descriptions can never be perfected because there is always a digital distance between data and the constantly changing real world that data attempts to describe.

The inconvenient truth is that the real world is not the same thing as the digital worlds captured within the organization’s databases. And, of course, creating and maintaining these digital worlds is no easy task, which is exactly the danger inherent in the real-world alignment definition of data quality: when the organization’s data quality efforts are focused on minimizing the digital distance between data and the real world, it can lead to a hyper-focus on the data in isolation, otherwise known as data myopia.

With a data-myopic focus, data quality can be misperceived as an activity performed for the sake of data, when, in fact, data quality is an activity performed for the sake of implementing data-driven solutions for business problems, enabling better business decisions, and driving optimal corporate performance.

Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder or, when data quality is alternatively defined as fitness for the purpose of use, in the eyes of the user. However, most data has both multiple uses and multiple users. Data of sufficient quality for one use or user may not be of sufficient quality for other uses and users. These multiple, and often conflicting, perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data to support their own business activities.

The user perspective establishes a relative business context for data quality.

However, whereas the real-world alignment definition of data quality can cause a data-myopic focus, the business-world alignment goal of the fitness for the purpose of use definition of data quality must contend with the daunting challenge of business relativity—most data has multiple data consumers, each with their own relative business context for data quality, making it difficult to balance the diverse data needs and divergent data quality perspectives within the organization.

May 5, 2011 | Registered Commenter Jim Harris

Thanks Jim. I am indeed of the school that defines data quality independently of its use (otherwise we would be speaking of “data fitness for purpose”, right? Our language is so rich ...).

And I don’t think it would be wise to get into that discussion all over again :-) And you are so right about digital distance - it’s a real cause of major data quality problems in every company I have ever come across.

But just a couple of thoughts:

(1) Do you think that every piece of data has a use? I don’t. And if it doesn’t, does that mean that that data cannot have a measure of quality?

(2) You seem to be taking real-world alignment as real-world alignment in a moment in time. I don’t. If my postal code as entered in your database was correct today, but changes tomorrow, then tomorrow that data loses its accuracy and its quality.

Maybe we should just dispense with the term “data quality” and think up something new to squabble about ... ;-)

May 5, 2011 | Unregistered Commenter Graham Rhind

Thanks for continuing to squabble with me, Graham :-)

As always, you raise two excellent points.

Regarding your first point, I do not think that every piece of data has a use (I have encountered, and probably created, quite a lot of useless data in my time). Whether useless data can have a measure of quality is a great question. Real-world alignment could be used (pun intended) to give the useless data a measure of quality (i.e., does the useless data accurately describe its corresponding real-world object).

However (by the way, I admit to perhaps being too fond of the word however), the quality of useless data is analogous to an inert virus. Neal Fishman has written that:

“Like a virus, data by itself is inert. Data requires software (or people) for the data to appear alive (or actionable) and cause a positive, neutral, or negative effect.”

Regarding your second point, yes, I definitely view real-world alignment as real-world alignment in a moment in time. This is one of my biggest problems with the “Zero Defects Theory” since even when data can be said to be defect-free (or complete and accurate, if you prefer), that state is ephemeral. How can defect-free data be fit to serve as the basis for every possible business use without the constant vigilance of keeping it defect-free? (Even if such a state is attainable in the first place, which I highly doubt.) And the very nature of this necessary vigilance puts the focus on the data, not on its uses in support of business activities. This effort often takes on a life of its own, where achieving complete and accurate (i.e., defect-free) data is allowed to become the raison d'être of your data management strategy. In other words, you start managing data for the sake of managing data.

Maybe we really should dispense with the term “data quality” and find something new to squabble about ;-)

Best Regards,

Jim

May 5, 2011 | Registered Commenter Jim Harris

Jim,

I think I probably agree with Neal Fishman about data, by itself, being inert. After all, the moment it is perceived (“becomes actionable” in buzzword speak) it becomes information, and that’s a whole new can of worms...

Isn’t there something to be said for a degree of data focus for the sake of data quality? Without that, too often data management becomes a reactive process which is pretty useless for any company.

For example, the company Top Dog requests a report on sales by postal code region. Because it wasn’t seen as having a business use when the system was set up, postal code wasn’t collected so the CEO has to make his/her decisions in another way - mostly by wetting a finger and sticking it in the air to test wind direction, if my experience is anything to go by. If that postal code data had been gathered (and maintained!!) in expectation of a future use, the report could have been produced.

Comes down to governance, I suppose, and it has to be sensible - cries that car mileage statistics need to be added to the veterinary database of sick animals are probably a little too proactive :-)

May 6, 2011 | Unregistered Commenter Graham Rhind

The most important question is: Exactly what is a defect?

Unfortunately, you can’t assume that the task of exactly defining a defect will be easy.

May 6, 2011 | Unregistered Commenter Steve Sarsfield

@Graham — Yes, let’s stay out of the Data-Information Worm Can :-) . . . I agree that a degree of data focus for the sake of data quality is necessary, but we must make sure that data has tangible alignment with business goals. Of course, as you said, future business goals are difficult to predict. I think that this leaves most data management professionals adopting the default position of managing any and all data that somehow finds its way into the organization. As Julian Schwarzenbach recently blogged, it’s essential to regularly assess whether you are managing and improving the right data.

@Steve — Excellent point, Steve. I have worked with lots of data that was allegedly defect-free, only to be surprised by what appeared to be obvious data quality issues. The root cause of the problem, as you suggest, is often one of metadata. We can’t take a data quality metric (e.g., Accuracy) at face value without understanding how the metric is defined and measured.

May 6, 2011 | Registered Commenter Jim Harris

From the LinkedIn Group for Data Governance & Data Quality, Milan Vacval commented:

“I think it’s more important to establish sustainable processes for quality maintenance than attempt to achieve zero defects initially. Especially for product information (which is evolving and changing), the process of verification against the actual physical product is the only way to assure quality. It doesn’t mean that you neglect the processes to set initial data values.”

And I responded:

Thanks for your comment, Milan.

Excellent point about establishing sustainable processes being more important than achieving zero defects.

Best Regards,

Jim

May 6, 2011 | Registered Commenter Jim Harris

Check out the great comments that this blog post received from its syndication on Information Management:

The Dichotomy Paradox, Data Quality and Zero Defects on Information Management

May 29, 2011 | Registered Commenter Jim Harris

From the LinkedIn Group for Data Governance & Data Quality, Gerard ONeill commented:

“Interesting post! First, I would caution maintaining the difference between 'inductive' proof and 'deductive' proof. It is mentioned when one of the philosophers offers the simple refutation of both moving and crossing the finish line.

With respect to data perfection, Zero Defects is possible since we are talking about statistics.

Zero Defects is not an 'impossibility'. But as with all business, the question always arises 'At what cost?'. One example, if you cull the data to eliminate imperfections, you may be throwing out some good data with the bad data. For example by only adding people to a mailing list that have had the email account longer than 30 days, you are eliminating new email subscribers, or old ones who are savvy about setting up a new account for each mailing list.

A business process management (BPM) approach can help you determine those costs, including how sustainable they are and how far away you are from realizing the balance you want.”


And David Ho commented:

“To Milan's point, sustainability is (and should be) a strategic key in line with Continuous Improvement and Lean Thinking concepts. Zero Defects could be a reality in the Six Sigma quality domain where the threshold of error is defined, measured, and evaluated by statistical methods. And yes, as Gerard pointed out above, at what cost and materiality? Otherwise, we might be trapped in quality for quality's sake.

In my opinion, Zero Defects is indeed a nice strategic goal to aim for. But in reality, we will always be measuring how close we can come to attaining it. For instance, it's a balance between (say) maintaining the defect levels at 3% and incurring a minimal impact on the overall cost (bottom line) to achieve it. Would we lose customers at 3% defects?”

July 3, 2011 | Registered Commenter Jim Harris

From the LinkedIn Group for Data Governance & Data Quality, Jason Koulouras commented:

“Out of curiosity, is there a specific point or points in the supply chain and lifecycle of the data where defects would be measured? For example, it would be a very tall order to believe that zero defects would occur at the point of acquisition of raw content/data supplied from a traditional supply chain process, but one could imagine (albeit with caveats such as costs, value, who defines zero defects, what is the actual 'truth' of the data etc.) some later stage where, after validation and checking etc., there is a zero defect stage that may be attainable.”

And I responded:

There often are different data quality thresholds for different points in the data lifecycle, including points where zero defects might be both necessary and attainable.

July 5, 2011 | Registered Commenter Jim Harris
