Got Data Quality?
I have written many blog posts arguing that data perfection, i.e., 100% data quality or zero defects, is neither a realistic nor a required data management goal.
Of course, this admonition logically invites the questions:
If achieving 100% data quality isn’t the goal, then what is?
99%?
98%?
While pondering these questions on a grocery run, I walked down the dairy aisle, casually perusing the wide variety of milk options, when it occurred to me that data quality issues have a lot in common with the fat content of milk.
How milk is classified by its fat content (more specifically, its butterfat content) varies slightly by country. In the United States, whole milk is approximately 3.25% fat, whereas reduced fat milk is 2% fat, low fat milk is 1% fat, and skim milk is 0.5% fat.
Reducing the total amount of fat (especially saturated and trans fat) is a common recommendation for a healthy diet. Likewise, reducing the total number of defects (i.e., data quality issues) is a common recommendation for a healthy data management strategy. However, just as it would be unhealthy to remove all of the fat from your diet (because some fatty acids are essential nutrients that the body cannot produce on its own and must get from food), it would be unhealthy to attempt to remove all of the defects from your data.
So maybe your organization is currently drinking whole data (i.e., 3.25% defects or 96.75% data quality) and needs to consider switching to reduced defect data (i.e., 2% defects or 98% data quality), low defect data (i.e., 1% defects or 99% data quality), or possibly even skim data (i.e., 0.5% defects or 99.5% data quality).
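To make the arithmetic behind these tiers concrete, here is a minimal Python sketch. It assumes, purely for illustration, that data quality is simply 100% minus the defect rate and that the defect rate is actually known; the names TIERS, quality_percent, and defective_records are hypothetical, and the one-million-record total is an invented figure, not data from any real organization.

```python
# Milk fat tiers from the post, reused as hypothetical defect-rate tiers
# (percent of records that contain a data quality issue).
TIERS = {
    "whole": 3.25,
    "reduced": 2.0,
    "low": 1.0,
    "skim": 0.5,
}


def quality_percent(defect_percent: float) -> float:
    """Assumption: data quality is the complement of the defect rate."""
    return 100.0 - defect_percent


def defective_records(total_records: int, defect_percent: float) -> int:
    """Approximate count of defective records implied by a defect rate."""
    return round(total_records * defect_percent / 100.0)


if __name__ == "__main__":
    total = 1_000_000  # hypothetical record count, for illustration only
    for name, defects in TIERS.items():
        print(
            f"{name:>8}: {defects}% defects -> {quality_percent(defects)}% quality, "
            f"~{defective_records(total, defects):,} defective records"
        )
```

Even at the skim tier, a hypothetical table of one million records would still hold about 5,000 defective records, a reminder that small percentages can hide sizable absolute counts.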
Whatever your perspective on the appropriate data quality goal for your organization, I think we can all agree that every enterprise data management initiative has to ask the question: “Got Data Quality?”
Related Posts
The Dichotomy Paradox, Data Quality and Zero Defects
The Real Data Value is Business Insight
Is your data complete and accurate, but useless to your business?
Thaler’s Apples and Data Quality Oranges
Data Quality and The Middle Way
The Data Quality Goldilocks Zone
You Can’t Always Get the Data You Want



Jim Harris
Reader Comments (2)
It always makes me smile when people attempt to put a percentage value on their data quality as though it were something as tangible and measurable as the fat content of your milk.
To make such a measurement, they would need to know where 100% of the defects lie. If they knew that, they could resolve every defect and achieve 100% quality.
In reality, you do not and cannot know where each defect is or how many there are. Even though tools such as profilers will tell you, for example, that 95% of your US address records contain a valid state, there is still no way to measure how many of those valid states actually match the real-world entity on the ground. Mr Smith may be registered to an existing, valid address in the database, but if he moved last week, there is a data quality issue that won't be discovered until someone attempts to contact him.
The same applies when people say they have removed 95% of the duplicates from their data. If they could truly measure that, they would know where the other 5% of the duplicates are and could remove them too.
But back to the point: you may not achieve 100% quality. In fact, we know you never will.
But aiming for that target means that you're aiming in the right direction. As long as your goal is to get close to perfection and not to achieve it, I don't see the problem.
Check out the great comments that this blog post received from its syndication on Information Management:
Got Data Quality? on Information Management