The Data Quality Goldilocks Zone
In astronomy, the habitable region of space where stellar conditions are favorable for life as it is found on Earth is referred to as the "Goldilocks Zone" because such a region is neither too close to its star (making it too hot) nor too far away from it (making it too cold), but is "just right."
In data quality, there is also a Goldilocks Zone, which is the habitable region of time when project conditions are favorable for success.
Too many projects fail because of lofty expectations, unmanaged scope creep, and the unrealistic perspective that data quality problems can be permanently “fixed” as opposed to needing eternal vigilance. To be successful, projects must always be understood as an iterative process. Return on investment (ROI) will be achieved by targeting well-defined objectives that deliver small incremental returns, which build momentum toward larger success over time.
Data quality projects are easy to get started, even easier to end in failure, and often lack the decency of at least failing quickly. Just like any complex problem, there is no fast and easy solution for data quality.
Projects are launched to understand and remediate the poor data quality that is negatively impacting decision-critical enterprise information. Data-driven problems require data-driven solutions. When the team must decide whether the efforts of the current iteration are ready for implementation, it is dealing with the Data Quality Goldilocks Zone, which is measured not by proximity to the sun but by proximity to full data remediation, otherwise known as perfection.
The obvious problem is that perfection is impossible. An obsessive-compulsive quest to find and fix every data quality problem is a laudable pursuit but ultimately a self-defeating cause. Data quality problems can be very insidious, and even the best data remediation process will still produce exceptions. As a best practice, your process should be designed to identify and report exceptions when they occur. In fact, many implementations include logic to suspend exceptions for manual review and correction.
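To make the idea of identifying, reporting, and suspending exceptions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `remediate` and `exception_report` functions, the rule names, and the sample records are hypothetical and do not come from any particular data quality tool.

```python
from dataclasses import dataclass, field


@dataclass
class RemediationResult:
    """Holds records that passed every rule and records suspended for review."""
    clean: list = field(default_factory=list)
    suspended: list = field(default_factory=list)  # (record, failed rule names)


def remediate(records, rules):
    """Route each record to `clean` or `suspended` (hypothetical helper).

    `rules` maps a rule name to a function that returns True when the
    record passes. Failing records are held back rather than dropped.
    """
    result = RemediationResult()
    for record in records:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            result.suspended.append((record, failed))
        else:
            result.clean.append(record)
    return result


def exception_report(result):
    """Summarize how many records were suspended and at what rate."""
    total = len(result.clean) + len(result.suspended)
    rate = len(result.suspended) / total if total else 0.0
    return {"total": total, "suspended": len(result.suspended), "exception_rate": rate}


# Illustrative rules and data only; real rules depend on the business context.
rules = {
    "has_email": lambda r: bool(r.get("email")),
    "valid_postal_code": lambda r: len(str(r.get("postal_code", ""))) == 5,
}
records = [
    {"email": "a@example.com", "postal_code": "02134"},
    {"email": "", "postal_code": "02134"},
    {"email": "c@example.com", "postal_code": "123"},
]

result = remediate(records, rules)
report = exception_report(result)
print(report)
for record, failed in result.suspended:
    print("suspended for manual review:", record, "failed:", failed)
```

The point of the design is that a failing record is neither silently corrected nor silently discarded: it is set aside with the reason it failed, so that someone can review and correct it while the rest of the data moves forward.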
Although all of this is easy to accept in theory, it is notoriously difficult to accept in practice.
For example, let’s imagine that your project is processing one billion records and that exhaustive analysis has determined that the results are correct 99.99999% of the time, meaning that exceptions occur in only 0.00001% of the total data population. Now, imagine explaining these statistics to the project team, but providing only the 100 exception records for review. Do not underestimate the difficulty that the human mind has with large numbers (e.g., 100 is an easy number to relate to, but one billion is practically incomprehensible). Also, don’t ignore the effect known as “negativity bias,” where bad evokes a stronger reaction than good in the human mind. Just compare an insult and a compliment: which one do you remember more often? Focusing on the exceptions can undermine confidence and prevent acceptance of an overwhelmingly successful implementation.
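The arithmetic behind those figures is easy to check. The short Python sketch below uses exact fractions to avoid floating-point rounding; the variable names are chosen here purely for illustration.

```python
from fractions import Fraction

total_records = 1_000_000_000

# 0.00001% expressed as a fraction: 0.00001 / 100 = 1 / 10,000,000
exception_rate = Fraction(1, 10_000_000)

exceptions = total_records * exception_rate
correct = total_records - exceptions
correct_pct = correct / total_records * 100

print(f"exception records: {int(exceptions):,}")
print(f"correct records:   {int(correct):,}")
print(f"percent correct:   {float(correct_pct):.5f}%")
```

Putting the 100 exceptions next to the 999,999,900 correct records, rather than presenting the exceptions alone, is one way to keep the results in proportion for the project team.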
If you can accept there will be exceptions, admit perfection is impossible, implement data quality improvements in iterations, and acknowledge when the current iteration has reached the Data Quality Goldilocks Zone, then your data quality initiative will not be perfect, but it will be "just right."



Jim Harris
Reader Comments (3)
Jim
Nice post.
To take (or stretch) your analogy a little further, it is also important to remember that quality is ultimately defined by the consumers of the information. For example, if you were working on a customer data set (or 'porridge' in Goldilocks terms) you might get it to a point where Marketing think it is "just right" but your Compliance and Risk management people might think it is too hot and your Field Sales people might think it is too cold. Declaring "Mission Accomplished" when you have addressed the needs of just one stakeholder in the information can often be premature.
Also, one of the key learnings that we've captured in the IAIDQ over the past 5 years from meeting with practitioners and hosting our webinars is that, just like any Change Management effort, information quality change requires you to break the challenge into smaller deliverables so that you get regular delivery of "just right" porridge to the various stakeholders rather than boiling the whole thing up together and leaving everyone with a bad taste in their mouths. It also means you can more quickly see when you've reached the Goldilocks zone.
Daragh,
Excellent points! Too many data quality initiatives suffer from taking a “one size fits all” approach, assuming that quality will mean the same to all of the stakeholders and information consumers.
Thanks for your comment, especially for staying with the theme and expanding on the analogy.
Perhaps we can launch an IAIDQ series of information quality articles based on other children’s nursery rhymes? You know, start the education early so that the next generation can make information quality issues a thing of the past.
Best Regards…
Jim
From the LinkedIn Group for the Datarati, Chad Cook commented:
“Great article Jim. In my arena it’s also about how to make the client realize the perfection golden ring is never achievable, but if you can stand the inconsistencies, it is desirable to strive towards perfection.”