Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.

Wednesday, July 1, 2009

Missed It By That Much

In the mission to gain control over data chaos, a project is launched in order to implement a new system to help remediate the poor data quality that is negatively impacting decision-critical enterprise information. 

The project appears to be well planned.  Business requirements were well documented.  A data quality assessment was performed to gain an understanding of the data challenges that would be faced during development and testing.  Detailed architectural and functional specifications were written to guide these efforts.

The project appears to be progressing well.  Business, technical and data issues all come up from time to time.  Meetings are held to prioritize the issues and determine their impact.  Some issues require immediate fixes, while other issues are deferred to the next phase of the project.  All of these decisions are documented and well communicated to the end-user community.

Expectations appear to have been properly set for end-user acceptance testing.

As a best practice, the new system was designed to identify and report exceptions when they occur.  The end-users agreed that an obsessive-compulsive quest to find and fix every data quality problem is a laudable pursuit but ultimately a self-defeating cause.  Data quality problems can be very insidious and even the best data remediation process will still produce exceptions.
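
A minimal sketch of what "identify and report exceptions" might look like in practice. The record layout, the two validation rules, and the names `DataException`, `check_record`, and `process` are all hypothetical, chosen only for illustration; nothing here describes the actual system from the project.

```python
from dataclasses import dataclass


@dataclass
class DataException:
    """One reported data quality problem (hypothetical structure)."""
    record_id: int
    field: str
    reason: str


def check_record(record_id, record):
    """Return the exceptions found in one record (empty list if clean)."""
    found = []
    # Illustrative rules only: a real system would have many more.
    if not record.get("name", "").strip():
        found.append(DataException(record_id, "name", "missing value"))
    if "@" not in record.get("email", ""):
        found.append(DataException(record_id, "email", "not a valid address"))
    return found


def process(records):
    """Separate clean records from a report of exceptions."""
    clean, report = [], []
    for record_id, record in enumerate(records):
        problems = check_record(record_id, record)
        if problems:
            report.extend(problems)  # report the problem instead of hiding it
        else:
            clean.append(record)
    return clean, report


# Made-up sample data
records = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "", "email": "grace@example.com"},
    {"name": "Linus", "email": "linus-at-example.com"},
]
clean, report = process(records)
for item in report:
    print(f"record {item.record_id}: {item.field} - {item.reason}")
```

The point of the design is that problem records are routed to a report with a stated reason, so reviewers can go straight to them rather than searching the whole data set.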

Although all of this is easy to accept in theory, it is notoriously difficult to accept in practice.

Once the end-users start reviewing the exceptions, their confidence in the new system drops rapidly.  Even after some enhancements increase the number of records without an exception from 86% to 99%, the end-users continue to focus on the remaining 1% of the records that are still producing data quality exceptions.

Would you believe this incredibly common scenario can prevent acceptance of an overwhelmingly successful implementation?

How about if I quoted one of the many people who can help you get smarter than you would by only listening to me?

In his excellent book Why New Systems Fail: Theory and Practice Collide, Phil Simon explains:

“Systems are to be appreciated by their general effects, and not by particular exceptions...

Errors are actually helpful the vast majority of the time.”

In fact, because the new system was designed to identify and report errors when they occur:

“End-users could focus on the root causes of the problem and not have to wade through hundreds of thousands of records in an attempt to find the problem records.”

I have seen projects fail in the many ways described by detailed case studies in Phil Simon's fantastic book.  However, one of the most common and frustrating data quality failures is the project that was so close to being a success, but where the focus on exceptions led the end-users to tell us that we “missed it by that much.”

I am neither suggesting that end-users are unrealistic nor that exceptions should be ignored. 

Reducing exceptions (i.e. poor data quality) is the whole point of the project and nobody understands the data better than the end-users.  However, chasing perfection can undermine the best intentions. 

In order to be successful, data quality projects must always be understood as an iterative process.  Small incremental improvements will build momentum to larger success over time. 
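
One way to keep attention on the improvements is to report each iteration against the starting baseline. The sketch below is only an illustration: the function names and the record volume of 100,000 are assumptions, chosen so the figures match the 86% and 99% from the scenario above.

```python
def exception_free_rate(total_records, records_with_exceptions):
    """Percentage of records that produced no data quality exception."""
    return 100.0 * (total_records - records_with_exceptions) / total_records


def summarize(total_records, exceptions_per_iteration):
    """Return one row per iteration:
    (iteration, clean %, exception %, % of baseline exceptions removed).

    Assumes the first entry is the baseline and is greater than zero.
    """
    baseline = exceptions_per_iteration[0]
    rows = []
    for i, exceptions in enumerate(exceptions_per_iteration):
        clean = exception_free_rate(total_records, exceptions)
        removed = 100.0 * (baseline - exceptions) / baseline
        rows.append((i, clean, 100.0 - clean, removed))
    return rows


# Hypothetical volumes: 14,000 exception records at first, 1,000 after enhancements
TOTAL = 100_000
rows = summarize(TOTAL, [14_000, 1_000])
for iteration, clean, remaining, removed in rows:
    print(f"iteration {iteration}: {clean:.1f}% clean, "
          f"{remaining:.1f}% exceptions, {removed:.1f}% of baseline exceptions removed")
```

With these assumed numbers, the remaining 1% of records looks very different when it is shown next to the roughly 93% of the original exceptions that have been eliminated.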

Instead of focusing on the exceptions – focus on the improvements. 

And you will begin making steady progress toward improving your data quality.

And loving it!

 

Related Posts

The Data Quality Goldilocks Zone

Schrödinger's Data Quality

The Nine Circles of Data Quality Hell


Reader Comments (4)

On a project in my former day job we addressed this in a pragmatic way...

1) We accepted that perfection was to be aspired to but that 85% of the way there was still a "top of class" grade.

2) We EXPLICITLY modeled for failure in the data model...we included fields for the data that could not be cleaned. The fields were EXPLICITLY called "CRUD [enter entity here] Field". It gave us somewhere to put the crap when we couldn't clean it.

3) We defined processes to have those fields reviewed by human agents when they were engaged with the customers on the phone or when they encountered the duf data as part of a normal BAU process. The message was simple...we're telling you it is crap that can't be cleaned by machine...spend 20 seconds helping us out.

We "missed it by that much" because our IT team spent the data migration and database physicalisation budget on front-end systems in a separate phase of the overall program.

July 1, 2009 | Unregistered Commenter Daragh O Brien

I was trained on Six Sigma and although I (and our company) still use many of its tools, this illustrates why it can be dangerous to adopt the mindset that "Six Sigma" should be the ultimate goal.

I will keep this handy for the next IT system that we can't quite finish.

July 2, 2009 | Unregistered Commenter Jeremy Benson

Over on the SmartData Collective, Daniel Gent commented:

"Well said Jim. I'm working on a new database now and implementing many data quality checks. But it's those errors we get afterward that will be of the most interest to us. This post exemplifies this perfectly."

July 2, 2009 | Registered Commenter Jim Harris

Very well articulated blog post. My favorite quote is "Instead of focusing on the exceptions – focus on the improvements."

I think that it is really important to define incremental goals for data quality projects and track the progress through percentage improvement over a period of time.

I think it is also important to manage expectations: the goal is not necessarily to reach 100% clean data (which will be extremely difficult if not impossible), but to make progress to a point where the purpose for cleaning the data can be achieved in a much better way than had the original data been used.

For example, suppose marketing wants to use the contact data to create a campaign for contacts that have a certain ERP system installed on-site. If the ERP information in the contact database is not clean (it is free text, in some cases it is absent, etc.), then any campaign run on this data will reach only x% of contacts at best (assuming only x% of contacts have clean ERP data). If a data quality project is undertaken to clean this data, one needs to look at progress in terms of % improvement: how many contacts now have their ERP field cleaned and legible compared to when we started? A reasonable goal then needs to be set based on how much marketing and IT are willing to invest in this issue (which in turn could be based on the ROI of the campaign from increased outreach).

July 6, 2009 | Unregistered Commenter VishAgashe
