Jim Harris

My name is Jim Harris.  I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.

Sunday, April 5, 2009

There are no Magic Beans for Data Quality

The CIO put Jack in charge of an enterprise initiative with a sizable budget to spend on improving data quality.

Jack was sent to a leading industry conference to evaluate data quality vendors.  While his flight was delayed, Jack was passing the time in the airport bar when he was approached by Machiavelli, a salesperson from a data quality software company called Magic Beans.

Machiavelli told Jack that he didn't need to go to the conference to evaluate vendors.  Instead, Jack could simply trade his entire budget for an unlimited license of Magic Beans.

Machiavelli assured Jack that Magic Beans had the following features:

  • Simple to install
  • Remarkably intuitive user interface
  • Processes a gazillion records per nanosecond
  • Clairvoyantly detects and corrects existing data quality problems
  • Prevents all future data quality problems from happening

Jack agreed to the trade and went back to the office with Magic Beans.

Eighteen months later, Jack and the CIO carpooled to Washington, D.C. to ask Congress for a sizable bailout.

What is the moral of this story? 

(Other than never trust a salesperson named Machiavelli.)

There are many data quality vendors to choose from and all of them offer viable solutions driven by impressive technology.

However, technology sometimes carries with it a dangerous conceit – that what works in the laboratory and the engineering department will work in the boardroom and the accounting department, that what is true for the mathematician and the computer scientist will be true for the business analyst and the data steward.

My point is neither to discourage the purchase of data quality software, nor to try to convince you which vendor I think provides the superior solution – especially since these types of opinions are usually biased by the practical limits of your personal experience and motivated by the kind folks who are currently paying your salary.

And I am certainly not a Luddite opposed to the use of technology.  I am first, foremost, and proudly a techno-geek of the highest order.  However, I have seen too many projects fail when a solution to data quality problems was attempted by “throwing technology at it.”  I have seen beautifully architected, wonderfully coded, elegantly implemented technical solutions result in complete and utter failure.  These projects failed neither because using technology was the wrong approach nor because the wrong data quality software was selected.

Data quality solutions require a holistic approach involving people, methodology, and technology.

People

Sometimes, people doubt that data quality problems could be prevalent in their systems.  This “data denial” is not necessarily a matter of blissful ignorance, but is often a natural self-defense mechanism on the part of the data owners on the business side and/or the process owners on the technical side.  No one likes to feel blamed for causing or failing to fix the data quality problems.  This is one of the many human dynamics that are missing from the relative clean room of the laboratory where the technology was developed.  You must consider the human factor because it will be the people involved in the project, and not the technology itself, that will truly make the project successful.

 

Methodology

Data characteristics and their associated quality challenges vary from company to company.  Data quality can be defined differently by different functional areas within the same company.  Business rules can change from project to project.  Decision makers on the same project can have widely varying perspectives.  All of this points to the need for an effective methodology, which will help you make the most of your time and effort as well as the subsequent return on whatever technology you invest in.
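
As a rough illustration of how the same record can be judged differently by different functional areas, here is a minimal Python sketch.  The record fields, the two departments, and their rules are all hypothetical, invented for this example; they are not taken from any particular tool or project.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CustomerRecord:
    # Hypothetical fields chosen only for illustration.
    name: str
    email: str
    postal_address: str


def marketing_rule(record: CustomerRecord) -> bool:
    # Assumed rule: marketing only needs something that looks like an email.
    return "@" in record.email


def billing_rule(record: CustomerRecord) -> bool:
    # Assumed rule: billing needs a non-empty postal address for invoices.
    return bool(record.postal_address.strip())


# Each functional area brings its own definition of "fit for use".
RULES: Dict[str, Callable[[CustomerRecord], bool]] = {
    "marketing": marketing_rule,
    "billing": billing_rule,
}


def assess(record: CustomerRecord) -> Dict[str, bool]:
    # Evaluate one record against every area's rule.
    return {area: rule(record) for area, rule in RULES.items()}


record = CustomerRecord(name="Jack", email="jack@example.com", postal_address="")
print(assess(record))
```

Here the record would count as good data for marketing and bad data for billing, which is the kind of disagreement a methodology has to surface before any software is configured.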

 

Technology

I have used software from most of the vendors in the Gartner Data Quality Magic Quadrant and from many of the so-called niche vendors.  So I speak from experience when I say that all data quality vendors have viable solutions driven by impressive technology.  However, don't let the salesperson “blind you with science” into having unrealistic expectations of the software.  I am not trying to accuse all salespeople of Machiavellian machinations (even though we have all encountered a few who would shamelessly sell their mother’s soul to meet their quota).

 

Conclusion

As with any complex problem, there is no fast and easy solution.  Although incredible advancements in technology continue, there are no Magic Beans for Data Quality.

And there never will be.

An organization's data quality initiative can only be successful when people take on the challenge united by collaboration, guided by an effective methodology, and of course, implemented with amazing technology.

Reader Comments (8)

Jim,

You've nailed it.

I have worked for data quality vendors, sold for them, designed the features to improve the solutions and managed multi language data clean up projects. The reality is that you have to pull your sleeves up and manage an intensive process that can, if you are lucky, be made possible with good technology.

The hard part is convincing people that the technology is an aid to a tough process that requires dedication and resource and not the silver bullet.

Nice work.

April 9, 2009 | Unregistered Commenter Paul Mayer

I wholeheartedly agree with Jim on his posting. I would add one point that may have been inferred, but not directly stated. A data quality solution consisting of people, methodology (or process) and technology should be clearly focused on solving a well defined and understood business problem. Our experience has shown that many dq implementations are begun without the slightest notion of the business problem being solved. In this environment success is difficult if not almost impossible to quantify.

April 10, 2009 | Unregistered Commenter Leonard Dubois

I appreciate the methodology-first mention. In a data-use parallel, I encounter similar circumstances, with misuse of data and supposedly favorable application of technology, when designing PDA survey forms and training NGO staff to use them. It all starts with the data model and, that being absent, with eliciting the structure and logic of the ad hoc survey instrument for the users/designers. I frequently tell the team, "what technologically works here in the air-conditioned training location will seldom provide viable data once we take it to the field location."

April 10, 2009 | Unregistered Commenter David Isaak

Paul, Leonard and David:

Thanks for the comments – I greatly appreciate the feedback and enjoyed reading your perspectives.

Best Regards…

Jim

April 12, 2009 | Registered Commenter Jim Harris

From the LinkedIn Group for Enterprise Data Quality, Mike Pratt commented:

"The only thing I'd like to add is about the perception of Data Quality from a corporate perspective.

I've worked in BI and data quality for more than 12 years and have seen firsthand the claims that salespeople make about what their product will do. Too often the CEO or CIO is impressed by a particular product and sees it as the solution to their data quality problems. They perceive data quality, at best, to be a task performed by techno-geeks in a data quality department or, at worst, a one-off project.

I believe there is a need within any company for a data quality department that performs this role, but the concept of data quality should also be embedded culturally within the organization.

This means that companies need to understand the competitive edge that data quality brings and to make it a corporate objective. This then becomes an objective for all the people in the organization so that everybody has a responsibility for it.

Until everybody in the organization understands the impact data quality can have, both negative and positive, data quality will always be the domain of the few who are acting in a reactive manner."

April 13, 2009 | Registered Commenter Jim Harris

Having been in the PRODUCT data quality business for 10 years, our experience is that, like many who have commented, the best solution is a combination of people, process and technology. The technology should not be the driver, but be driven by the people and support their processes.

We have helped one particular retailer in the UK nail it:

In 2007 they had no data governance or any kind of ownership of the data, and any error correction was manual and addressed locally where the pain was felt.

Now there is a data governance strategy in place and they have a small team of data analysts who own the quality and integrity of the data across the business. They constantly measure this against a set of defined Key Performance Indicators (KPI) that are used to target poor performing areas of both their business and Trading Partner community, which they engage through an educational programme and have defined processes to manage the introduction and changes to new items in their business.

The technology underpins their workflow with structured approvals and audits, and the business rules engine helps them identify and address errors by exception.

With high product churn and ever-changing attribute requirements to meet regulatory requirements and consumer demand across their multi-channel business, this combination of people, process and technology has, in less than 2 years, enabled them to reduce their error rate from 82% to less than 38%. Clearly they still have a long way to go, and maybe they will never reach their goal of zero errors, but I am sure they will get close to it.

I am in sales, but I learnt many years ago that if you sell somebody something that doesn't meet their requirements or is not sustainable, then it will be more trouble to you personally in the long run. My advice to any salesman is, if there isn't a fit, be honest, walk away and maintain your credibility!

April 15, 2009 | Unregistered Commenter Steve Richardson

Steve,

Thanks for providing feedback and an excellent data quality success story.

I also appreciated the reassurance that not all salespeople are Machiavellian by nature.

Best Regards…

Jim

April 15, 2009 | Registered Commenter Jim Harris

From the LinkedIn Group for Enterprise Data Quality, Mark Evans commented:

"Too often, vendors and IT professionals are trained to focus on the integrity of the database or system at hand more than the larger view of data usage in a complex enterprise. Case in point: I met a “revenue recovery” vendor a couple of years ago (really a data quality vendor) who finds millions of dollars of unbilled services for telecommunications companies by checking relational integrity, historical data timelines, attribute relationships, etc. and finds lines and services that are active in operations but not appropriately recognized across systems and databases. Arkady Maydanchik does a very nice job of dealing with these types of data issues in his book Data Quality Assessment.

Vendor tools can provide accurate data but misleading information. For example, if an address number is transposed from say “112 Oakhill” to “121 Oakhill”, address hygiene software may flag the transposed address as correct - for a residence may indeed be located at “121 Oakhill”. The address data may be accurate, clean, complete, and in one sense current – but wrong for its intended use. Very few vendors can provide knowledge of the currency of the full occupancy that was just created in their tool set.

Finally, too often we find “vendor SME” definitions of data quality that are based more on data integrity than data usage. For example, some vendors’ tools will provide a deliverability score for a piece of mail based on the completeness of the address and tell you what pieces to leave out of your mail stream (with little correlation between scores and return mail rates). But mail is not delivered by computers and returned mail has a half-life. If you send your returned mail back to the USPS with no data changes, 40% to 50% of it will be delivered on the second try, then again on the third, and so on. There are no magic beans for data quality, but sometimes a little fairy dust helps. :)

Data quality is most commonly defined as “fitness for its intended use” (paraphrased). But the devil is indeed in the details. To deliver meaningful data quality, our understanding of “intended use” often needs to be broadened beyond the standards by which many systems and tools are constructed. In the end, the development of the data quality specialist as a profession should be a good thing."

April 15, 2009 | Registered Commenter Jim Harris
