Jim Harris

My name is Jim Harris. I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.

Tuesday, December 14, 2010

A Confederacy of Data Defects

One of my favorite novels is A Confederacy of Dunces by John Kennedy Toole.  The novel tells the tragicomic tale of Ignatius J. Reilly, described in the foreword by Walker Percy as a “slob extraordinary, a mad Oliver Hardy, a fat Don Quixote, and a perverse Thomas Aquinas rolled into one.”

The novel was written in the 1960s before the age of computer filing systems, so one of the jobs Ignatius has is working as a paper filing clerk in a clothing factory.  His employer is initially impressed with his job performance, since the disorderly mess of invoices and other paperwork slowly begins to disappear, resulting in the orderly appearance of a well-organized and efficiently managed office space.

However, Ignatius is fired after he reveals the secret to his filing system—instead of filing the paperwork away into the appropriate file cabinets, he has simply been throwing all of the paperwork into the trash.

This scene reminds me of how data quality issues (aka data defects) are often perceived.  Many organizations acknowledge the importance of data quality, but don’t believe that data defects occur very often because the data made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

ETL processes that extract source data for a data warehouse load will often perform basic data quality checks.  However, a fairly standard practice for “resolving” a data defect is to substitute a NULL value (e.g., a date stored in a text field in a source system that cannot be converted into a valid date value is usually loaded into the target relational database as a NULL value).
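
To make the NULL substitution concrete, here is a minimal Python sketch of a load step that still substitutes a NULL (None) for an unconvertible date, but also records each substitution so the defect stays visible. The function names, the row layout, and the date format are assumptions made for illustration, not taken from any particular ETL tool.

```python
from datetime import datetime, date
from typing import Optional

def parse_date(text, fmt: str = "%Y-%m-%d") -> Optional[date]:
    """Return a date, or None when the text cannot be converted."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except (ValueError, AttributeError):
        # ValueError: text does not match the format (including empty text).
        # AttributeError: the source value was not a string at all.
        return None

def load_dates(rows):
    """Load (row_id, raw_text) pairs, logging every NULL substitution."""
    loaded, defects = [], []
    for row_id, raw in rows:
        parsed = parse_date(raw)
        if parsed is None:
            # Keep the original value so the defect can be reported later.
            defects.append({"row_id": row_id, "raw_value": raw})
        loaded.append((row_id, parsed))
    return loaded, defects

# Hypothetical source rows: one valid date, one in another format, one empty.
source_rows = [(1, "2010-12-14"), (2, "14/12/2010"), (3, "")]
loaded, defects = load_dates(source_rows)
print(len(defects), "of", len(source_rows), "dates were loaded as NULL")
```

The target table looks just as tidy either way; the difference is that the defects list can be reported instead of silently thrown away.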

When postal address validation software generates a valid mailing address, it often does so by removing what it considers to be “extraneous” information from the input address fields.  That information may include valid data that was accidentally entered into the wrong field or that lacked its own input field (e.g., an e-mail address typed into an input address field is deleted from the validated output mailing address).

And some reporting processes intentionally filter out “bad records” or eliminate “outlier values.”  This happens most frequently when preparing highly summarized reports, especially those intended for executive management.
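
As a rough illustration of the filtering described above, the following Python sketch summarizes a list of amounts while counting how many records were excluded, so the summary cannot hide the exclusions. The function name, the thresholds, and the sample values are all hypothetical.

```python
def summarize(amounts, low=0.0, high=10_000.0):
    """Total the amounts within [low, high] and report what was left out."""
    # Missing values and values outside the range are the "bad records"
    # and "outliers" that a summary report would typically drop.
    kept = [a for a in amounts if a is not None and low <= a <= high]
    excluded = len(amounts) - len(kept)
    return {
        "total": sum(kept),
        "records_used": len(kept),
        "records_excluded": excluded,
        "excluded_ratio": excluded / len(amounts) if amounts else 0.0,
    }

# Hypothetical input: a missing value, a negative value, and an extreme value.
report = summarize([120.0, 75.5, None, -40.0, 2_500_000.0, 310.0])
print(report)
```

Publishing the excluded count next to the total is one simple way to let the readers of a summarized report know how much data never made it into the numbers.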

These are just a few examples of common practices that can create the orderly appearance of a high-quality data environment, but that conceal a confederacy of data defects about which the organization may remain blissfully (and dangerously) ignorant.

Do you suspect that your organization may be concealing A Confederacy of Data Defects?


Reader Comments (3)

From the LinkedIn Group for DAMA International, Jeff Klagenberg commented:

“That is an attitude I have seen as well. When preparing for an MDM deployment customers are often surprised at how bad their data quality is. Data cleansing tools are great but it is vital to have in place a mechanism to continually monitor data quality. This monitoring can drive data cleansing rules, adapting to the reality of data.”

And I responded:

I definitely agree that continually monitoring data quality is essential.

And Justin Lane commented:

“In my experience the processes that cleanse or otherwise sanitize the data are often quite 'manual' - there are department(s) of 'people' who take care of this often via processes which are somewhat 'ad-hoc' and undocumented. It's also not uncommon to find few (if any) metrics to measure data quality and track changes in data quality over time.”

And I responded:

Another problem I have occasionally encountered is that even when metrics were in place to measure and track changes in data quality over time, they were pointed at a spot in the process flow downstream from the processes that "prepare" the data for use, i.e., after the manual or automated scrubbing had been performed. Organizations need to continually monitor data quality at the sources, before any of the "magic" happens :-)
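
A small Python sketch of the point about where a metric is placed: the same validity rate is measured once at the source and once after a scrubbing step. The e-mail pattern, the scrub function, and the sample values are assumptions chosen only to show the effect.

```python
import re

# Deliberately simple pattern, used here only for illustration.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def valid_rate(values):
    """Share of values that look like an e-mail address."""
    if not values:
        return 0.0
    return sum(1 for v in values if v and EMAIL.match(v)) / len(values)

def scrub(values):
    """Hypothetical cleansing step that silently drops invalid values."""
    return [v for v in values if v and EMAIL.match(v)]

source = ["a@example.com", "not-an-email", "", "b@example.com"]
rate_at_source = valid_rate(source)           # measured before the "magic"
rate_downstream = valid_rate(scrub(source))   # measured after scrubbing
print(rate_at_source, rate_downstream)
```

A metric pointed only at the downstream data would report a perfect score here, while the source measurement shows that half of the values were defective.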

And Tom Pantano commented:

“Great story and metaphor. Reminded of an inventory report I saw early on in my career where 3/4 of the records were simply discarded owing to data errors. Remarkably nobody noticed.

In many cases I see this as a symptom of a strategic gap. If the business is unaware of the cleansing that takes place it suggests that they aren't a part of the data quality process. If the business isn't engaged it's reasonable that they'd be unaware of the data quality efforts that occur on their behalf.

Unless the business is a partner and co-driver of data quality - actually data management - it will be difficult to make appreciable headway on data quality or any other material aspect of enterprise data management.”

December 14, 2010 | Registered Commenter Jim Harris

From the LinkedIn Group for the Data Governance Professionals Organization, Dan Dechichio commented:

“That is the situation at my company as well. IT does such a good job of "fixing" data that the business sometimes does not even realize they have bad data.”

And I responded:

It's a challenge for many IT organizations, which can essentially come to be viewed as "unsung heroes" (using a positive spin) or "covert concealers" (using a negative spin) when the reality of poor data quality comes to light.

And Dan Dechichio responded:

“It is a catch 22 with some IT individuals. You want to make things right but people keep firefighting and are not given the time or the resources to fix the root problem.

A shift in mindset is needed.”

And Dimpsy Teckchandani responded:

“Data governance is more about people and processes than about IT. However, more often than not in our experience, I have seen that while business is not entirely happy with the state of affairs (i.e., defects fixed only and not resolved), it's an unwilling community to hash out key challenges that stunt the success of a governance program i.e., ownership and accountability. Siloed ownership or "to each his own" cannot be the resolution. So IT sometimes can be the facilitator that not only "fixes" data but highlights "facts and themes" as to why/ how these fixes can be converted into "permanent solutions" instead. A whole mechanism around it. Organizations that understand this are the ones that are successful...takes a whole path to get there, but worth the investment!”

And Mani Kumar Manda responded:

“I have seen at many clients where business folks and to certain extent even IT folks are not aware of the data quality problems, because none of these clients have either a Data Governance program or at a simplistic level a data quality program that reports the state of data via metrics.

One of the underlying reasons I have noticed is that these clients use off the shelf packages and both Business and IT does not necessarily understand the full power/flexibility of these programs and many ways a specific task can be done using these applications. They often believe what they do or see is what can happen in the package and it is often right. Because it is a packaged application, the data has got to be right, a perception leading into ignorance of data quality. There lies a cause of potential data quality problem.”

And Dimpsy Teckchandani responded:

“Totally agree. The notion is that it is something that a tool or technology can resolve which is nothing but a misconception - a myopic view of a solution. Technology should only be looked as an enabler. It is also a common misunderstanding that data once fixed in the source system will resolve all potential quality problems. Which brings me back to my original comment - this is more a people and process issue. The number of hands on the keyboard drive the process and therein lies the problem. A holistic approach is necessary to resolve this problem...there is no magic bullet...it's an evolution achieved through a well defined maturity path.”

December 15, 2010 | Registered Commenter Jim Harris

It is good practice to cleanse data "pro-actively" before the business gets sense of bad data.

But not every IT shop keeps track of what data was cleansed at the record level to report to the business to validate the cleansing using reports and/or data quality dashboards.

December 16, 2010 | Unregistered Commenter Sree Boyapati
