Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.

Monday, June 25, 2012

Metadata, Data Quality, and the Stroop Test

In psychology, the Stroop Effect is a demonstration of interference in the reaction time of a task.  The most commonly used example is the Stroop Test, which compares the time needed to name colors when they are printed in an ink color that matches their name (e.g., green, yellow, red, blue, brown, purple) with the time needed to name the same colors when they are printed in an ink color that does not match their name (e.g., blue, red, purple, green, brown, yellow).  Naming the color of the word takes longer, and is more prone to errors, when the ink color does not match the name of the color.

The Stroop Test, where colors do not match their names, reminds me of the relationship between metadata and data quality.  If I view the ink color as the metadata and the name of the color as the data, the parallel is that understanding data takes longer, and is more prone to errors, when the metadata does not match the data, or when the metadata is ambiguous.

Unlike the Stroop Test, where poor metadata (ink color) obfuscates good data (name of the color), data quality issues can also be caused when good metadata is undermined by poor data (e.g., data entry errors like an email address being entered into a postal address field).  And, of course, even when the entered data matches the metadata (or automatic data-to-metadata matching is enabled by drop-down boxes), more insidious data quality issues can be caused by the complex challenge of data accuracy.
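As a rough illustration of good metadata being undermined by poor data, here is a minimal Python sketch that flags a record whose values contradict the declared meaning of their fields.  The field names, the `FIELD_METADATA` mapping, and the deliberately simple email pattern are all assumptions made up for this example, not part of any particular tool.

```python
import re

# Hypothetical metadata: field name -> the kind of value the field should hold.
FIELD_METADATA = {
    "email": "email",
    "postal_address": "postal_address",
}

# Deliberately loose pattern: something@something.something, no whitespace.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(value: str) -> bool:
    """Return True if the value has the rough shape of an email address."""
    return bool(EMAIL_PATTERN.match(value.strip()))


def find_mismatches(record: dict) -> list:
    """Return the names of fields whose value contradicts the metadata."""
    mismatches = []
    for field, kind in FIELD_METADATA.items():
        value = record.get(field, "")
        if kind == "email" and not looks_like_email(value):
            mismatches.append(field)
        elif kind == "postal_address" and looks_like_email(value):
            # An email address typed into the postal address field.
            mismatches.append(field)
    return mismatches


# Example: the same email address was entered into both fields.
record = {"email": "jim@example.com", "postal_address": "jim@example.com"}
print(find_mismatches(record))
```

A check like this only catches data that does not match its metadata.  A well-formed but wrong postal address would pass, which is the data accuracy challenge described above.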

Additionally, the point of view paradox can make data quality debates about fitness for the purpose of use even more colorful than the Stroop Test, such as when data that one user sees as red and green, another user sees as crimson and chartreuse.

But hopefully we can all agree that good data quality begins with good metadata, because better metadata makes data better.

 

Related Posts

You Say Potato and I Say Tater Tot

The Metadata Continuum

The Metadata Crisis

Let’s Meta a Data

What’s the Meta with your Data?

DQ-View: MetaData makes BettahMusic

Who Framed Data Entry?

Data Quality and the Cupertino Effect

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-BE: Data Quality Airlines

Data Quality and the Q Test


Reader Comments (3)

From the LinkedIn Group for DAMA International, Thijs van der Feltz commented:

“As always, a great commentary. Thanks Jim, for helping us convince others that metadata is not dull and theoretical, but fun and challenging, and most importantly, that it's a vital prerequisite to data quality.”

June 26, 2012 | Registered Commenter Jim Harris

Great analogy!

The longer it takes to process and understand the data, the greater the chance for error.

Why make it any harder than need be?

June 26, 2012 | Unregistered Commenter Nancy Beckman

Interesting analogy. It would be interesting to carry it even further.

For example, according to the Dimensional Overlap Model, the Stroop effect is caused by the interference that happens when irrelevant information is similar to the relevant information but points to the wrong response.

In this case, metadata that is related to the data in some way, but incorrect, would lead to more problems than metadata that was completely unrelated.

July 5, 2012 | Unregistered Commenter Greg Stevens
