OCDQ Blog

Obsessive-Compulsive Data Quality by Jim Harris
September 05, 2018

What an Old Dictionary teaches us about Metadata

September 05, 2018/ Jim Harris
[Photo: an old dictionary]

Spelling, pronunciation, and examples of usage are included in the dictionary definition of a word, which is a good example of one of the many uses of metadata, namely to provide a definition, description, and context for data.

Pictured to the left is the dictionary that has been on my desk for over 15 years, which is a good metaphor for the challenges of metadata management.

When I first bought the dictionary, it was, as its front cover attested, “The Newest.  The Best.  A Trusted Authority.  A brand-new dictionary of the 1990s, for the 1990s.  Comprehensive coverage of current words and terms, with clear, understandable definitions and up-to-the-minute usage guidance.”

And its back cover boasted of “60,000 entries assembled by a state-of-the-art authority using the most modern sources of information, and prepared by lexicographic experts to provide the one-stop reference book to turn to for all of your word questions.”  (However, if one of your word questions was about metadata, you were out of luck because it didn’t have an entry for it.)

The multidimensionality of metadata is exemplified by how a dictionary rarely contains a single definition for a word, and an old dictionary exemplifies how constantly changing semantics further complicate metadata management.

Using an old dictionary has several downsides: new words are not in it, and some existing words have either new definitions or an updated definition order based on the predominant context of current usage.

Organizations face a similar challenge while trying to maintain a metadata dictionary containing comprehensive coverage of business and technical terminology.  Ideally, a metadata dictionary provides clear, understandable definitions and usage guidance prepared by subject matter experts, making it a trusted authority and one-stop reference to turn to for all your data questions.
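For illustration, here is a minimal sketch of what a single entry in such a metadata dictionary might look like, assuming a simple Python representation (the term, field names, and definitions are hypothetical):

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class MetadataEntry:
        """One term in a hypothetical metadata dictionary."""
        term: str
        # Like a word in a printed dictionary, a term rarely has a single definition;
        # the order reflects the predominant context of current usage.
        definitions: list = field(default_factory=list)
        steward: str = ""                 # subject matter expert accountable for the entry
        last_reviewed: date = date(2018, 9, 5)

    customer = MetadataEntry(
        term="Customer",
        definitions=[
            "A party that has purchased at least one product in the last 24 months.",
            "Any party holding an active account, whether or not they have purchased.",
        ],
        steward="Sales Operations",
        last_reviewed=date(2018, 9, 5),
    )

    # Like an old dictionary, an entry that has not been reviewed in a long time
    # is at risk of drifting away from how the business actually uses the term.
    print((date.today() - customer.last_reviewed).days, "days since last review")

The point of the sketch is simply that definitions are plural, ordered, owned, and dated; keeping all four of those current is where the management in metadata management comes in.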

At least, that’s the theory.  In practice, I haven’t encountered a metadata dictionary that could deliver on that promise.

And just as there are many dictionary publishers (e.g., Houghton Mifflin Harcourt, Merriam-Webster, Oxford University Press), as well as numerous online dictionaries (e.g., Collins, Urban, Wiktionary), there’s often more than one metadata dictionary within an organization as well.  In fact, sometimes the organization has just as many metadata silos as it does data silos.

An old dictionary reminds us that language — and especially its everyday usage — evolves.  An old dictionary also teaches us that metadata — and especially the data it defines, describes, and provides a context for — evolves as well.  Which is probably why doing metadata management well is not, well, something that just automagically happens.

September 05, 2018/ Jim Harris/ 4 Comments
Data Quality, Debates
Metadata, Philosophy

Jim Harris

March 06, 2014

What is Metadata?

March 06, 2014/ Jim Harris

During this short OCDQ Radio episode, special guest John Owens and I discuss the difference between metadata and data, explaining that metadata describes the context, structure, and format, whereas data describes the values.
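As a minimal illustration of that distinction (a sketch with made-up fields, not taken from the episode), the schema below is metadata describing context, structure, and format, while the rows are the data values it describes:

    # Metadata: describes context, structure, and format (what each field means and how it is shaped).
    schema = {
        "order_id":   {"type": "integer", "description": "Unique order identifier"},
        "order_date": {"type": "string",  "format": "YYYY-MM-DD"},
        "amount":     {"type": "decimal", "unit": "USD"},
    }

    # Data: the values themselves, which only carry meaning in light of the metadata above.
    rows = [
        {"order_id": 1001, "order_date": "2014-03-06", "amount": "249.99"},
        {"order_id": 1002, "order_date": "2014-03-07", "amount": "19.50"},
    ]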

Read More
March 06, 2014/ Jim Harris/ Comment
OCDQ Radio, Podcasts, Data Quality
Metadata, John Owens
January 28, 2014

The Two Characteristics of Data Accuracy

January 28, 2014/ Jim Harris

As Jack Olson explained in his book Data Quality: The Accuracy Dimension, in order to be accurate, data must have both the right value and be represented in an unambiguous form.
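A small worked example (my own, with hypothetical values) shows why both characteristics matter: the string below may hold the right value, but its form is ambiguous, reading as March 6 in one convention and 3 June in another, whereas an ISO 8601 representation admits only one interpretation.

    from datetime import datetime

    raw = "03/06/2014"                                  # right value, ambiguous form
    as_us = datetime.strptime(raw, "%m/%d/%Y").date()   # read as March 6, 2014
    as_uk = datetime.strptime(raw, "%d/%m/%Y").date()   # read as June 3, 2014
    assert as_us != as_uk                               # one string, two different "facts"

    iso = "2014-03-06"                                  # unambiguous form (ISO 8601)
    print(datetime.strptime(iso, "%Y-%m-%d").date())    # only one reasonable reading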

Read More
January 28, 2014/ Jim Harris/ Comment
Books, Data Quality
Accuracy, Metadata
January 02, 2014

Best OCDQ Blog Posts of 2013

January 02, 2014/ Jim Harris

A roundup of the Best OCDQ Blog posts published during 2013.

Read More
January 02, 2014/ Jim Harris/ Comment
Data Quality
Big Data, Data Governance, Master Data Management, Metadata, Philosophy
September 10, 2013

Data Quality Project Management

September 10, 2013/ Jim Harris

OCDQ Radio is an audio podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Adam Cox and I discuss data quality project management, avoiding data quality becoming an afterthought on data integration and data migration projects, the difference and relationship between data ownership and data stewardship, regulatory requirements for data quality, and the importance of getting buy-in from business stakeholders.

Adam Cox is a data management professional with over ten years of experience working in the public and private sector in the United Kingdom (UK).  He is an experienced project and technical manager working on large-scale projects involving significant data migration and data integration.  Adam Cox is currently working for an established UK financial institution as a Data Quality Consultant, mainly on regulatory reporting projects.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Doing Data Governance — Guest John Ladley discusses his book How to Design, Deploy and Sustain Data Governance and how to understand the difference and relationship between data governance and enterprise information management.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Measuring Data Quality for Ongoing Improvement — Guest Laura Sebastian-Coleman discusses bringing together a better understanding of what is represented in data with the expectations for use in order to improve the overall quality of data.
  • The Blue Box of Information Quality — Guest Daragh O Brien on why Information Quality is bigger on the inside, using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • Good-Enough Data for Fast-Enough Decisions — Guest Julie Hunt discusses Data Quality and Business Intelligence, including the speed versus quality debate of near-real-time decision making, and the future of predictive analytics.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • The Art of Data Matching — Guest Henrik Liliendahl Sørensen discusses data matching concepts and practices, including different match techniques, candidate selection, presentation of match results, and business applications of data matching.
  • Data Profiling Early and Often — Guest James Standen discusses data profiling concepts and practices, and how bad data is often misunderstood and can be coaxed away from the dark side if you know how to approach it.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.
September 10, 2013/ Jim Harris/ Comment
Data Quality, OCDQ Radio, Podcasts
Data Governance, Data Integration, Data Migration, Data Ownership, Data Profiling, Data Stewardship, Metadata

Jim Harris

August 06, 2013

Big Data is Just Another Brick in the Wall

August 06, 2013/ Jim Harris

The title of my recent blog post Chaos in the Big Data Brickyard made Mike Wheeler think it was a reference to the Indianapolis Motor Speedway, which is known as “The Brickyard” because it was paved entirely with bricks way back in 1909 (today, three feet of the original bricks remain at the start/finish line).  This was a reasonable assumption since Wheeler is a NASCAR fan (making his last name a great example of an aptronym), and it prompted his blog post Yeah, But Who Won The Race?

“The term brickyard taken without any context,” Wheeler explained, “turned out to be another random brick of fact laid on an already crowded foundation.  Context is what provides relevance to facts.  Without a frame of reference into which a fact can be inserted it can easily become meaningless or, even worse, detrimental to the decision-making process.”

As usual, I agree with Wheeler (except about being a NASCAR fan — my apologies to Mike and his fellow auto racing fans).

In my post Big Data, Sporks, and Decision Frames, I blogged about how having the right decision frame (i.e., understanding the business context of a decision) is essential to whether big data and data science can provide meaningful business insight.

Additional context often missing from discussions about big data and data science is that they are not the only bricks in the yard.

Data modeling is still important and data quality still matters.  So do metadata, data management, business intelligence, data monitoring, communication, collaboration, change management, and the many other aspects of data governance.

“A successful man,” David Brinkley once said, “is one who can lay a firm foundation with the bricks others have thrown at him.”  A successful big data initiative is one that can lay a firm foundation with the bricks of best practices that the data management industry has been rightfully throwing at us for a long time now.  Big data does not obviate the need for those best practices — even though it does occasionally require adapting our best practices as well as adopting new practices.

Big data is not the be-all and end-all, as it is sometimes overhyped to be, but instead, to paraphrase the great philosophers Pink Floyd:

All in all, big data is just another brick in the wall.

Related Podcasts

Clicking on the link will take you to the episode’s blog post:

  • Defining Big Data — This episode of the Open MIKE Podcast, with assistance from Robert Hillard, discusses how big data refers to big complexity, not big volume, even though complex datasets tend to grow rapidly, thus making them voluminous.
  • Too Big to Ignore — Guest Phil Simon, author of the book Too Big to Ignore: The Business Case for Big Data, offers advice on getting started with big data and remembering that big data is just another means toward solving business problems.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, experimentation, and correlation.

 

Related Posts

i blog of Data glad and big

The Laugh-In Effect of Big Data

The Need for Data Philosophers

The Graystone Effects of Big Data

The Wisdom of Crowds, Friends, and Experts

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Information Overload Revisited

Our Increasingly Data-Constructed World

Big Data and the Infinite Inbox

Little Ooches prevent Big Data Ouches

It’s Not about being Data-Driven

Data Separates Science from Superstition

Through a PRISM, Darkly

Predictive Analytics, the Data Effect, and Jed Clampett

Rage against the Machines Learning

Darth Vader, Big Data, and Predictive Analytics

The Flying Monkeys of Big Data

Big Data, Sporks, and Decision Frames

Big Data, Predictive Analytics, and the Ideal Chronicler


August 06, 2013/ Jim Harris/ 1 Comment
Blogs, Data Quality
Big Data, Business Intelligence, Change Management, Collaboration, Communication, Data Governance, Data Science, Metadata

Jim Harris

July 09, 2013

The Assumption of Quality

July 09, 2013/ Jim Harris

In my post The Wisdom of Crowds, Friends, and Experts, I used Amazon, Facebook, and Pandora respectively as examples of three techniques used by the recommendation engines increasingly provided by websites, social networks, and mobile apps.

Richard Jarvis commented that my assessment of the data quality associated with these techniques actually needed to look at metadata, data, and information, as well as knowledge management.  “For crowd-sourced data, we’re assessing quality based on the first-order value rather than the immediate downstream usability.  We’re not questioning the accuracy of Amazon’s assertion that customers who purchased X also purchased Y.  Rather, we’re interested in the relevance of that information.  In terms of knowledge management, I would describe this as broadening data quality to embrace information and knowledge quality.”

As usual, I agreed with Richard.  Amazon is not providing access to the operational data underlying their recommendations, which is, of course, understandable, but instead Amazon is providing some aggregated information (e.g., sales rank) along with some detailed information (e.g., consumer reviews) and numerous metadata attributes (e.g., product category).

We have no way of knowing if the underlying operational data is accurate (as well as other aspects of data quality), nor do we have any way of verifying any aspect of the information quality.  Some of the metadata could be verified by cross-referencing other sources (e.g., for books, we could verify the metadata with the publishers and other sellers such as Barnes & Noble).

Making use of Amazon’s information has to be done on the assumption of quality — something that data and information quality professionals would never endorse in other contexts (e.g., within Amazon’s internal financial accounting systems).

While this situation has always existed, the Internet and the era of big data are exacerbating it.  Although this example focused on recommendation engines, many of the sources involved in big data analytics face the same challenge, such as sentiment analysis and other analyses that depend on self-reported data.

Furthermore, I would argue that many traditional (for lack of a better term) data and information management applications have operated on the same assumption of quality, even when data and information quality best practices are implemented.

By extension, although there is an assumption that quality business decisions can only be made based on quality metadata, data, and information, if that were true in all cases, then every business would be bankrupt.

None of this is meant to imply that quality is not important.

On the contrary, my point is that in almost every application of metadata, data, and information, there is an assumption of quality.  Obviously, this assumption should be tested whenever it can be, but we have to accept the fact that there will be many times when we will not be able to, thus forcing us to leverage metadata, data, and information on the assumption of their quality.

July 09, 2013/ Jim Harris/ Comment
Data Quality, Debates
Big Data, Business Intelligence, Data Governance, Metadata, Philosophy

Jim Harris

January 03, 2013

Best OCDQ Blog Posts of 2012

January 03, 2013/ Jim Harris

Welcome to my roundup of the best blog posts published on the Obsessive-Compulsive Data Quality (OCDQ) blog during 2012.

My selections were based on a pseudo-scientific, quasi-statistical combination of page views, comments, and re-tweets, as well as a few of my personal favorites, which I have organized into four sections of ten posts each by topic or type.

 

Ten Best Posts on Big Data

  • Dot Collectors and Dot Connectors — The multifaceted challenges of big data require the dot collectors of data management and the dot connectors of business intelligence to overcome their attention blindness and work together more collaboratively.
  • HoardaBytes and the Big Data Lebowski — Don’t hoard Data, dude.  The Data must abide.  The Data must abide both the Business, by proving useful to our business activities, and the Individual, by protecting the privacy of our personal activities.
  • Magic Elephants, Data Psychics, and Invisible Gorillas — As technological advancements improve our data analytical tools, we must not lose sight of the fact that tools and data remain only as effective and beneficent as the humans who wield them.
  • Our Increasingly Data-Constructed World — What we now call Big Data is in fact a long-running macro trend underlying the many recent trends and innovations making our world, not just more data-driven, but increasingly data-constructed.
  • Will Big Data be Blinded by Data Science? — With apologies to Thomas Dolby, will the business leaders being told to hire data scientists to derive business value from big data analytics be blind to what data science tries to show them?
  • The Graystone Effects of Big Data — Using a metaphor based on the science fiction television show Caprica, I refer to the positive aspects of Big Data as the Zoe Graystone Effect, and the negative aspects of Big Data as the Daniel Graystone Effect.
  • Exercise Better Data Management — Big Data may be followed by MOData (i.e., MOre Data or Morbidly Obese Data), but that doesn’t necessarily mean we require more data management, instead we just need to exercise better data management.
  • A Tale of Two Datas — Inspired by Malcolm Chisholm and Charles Dickens, there are two types of data (i.e., representation and observation, not big and not-so-big) with different data uses that will require different data management approaches.
  • Data Silence — Not only do we need to adopt a mindset that embraces the principles of data science, but we also have to acknowledge that the biases and preconceptions in our minds could silence the signal and amplify the noise in big data.
  • The Wisdom of Crowds, Friends, and Experts — The future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three sources often contributing to data-driven decision making.

 

Ten Best Posts on Data Governance and Data Quality

  • Data Governance Frameworks are like Jigsaw Puzzles — Inspired by Jill Dyché and Scott Berkun, this post explains how the usefulness of data governance frameworks comes from realizing data governance frameworks are like jigsaw puzzles.
  • Data Quality: Quo Vadimus? — With lots of help from Henrik Liliendahl Sørensen, Garry Ure, Bryan Larkin, and many others via the comments, I ponder where data quality is going, and whether data quality is a journey or a destination.
  • Data Quality and Miracle Exceptions — Battling the dark forces of poor data quality doesn’t require any superpowers, and data quality doesn’t have any miracle exceptions, so for the love of high-quality data everywhere, stop trying to sell us one.
  • Data Myopia and Business Relativity — Examines the two most prevalent definitions for data quality, real-world alignment and fitness for the purpose of use, otherwise known as the danger of data myopia and the challenge of business relativity.
  • How Data Cleansing Saves Lives — Although proactive defect prevention is far superior to reactive data cleansing, the history of the Hubble Space Telescope proves that data cleansing can be not just a necessary evil, but also a necessary good.
  • Data Quality and the Bystander Effect — The most common reason data quality issues are neither reported nor corrected is the Bystander Effect making people less likely to interpret bad data as a problem or, at the very least, not their responsibility.
  • Data Quality and Chicken Little Syndrome — A chicken-metaphor-based post about the far-too-common and fowl folly of, instead of trying to sell the business benefits of data quality, emphasizing the negative aspects of not investing in data quality.
  • Data and its Relationships with Quality — The metadata linking the data management industry to what it manages suffers from the one-to-many relationships created by never agreeing on how data, information, and quality should be defined.
  • Cooks, Chefs, and Data Governance — Implementing policies requires cooks who are adept at carrying out a recipe, as well as chefs who are trusted to figure out how to best combine policies with the organizational ingredients available to them.
  • Availability Bias and Data Quality Improvement — The availability heuristic explains why a reactive data cleansing project is easily approved, and availability bias explains why initiating a proactive data quality program is usually resisted.

 

Ten Best Podcasts

  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Saving Private Data — Recorded in December 2011, guest Daragh O Brien discusses the data privacy and data protection implications of social media, cloud computing, and big data.
  • Decision Management Systems — Guest James Taylor discusses data-driven decision making and analytical concepts from his book: Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Social Media for Midsize Businesses — Sponsored by IBM Midsize Business Solutions, guest Paul Gillin, author of four books, the latest, co-authored with Greg Gianforte, is Attack of the Customers, discusses social media marketing concepts.
  • Data Driven — Guest Tom Redman (aka the “Data Doc”) discusses concepts from one of my favorite data quality books, which is his most recent book: Data Driven: Profiting from Your Most Important Business Asset.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • The Evolution of Enterprise Security — Sponsored by the Enterprise CIO Forum, guest Bill Laberis discusses striking a balance between convenience and security, which is necessary in the era of cloud computing and mobile devices.
  • Defining Big Data — This episode of the Open MIKE Podcast, with assistance from Robert Hillard, discusses how big data refers to big complexity, not big volume, even though complex datasets tend to grow rapidly, thus making them voluminous.
  • Getting to Know NoSQL — This episode of the Open MIKE Podcast discusses how NoSQL does not mean AntiSQL (i.e., NoSQL is not a Relational replacement), and that business-driven big data needs will often require “Not Only SQL.”

 

Ten Best of the Rest

  • DQ-View: Data Is as Data Does — In this short video, I explain that data’s value comes from data’s usefulness, exemplifying the potential value of unstructured data based on whether or not you put what you read in data management books to use.
  • DQ-View: The Five Stages of Data Quality — In this short video, using my superb acting skills, I demonstrate how coming to terms with the daunting challenge of data quality is somewhat similar to experiencing the Five Stages of Grief.
  • DQ-View: MetaData makes BettahMusic — In this short video, I demonstrate how better metadata makes data better using the metadata automatically and manually created after importing my CD collection into my iTunes library.
  • Metadata, Data Quality, and the Stroop Test — In this colorful (and perhaps too colorful) post, I use the Stroop Test, where colors do not match their names, to discuss the relationship between metadata and data quality.
  • Quality is the Higgs Field of Data — Using one of the biggest science stories of 2012, the potential discovery of the elusive Higgs Boson (which I also attempt to explain), I attempt an analogy for data quality based on the Higgs Field.
  • The Family Circus and Data Quality — Thanks to The Family Circus comic strip created by cartoonist Bil Keane, I explain how Ida Know owns the data, Not Me is accountable for data governance, and Nobody takes responsibility for data quality.
  • Data Love Song Mashup — Since your data needs love too, on Valentine’s Day I wrote this post providing a mashup of love songs for your data (and Rob DuMoulin added a few more in the comments) — Happy Data Quality to you and your data!
  • The Algebra of Collaboration — The trick of algebra equates collaboration with data quality and data governance success when collaboration is viewed not just as a guiding principle, but also as a call to action in your daily practices.
  • The Return of the Dumb Terminal — With help from author Kevin Kelly and my old green machine, I ponder how the mobile-app-portal-to-the-cloud computing model means mobile devices are bringing about the return of the dumb terminal.
  • An Enterprise Carol — Jacob Marley raises the ghosts of a few ideas to consider about how to keep the Enterprise well in the new year via the Ghosts of Enterprise Past (Legacy Applications), Present (IT Consumerization), and Future (Big Data).

 

Thank You for Reading OCDQ Blog in 2012

In 2012, the Obsessive-Compulsive Data Quality (OCDQ) blog published 92 posts, which received 160,000 total page views, while averaging over 400 page views and 200 unique visitors a day.

Thank you for reading OCDQ Blog in 2012.  Your readership was deeply appreciated.

 

Related Posts

Best OCDQ Blog Posts of 2011

So Long 2011, and Thanks for All the . . . – The OCDQ Radio 2011 Year in Review

2012 Quarterly Review of the Data Roundtable (Part 4)

2012 Quarterly Review of the Data Roundtable (Part 3)

2012 Quarterly Review of the Data Roundtable (Part 2)

2012 Quarterly Review of the Data Roundtable (Part 1)

2011 Quarterly Review of the Data Roundtable (Part 4)

2011 Quarterly Review of the Data Roundtable (Part 3)

2011 Quarterly Review of the Data Roundtable (Part 2)

2011 Quarterly Review of the Data Roundtable (Part 1)


January 03, 2013/ Jim Harris/ Comment
Data Quality, Social Media
Big Data, Cloud, Collaboration, Data Governance, Data Privacy, Data Security, Master Data Management, Metadata, Mobile, Philosophy

Jim Harris

November 15, 2012

Open MIKE Podcast — Episode 07

November 15, 2012/ Jim Harris

Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management, which provides a comprehensive methodology that can be applied across a number of different projects within the Information Management space.  For more information, click on this link: openmethodology.org/wiki/What_is_MIKE2.0

The Open MIKE Podcast is a video podcast show, hosted by Jim Harris, which discusses aspects of the MIKE2.0 framework, and features content contributed to MIKE 2.0 Wiki Articles, Blog Posts, and Discussion Forums.

 

Episode 07: Guiding Principles for the Open Semantic Enterprise

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Semantic Enterprise Guiding Principles: openmethodology.org/wiki/Guiding_Principles_for_the_Open_Semantic_Enterprise *

* Based on Mike Bergman’s article: mkbergman.com/859/seven-pillars-of-the-open-semantic-enterprise

Semantic Enterprise Composite Offering: openmethodology.org/wiki/Semantic_Enterprise_Composite_Offering

Semantic Enterprise Wiki Category: openmethodology.org/wiki/Category:Semantic_Enterprise

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

 

Related Posts

Open MIKE Podcast — Episode 04: Metadata Management

You Say Potato and I Say Tater Tot

The Metadata Continuum

The Metadata Crisis

DQ-View: MetaData makes BettahMusic

Metadata, Data Quality, and the Stroop Test

Data Quality and the Q Test

Data and its Relationships with Quality

What’s the Meta with your Data?

Let’s Meta a Data

Listen to Peter Benson discuss Metadata, Data, and Information on the Knights of the Data Roundtable


November 15, 2012/ Jim Harris/ Comment
Data Quality, Podcasts, Sponsored Blog Posts, Videos
EIM, MIKE2.0, Metadata, Open MIKE Podcast, Semantic Enterprise

Jim Harris

September 27, 2012

Open MIKE Podcast — Episode 04

September 27, 2012/ Jim Harris

Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management, which provides a comprehensive methodology that can be applied across a number of different projects within the Information Management space.  For more information, click on this link: openmethodology.org/wiki/What_is_MIKE2.0

The Open MIKE Podcast is a video podcast show, hosted by Jim Harris, which discusses aspects of the MIKE2.0 framework, and features content contributed to MIKE 2.0 Wiki Articles, Blog Posts, and Discussion Forums.

 

Episode 04: Metadata Management

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Information Asset Management: openmethodology.org/wiki/Information_Asset_Management_Offering_Group

Metadata Management Solution Offering: openmethodology.org/wiki/Metadata_Management_Solution_Offering

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

 

Related Posts

You Say Potato and I Say Tater Tot

The Metadata Continuum

The Metadata Crisis

DQ-View: MetaData makes BettahMusic

Metadata, Data Quality, and the Stroop Test

Data Quality and the Q Test

Data and its Relationships with Quality

What’s the Meta with your Data?

Let’s Meta a Data

Listen to Peter Benson discuss Metadata, Data, and Information on the Knights of the Data Roundtable


September 27, 2012/ Jim Harris/ Comment
Data Quality, Podcasts, Sponsored Blog Posts, Videos
EIM, MIKE2.0, Metadata, Open MIKE Podcast

Jim Harris

August 21, 2012

Data and its Relationships with Quality

August 21, 2012/ Jim Harris
[Graphic: Data and its Relationships with Quality]

The title of this blog post is an allusion to the graphic (shown above) that accompanied an Information Management column by Malcolm Chisholm, in which he wrote that data quality is not fitness for use as it is most commonly defined, stating he thinks “a strong case can be made that the definition is indeed inappropriate and should be replaced with a better one.”

“Before we get into the definition of data quality, let us take a brief look at what data is related to,” Chisholm opened, explaining that “data represents something — a thing, event, or concept.”

As I blogged in my post Plato’s Data, whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality.  Although data shapes our perception of the real world, sometimes we forget that data is only a partial reflection of reality.

“Data is understood,” Chisholm continued, “by something, for which the best term I can find is the interpretant.”

“The interpretant applies the data to one or more uses, which achieve objectives the interpretant has.  The interpretant is independent of the data.  It understands the data and can put it to use.  But if the interpretant misunderstands the data, or puts it to an inappropriate use, that is hardly the fault of the data, and cannot constitute a data quality problem.”

As I blogged in my post Quality is the Higgs Field of Data, independent from use, data is as carefree as the mass-less photon whizzing around at the speed of light.  But once we interact with it, data begins to feel the effects of our use. We give data mass so that it can become the basic building blocks of what matters to us.  Some data is affected more by our use than others.  The more subjective our use, the more we weigh data down.  The more objective our use, the less we weigh data down.

“A more fundamental problem is that data can have many uses,” Chisholm continued.  “If we think data quality is fitness for use, then data quality must be assessed independently for each use we put it to.”  Instead, Chisholm contends that data quality is “an expression of the relationship between the thing, event, or concept and the data that represents it.  This is a one-to-one relationship, unlike the one-to-many relationship between data and uses.”

Therefore, Chisholm proposes that a better definition of data quality is “the extent to which the data actually represents what it purports to represent.  This definition can be used to think of data quality as a property of the data itself, and then our diagnosis and remediation efforts will focus on the special problems of the relationship between data and what it represents.”

But, of course, although Chisholm doesn’t like it as a definition for data quality, he is not denying that fitness for use describes “a set of valid concepts that deal with types of problems around the use of data.”  Two examples he cites are when the interpretant misunderstands the data, or when the interpretant uses data for a purpose that is incompatible with the data.

In his conclusion, Chisholm states that “the special problems of the relationships between data and what it is used for requires a different set of approaches and should be called something other than data quality.”

And this is exactly why, as I blogged in my post Data Myopia and Business Relativity, many data professionals prefer to define data quality as real-world alignment and information quality as fitness for the purpose of use.  However, I have found that adding the nuance of data versus information only further complicates data quality discussions with business professionals.

Chisholm also suggests that his proposed definition of data quality is not only better, but that “it also alludes to the existence of metadata that links the data to what it is representing.”  The important role that metadata plays in supporting data and its relationships with information and quality is something I blogged about in my post You Say Potato and I Say Tater Tot.

The irony is that the metadata linking the data management industry to what it manages suffers from the one-to-many relationships we’ve created by seemingly never agreeing on how data, information, and quality should be defined.

August 21, 2012/ Jim Harris/ 2 Comments
Blogs, Data Quality, Debates
Malcolm Chisholm, Master Data Management, Metadata

Jim Harris

June 25, 2012

Metadata, Data Quality, and the Stroop Test

June 25, 2012/ Jim Harris

In psychology, the Stroop Effect is a demonstration of interference in the reaction time of a task.  The most commonly used example is what is known as the Stroop Test, which compares the time needed to name colors when they are printed in an ink color that matches their name (e.g., green, yellow, red, blue, brown, purple) with the time needed to name the same colors when they are printed in an ink color that does not match their name (e.g., blue, red, purple, green, brown, yellow).  Naming the color of the word takes longer, and is more prone to errors, when the ink color does not match the name of the color.

The Stroop Test, where colors do not match their names, reminds me of the relationship between metadata and data quality if I view the ink color as the metadata and the name of the color as the data, given that understanding data takes longer, and is more prone to errors, when the metadata does not match the data, or when the metadata is ambiguous.

Unlike the Stroop Test, where poor metadata (ink color) obfuscates good data (name of the color), data quality issues can also be caused when good metadata is undermined by poor data (e.g., data entry errors like an email address being entered into a postal address field).  And, of course, even when the entered data matches the metadata (or automatic data-to-metadata matching is enabled by drop-down boxes), more insidious data quality issues can be caused by the complex challenge of data accuracy.
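As a rough sketch of that first kind of issue (good metadata undermined by poor data), a simple rule can flag values that look like email addresses landing in a postal address field; the record and field names below are hypothetical, and passing such a check still says nothing about the harder problem of accuracy:

    import re

    EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def check_field(field_name, value):
        """Return warnings where the data does not match the metadata of the field it sits in."""
        warnings = []
        if field_name == "postal_address" and EMAIL_PATTERN.match(value):
            warnings.append(f"{field_name!r} contains what looks like an email address: {value!r}")
        return warnings

    # Hypothetical record with an email address mistakenly entered into the postal address field.
    record = {"postal_address": "jim@example.com", "email": "jim@example.com"}
    for name, value in record.items():
        for warning in check_field(name, value):
            print(warning)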

Additionally, the point of view paradox can turn data quality debates about fitness for the purpose of use even more colorful than the Stroop Test, such as when data that one user sees as red and green, another user sees as crimson and chartreuse.

But hopefully we can all agree that good data quality begins with good metadata, because better metadata makes data better.

 

Related Posts

You Say Potato and I Say Tater Tot

The Metadata Continuum

The Metadata Crisis

Let’s Meta a Data

What’s the Meta with your Data?

DQ-View: MetaData makes BettahMusic

Who Framed Data Entry?

Data Quality and the Cupertino Effect

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-BE: Data Quality Airlines

Data Quality and the Q Test


June 25, 2012/ Jim Harris/ 3 Comments
Data Quality, Debates
Accuracy, Metadata

Jim Harris

January 26, 2012

The Johari Window of Data Quality

January 26, 2012/ Jim Harris

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

The Johari Window is a term from psychology for a technique used to help people better understand their personality and behavior by combining a self assessment with assessments from their peers.  In relation to data, the Johari Window is a metaphor for helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

During this episode, I discuss the Johari Window of Data Quality with Martin Doyle.  Our discussion, inspired by our blog comment banter on my post There is No Such Thing as a Root Cause, includes root cause analysis, the pursuit of data perfection, metadata, communication, Business-IT collaboration, change management, defect prevention, and continuous improvement.

Martin Doyle is a Data Quality Improvement Evangelist and the CEO of DQ Global, which is a UK-based data quality software and services vendor providing data cleansing, international address and email verification, data deduplication, and data matching solutions for Customer Relationship Management, Single Customer View, and Master Data Management.  DQ Global has worked with over 500 businesses worldwide on a variety of projects, providing their clients with improved data quality, making their data fit for business use, and enabling them to trust their data and make decisions based on a foundation of fact.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Gaining a Competitive Advantage with Data — Guest William McKnight discusses some of the practical, hands-on guidance provided by his book Information Management: Strategies for Gaining a Competitive Advantage with Data.
  • Doing Data Governance — Guest John Ladley discusses his book How to Design, Deploy and Sustain Data Governance and how to understand the difference and relationship between data governance and enterprise information management.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Measuring Data Quality for Ongoing Improvement — Guest Laura Sebastian-Coleman discusses bringing together a better understanding of what is represented in data with the expectations for use in order to improve the overall quality of data.
  • The Blue Box of Information Quality — Guest Daragh O Brien on why Information Quality is bigger on the inside, using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • Good-Enough Data for Fast-Enough Decisions — Guest Julie Hunt discusses Data Quality and Business Intelligence, including the speed versus quality debate of near-real-time decision making, and the future of predictive analytics.
  • The Art of Data Matching — Guest Henrik Liliendahl Sørensen discusses data matching concepts and practices, including different match techniques, candidate selection, presentation of match results, and business applications of data matching.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.
January 26, 2012/ Jim Harris/ 1 Comment
Data Quality, Debates, OCDQ Radio, Podcasts
Business Benefits, Business-IT Collaboration, Change Management, Communication, Data Governance, Master Data Management, Metadata, Philosophy

Jim Harris

January 24, 2012

DQ-View: MetaData makes BettahMusic

January 24, 2012/ Jim Harris

Data Quality (DQ) View is a regular OCDQ segment. Each DQ-View is a brief video discussion of a key data quality concept.

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

You can also watch a regularly updated page of my videos by clicking on this link: OCDQ Videos

Related Posts

What an Old Dictionary teaches us about Metadata

You Say Potato and I Say Tater Tot

The Metadata Continuum

The Metadata Crisis

Metadata, Data Quality, and the Stroop Test

Open MIKE Podcast — Episode 04: Metadata Management

 

Data Quality Music (DQ-Songs)

In other words, the following links are to lyrical data quality blog posts inspired by music:

Council Data Governance

Data Love Song Mashup

I’m Gonna Data Profile (500 Records)

A Record Named Duplicate

New Time Human Business

You Can’t Always Get the Data You Want

I’m Bringing DQ Sexy Back

Imagining the Future of Data Quality

The Very Model of a Modern DQ General

Metadata and the Baker/baker Paradox

Midnight in a Sky above an Ocean of Aqua

Bigger Data needs Better Metadata

Data Quality and the Q Test

The Broken Telephone of the Data Warehouse

Open MIKE Podcast — Episode 07: Open Semantic Enterprise

January 24, 2012/ Jim Harris/ 3 Comments
Data Quality, Videos
DQ-View, Metadata

Jim Harris

January 03, 2012

Best OCDQ Blog Posts of 2011

January 03, 2012/ Jim Harris

Welcome to my roundup of the best blog posts published on the Obsessive-Compulsive Data Quality (OCDQ) blog during 2011.

My selections were based on a pseudo-scientific, quasi-statistical combination of page views, comments, and re-tweets (as well as choosing a few of my personal favorites).  Instead of ordering the posts chronologically, I decided to organize them by theme.

 

The Metadata Trilogy

Although it has an incredibly important role to play in data quality and its related disciplines, I don’t write about metadata very often.  But the reader feedback that I received led me to write three blog posts about metadata in the span of a few weeks:

  • The Metadata Crisis — There is a running debate within many organizations over the meaning of commonly used terms, which complicates what on the surface seem like straightforward business questions.
  • The Metadata Continuum — There is a continuum, where at one end we have the uniformity of controlled vocabularies, and at the other end we have the flexibility of chaotic folksonomies.  However, both flexibility and uniformity provide value.
  • You Say Potato and I Say Tater Tot — The demarcations of the borders between metadata, data, and information are important, but sometimes difficult to discern.  In this post, I offer an explanation about these demarcations using potatoes.

 

The Data Governance Star Wars (one less than a) Trilogy

In June, Rob Karel of Forrester Research and I used a Star Wars themed blog mock debate to take on one of data governance’s biggest challenges — how to balance bureaucracy and business agility.  Gwen Thomas of the Data Governance Institute joined Rob and me to continue the discussion during a special, extended, and Star Wars themed episode of OCDQ Radio:

  • Data Governance Star Wars: Balancing Bureaucracy and Agility — In character as OCDQ-Wan, I argue in favor of business agility and explain that Collaboration is the Data Governance Force.
  • Data Governance Star Wars on OCDQ Radio — In Part 1, Rob Karel and I discuss our blog mock debate, which is followed by a brief Star Wars themed intermission, and then in Part 2, Gwen Thomas joins us to provide her excellent insights.

 

Although not Star Wars themed, here are some additional Best OCDQ Blog Posts of 2011 on the topic of data governance:

  • The Three Most Important Letters in Data Governance — There are only three letters of difference between the words cooperative and competitive, which we could say are the three most important letters in data governance.
  • Data Governance and the Adjacent Possible — It’s important to demonstrate that some data governance policies reflect existing best practices, which helps reduce resistance to change, and therefore I advise: “If it ain’t broke, bricolage it.”
  • Aristotle, Data Governance, and Lead Rulers — Well-constructed data governance policies are like lead rulers — flexible rules that empower us with an understanding of the principle of the policy, and how to enforce it in a particular context.
  • The Stakeholder’s Dilemma — There will be times when sacrifices for the long-term greater good will require that stakeholders either contribute more resources during the current phase, or receive fewer benefits from its deliverables.
  • Beware the Data Governance Ides of March — My dramatized warning about relying too much on the top-down approach to implementing data governance — and especially if your organization has any data stewards named Brutus or Cassius.
  • Data Governance and the Buttered Cat Paradox — The fearless felines of the buttered-toast-paratrooper brigade ponder how to approach data governance — top-down or bottom-up.  See the follow-up post: Zig-Zag-Diagonal Data Governance

 

OCDQ Radio

In June, I launched OCDQ Radio, a vendor-neutral podcast about data quality and the audio complement to this blog, providing me with a platform for recorded discussions with the great folks working in the data management industry.  So far, there have been 21 episodes of OCDQ Radio, featuring 22 guests from 7 countries.  Here are a few of the most popular episodes:

  • So Long 2011, and Thanks for All the . . . — The OCDQ Radio 2011 Year in Review, featuring Jarrett Goldfedder, who discusses Big Data, Nicola Askham, who discusses Data Governance, and Daragh O Brien, who discusses Data Privacy.
  • The Fall Back Recap Show — A look back at the Best of OCDQ Radio, including discussions about Data, Information, Business-IT Collaboration, Change Management, Big Analytics, Data Governance, and the Data Revolution.
  • Big Data and Big Analytics — Special Guests Jill Dyché and Dan Soceanu discuss big trends in Business Intelligence, including Cloud, Collaboration, and Big Data, the last of which lead to a discussion about Big Analytics.
  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • Making EIM Work for Business — Guest John Ladley discusses his book Making EIM Work for Business, exploring what makes information management, not just useful, but valuable to the enterprise.
  • The Blue Box of Information Quality — Guest Daragh O Brien on why Information Quality is bigger on the inside, using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”
  • Master Data Management in Practice — Guests Dalton Cervo and Mark Allen discuss their book MDM in Practice, and how to properly prepare for a new MDM program.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.
  • Good-Enough Data for Fast-Enough Decisions — Guest Julie Hunt discusses Data Quality and Business Intelligence, including the speed versus quality debate of near-real-time decision making, and the future of predictive analytics.
  • Social Media Strategy — Guest Crysta Anderson of IBM Initiate explains social media strategy and content marketing, including three recommended practices: (1) Listen intently, (2) Communicate succinctly, and (3) Have fun.

 

The Best of the Rest

  • Plato’s Data — Data shapes our perception of the real world, but sometimes we forget that data is only a partial reflection of reality.  This theme was also discussed on the OCDQ Radio episode Redefining Data Quality with Peter Perera.
  • There is No Such Thing as a Root Cause — There are no root causes, only strong correlations. And correlations are strengthened by continuous monitoring.  This post received excellent comments, including great banter with Martin Doyle.
  • You only get a Return from something you actually Invest in — Invest in doing the hard daily work of continuously improving your data quality and putting into practice your data governance principles, policies, and procedures.
  • The Dichotomy Paradox, Data Quality and Zero Defects — Has your data quality practice become motionless by trying to prove that Zero Defects is more than just theoretically possible?
  • The Data Quality Wager — Inspired by Gordon Hamilton, my rendering of Pascal’s Wager in a data quality context.
  • DQ-View: Talking about Data — DQ-View video discussion about how data professionals should talk about data when invited to participate in business discussions within their organizations.
  • The Speed of Decision — Examines the constraints that time puts on data-driven decision making, pondering whether decision speed is more important than data quality and decision quality.
  • The Data Cold War — Examines how Google and Facebook have performed the Master Data Management Magic Trick and socialized data (“Information wants to be free!”) in order to capitalize data as a true corporate asset.
  • A Farscape Analogy for Data Quality — Ponders whether data is not viewed as an asset because data has so thoroughly pervaded the enterprise that data has become invisible to those who are so dependent upon its quality.
  • No Datum is an Island of Serendip — Our organizations need to create collaborative environments that foster serendipitous connections bringing all of our business units and people together around our shared data assets.

 

Thank You for Reading OCDQ Blog in 2011

In 2011, the Obsessive-Compulsive Data Quality (OCDQ) blog published 112 posts, which received 130,000 total page views, averaging 350 page views and 150 unique visitors a day.

Thank you for reading OCDQ Blog in 2011.  Your readership was deeply appreciated.

 

Related Posts

So Long 2011, and Thanks for All the . . . – The OCDQ Radio 2011 Year in Review

2011 Quarterly Review of the Data Roundtable (Part 3)

2011 Quarterly Review of the Data Roundtable (Part 2)

2011 Quarterly Review of the Data Roundtable (Part 1)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

The Best Data Quality Blog Posts of 2010


January 03, 2012/ Jim Harris/ Comment
Data Quality, Social Media
Blogging, Business Intelligence, Change Management, Cloud, Collaboration, Communication, Data Governance, Master Data Management, Metadata, Philosophy

Jim Harris


OCDQ Blog

Obsessive-Compulsive Data Quality (OCDQ) is a blog offering a vendor-neutral perspective on data quality and its related disciplines.

Jim Harris

Jim Harris is the OCDQ Blogger-in-Chief.


© 2022, Jim Harris.
