Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.

Thursday, October 27, 2011

The Metadata Crisis

I am reading the book The Information: A History, a Theory, a Flood by James Gleick, which recounts a dialogue written by the ancient Chinese philosopher Gongsun Long known as When a White Horse is Not a Horse:

“Horses certainly have color.  Hence, there are white horses.  If it were the case that horses had no color, there would simply be horses, and then how could one select a white horse?  And so it follows that a horse and a white horse are different.  Hence, I say that a white horse is not a horse.

Furthermore, a white horse is a horse and white, but horse is that by means of which one names the shape, and white is that by means of which one names the color.  What names the color is not what names the shape.  Hence, I say that a white horse is not a horse.”

“On its face, this is unfathomable,” explained Gleick, “but it begins to come into focus as a statement about language and logic.  Paradoxes like this formed part of what Chinese historians called the language crisis, a running debate over the nature of language.  Names are not the things they name.”

One of my favorite topics is how data is not the real world it describes.  But perhaps a better data management example of how “names are not the things they name” is metadata, which Julie Hunt blogged about in her post Stumbling Over Metadata, which explored better definitions than the oversimplified “metadata is data about data.”

Metadata can be thought of as a label that provides a definition, description, and context for data.  Common examples include relational table definitions and flat file layouts.  More detailed examples of metadata include conceptual and logical data models.
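To make the "label" idea concrete, here is a minimal sketch in Python. The column name `amt`, the `ColumnMetadata` class, and the catalog entry are all hypothetical, invented for illustration rather than taken from any real system. The point is only that the bare value means little until the metadata supplies its definition and context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical metadata record: the label attached to a column."""
    name: str
    data_type: str
    definition: str
    unit: str


# The data: on its own, 1250.0 says nothing about what it measures.
row = {"amt": 1250.0}

# The metadata: definition, description, and context for that data.
catalog = {
    "amt": ColumnMetadata(
        name="amt",
        data_type="decimal(12,2)",
        definition="Invoiced amount before tax",
        unit="USD",
    )
}


def describe(column: str, value: float) -> str:
    """Render a value together with the context its metadata provides."""
    meta = catalog[column]
    return f"{meta.definition}: {value:.2f} {meta.unit}"


print(describe("amt", row["amt"]))
```

Change the definition or the unit in the catalog and the same number tells a different story, which is why the name is not the thing it names.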

Therefore, metadata—among its many other uses—often plays an integral role in determining your data usage.  Although it’s often overlooked, there is a strong relationship between metadata and data quality, and by extension, between metadata and data-driven decision making, since a business intelligence report’s metadata often provides the framing effect for its data.

I have often witnessed what could be called the metadata crisis, a running debate within many organizations over the meaning of commonly used terms like revenue, which complicates what on the surface seem like straightforward business questions, such as how much revenue was generated during a particular fiscal reporting period.

A metadata management version of When a White Horse is Not a Horse might be When Recognized Revenue is Not Revenue.
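As a rough illustration of how this plays out, the sketch below computes "revenue" for one fiscal quarter from the same three orders under two definitions: by the date an order was booked, and by the date its revenue was recognized. The orders, field names, and dates are hypothetical, and real revenue recognition rules are far more involved than a single date field.

```python
from datetime import date

# Hypothetical orders; "recognized" is None when revenue is not yet recognized.
orders = [
    {"amount": 100.0, "booked": date(2011, 9, 28), "recognized": date(2011, 10, 3)},
    {"amount": 250.0, "booked": date(2011, 9, 15), "recognized": date(2011, 9, 20)},
    {"amount": 400.0, "booked": date(2011, 9, 30), "recognized": None},
]


def revenue(orders, start, end, date_field):
    """Sum order amounts whose chosen date falls within [start, end]."""
    total = 0.0
    for order in orders:
        d = order[date_field]
        if d is not None and start <= d <= end:
            total += order["amount"]
    return total


# An assumed fiscal quarter running July 1 through September 30, 2011.
q_start, q_end = date(2011, 7, 1), date(2011, 9, 30)

booked_revenue = revenue(orders, q_start, q_end, "booked")
recognized_revenue = revenue(orders, q_start, q_end, "recognized")

print(f"Revenue (booked):     {booked_revenue:.2f}")
print(f"Revenue (recognized): {recognized_revenue:.2f}")
```

Both figures are correct answers to "how much revenue was generated this quarter?" The disagreement lives entirely in the definition, which is to say in the metadata.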

However, the complexities of revenue recognition probably pale in comparison with the metadata crisis that can be caused by what David Loshin calls the most dangerous question in data management: What is the definition of customer?

What examples of the metadata crisis have you encountered in your organization?

 

Related Posts

What’s the Meta with your Data?

Let’s Meta a Data

The First Law of Data Quality

Data Quality and the Cupertino Effect

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

Plato’s Data

Data, Information, and Knowledge Management

The Data Cold War

The Semantic Future of MDM

OCDQ Radio - Master Data Management in Practice

OCDQ Radio - A Brave New Data World


Reader Comments (4)

This is the most sensible thing I’ve ever read about metadata. Like many of the latest business intelligence buzzwords, there is no universally agreed upon definition of "metadata". And there is certainly no standard way of tracking it. I guess these things are destined to drive technical people crazy, because management wants it, but they don’t know what it really is.

October 27, 2011 | Unregistered Commenter Bill Merrill

From the LinkedIn Group for DAMA International, Rodger Nixon commented:

“Greek debt is a crisis.

Arguing whether a Client is a Customer is a way to fill in time. Any answer will do, just so long as you get everyone to agree.”

And Jan Kamil responded:

“Rodger, Thanks for bringing some valid perspective. While we all may feel passionate and even enjoy data management, it doesn’t quite rise to the level of curing cancer or even collecting trash.”

And Rodger Nixon responded:

“Jim raises a valid point that agreement on terminology is important especially in the context of business intelligence.

However, having worked for many of our largest corporations over the years I have yet to see non-IT people do anything other than yawn when told by Data Architects it is important.

Even if they are persuaded, it won’t be to the degree that will result in the appropriate level of resources being devoted to solving the problem. Solving the problem requires both top-down and more importantly bottom-up data inventory development. It is hugely resource intensive (e.g., it would involve reverse engineering and analyzing 4,000 databases just at one of these organizations). The knee jerk "introduce data governance" response looks like something is being done but 95% of these efforts are bound to fail. You can’t govern the existence of something which you have not documented.”

And Peter Benson responded:

“Mention the word metadata and you have immediately lost all but the hard core techies and they have neither the authority nor the budget to solve the problem. If you take a hard look at the financial crisis or cancer research you will indeed find that the reason the challenges are so difficult to solve is in large part because of the limitations in our ability to communicate effectively and the lack of transparency that comes from poor data integration.

So yes I agree with Jim, metadata is really important.

The Babel approach of a single language to unite them all has a very poor track history and there is good reason for this. Language is more about power and authority than it is about true communication; the term Governance regrettably can carry the same connotation: power, control, central authority.

We have tried to come up with a solution that is solely focused on achieving unambiguous communication and Rodger, you were closer than you may have thought. It really does not matter what it is called as long as we agree on what "it" is. We do this by using terminology to define concepts and then assigning concept identifiers that are used as metadata. The separation of the terminology from the concept identifier, or rather linking terminology through a concept identifier, allows everyone to remain comfortably in their own space yet communicate with others.

If you are looking for a public repository of metadata, at the last count the eOTD had on the order of 1.9 million defined concepts in 22 languages. The concept identifiers are in the public domain to avoid the creation of joint copyright which is the result of using proprietary metadata.”


From the LinkedIn Master Data Management Interest Group, Saad Husain commented:

“Yes, I agree. Metadata is what defines and gives your data context. If you don’t know what your data means, then your data is useless and risky to rely on. As an example, take the NASA Mars expedition which failed because one team was measuring distance in kilometers and the other in miles. Even supposing they were using the same units, elevation distance could mean distance from the center of the planet, distance from the average surface, or distance at that exact point over the surface. Precise definitions, like good fences, make good demarcation points of data exchange.”

And Randy Shepherd commented:

“Part of the issue is that many companies have not decided how far down to drill on the use of meta data. Some just track the data element's ownership, field length, and acceptable characters; while others include definitions and/or CRUD info.

I think part of the issue when using definitions is to define it for the system and not the company's jargon. "Customer", "Week" and "Sales" can vary from one system to the next, or even in the same system on different reports. Sun.-Sat. could be a week in one system, while it is Mon.-Sun. in another. Are customers everyone who has products shipped to them, or everyone who pays for products (and what about charities)? Are sales calculated on orders taken, orders shipped, or orders paid for (and where do consignment items shipped fall in your definition)? A profit report might want to call sales everything shipped, or orders taken (not out of stock), while a commission report would probably look at orders paid.

The metadata crisis talked about the data of white horse, which I think was a good example of the need to track the role the data plays. Does the data element classify, identify, or describe the record (this can change from system to system as well).

Meta Data is the basic documentation of the logic tags available to help filter for desired record(s). The greater the known attributes, the greater flexibility in the use of its filtering use.”

And Molefi Moeketsi commented:

“Every organization talks about the importance of data. As excellently said, Metadata is a critical issue because without context your data is worthless. I think Metadata Management should be a cultivated culture in organizations and should become a habit embedded in designs of systems including databases. I also think it is crucial towards retaining knowledge in the company as staff leaves (due to promotions or jumping ship).

When Metadata is properly defined and managed it is easy for the next human resource to pick up and continue with the work. I absolutely agree with the statement by Randy "..The greater the known attributes, the greater flexibility in use of its filtering". In my organization we are embarking on both Business Definitions and Metadata Management and we are discovering exciting things that we were not aware of.

The beauty of it all is when you see lineage from business to technical Metadata, it's gold.”

And Mazhar LeGhari commented:

“Organizations and Industries that are going through rapid changes in their business models are likely candidates to come to terms with their Business Terms and Semantics. The impact of information asymmetries between business units creates invisible silos that are equally disruptive along with data silos.”


From the LinkedIn Group for the IAIDQ Information/Data Quality Professional Open Community, David Ho commented:

“Great write-up on this often overlooked topic - metadata. I sense that most folks out there still don't have a good grasp and acknowledge its importance.

This is the critical point you raised: Metadata—among its many other uses—often plays an integral role in determining your data usage. Although it’s often overlooked, there is a strong relationship between metadata and data quality, and by extension, between metadata and data-driven decision making, since a business intelligence report’s metadata often provides the framing effect for its data.

Without the use of metadata, my data quality practice would virtually be unachievable! I rely heavily on metadata to provide the quickest and most efficient (Lean) method to assess, qualify and resolve my data issues.

Metadata is an essential component of a sound data quality practice.”

October 27, 2011 | Registered Commenter Jim Harris

Hi Jim,

Funny you mention metadata. Just yesterday I spoke with a Business Information Manager of a large (to our Dutch standards) media firm that owns dozens of titles on- and off-line.

They search for ways of linking the data, preferably the customer profiles, so they can offer a very "rich" profile to their advertising clientele. My answer was: "were I to have my BI Architect hat on, I would advise a prominent place for data architecture of course. However, I'm wearing my data cap. Therefore, I suggest: metadata."

When you're not able to tell if you can link data items from different departments and even companies, the priority is to become able to do so. Nothing beats good metadata. It makes data richer, more usable, actionable, and thus more valuable. Adding metadata to data adds value, especially informational value.

It can get even better. Good metadata, enhanced with business rules, related to processes, and containing quality indicators, can be the basis of model-driven development of functions.

At least it can show you (information catalog) what you are able to do with the data. And I'm convinced that's just what this media firm needs.

I have a feeling Jim, that if we would think a little more out of the box about metadata, there would open up a whole new world of possibilities you and I have never imagined.

October 27, 2011 | Unregistered Commenter Frank Harland

Check out the great comments that this blog post received from its syndication on Information Management:

The Metadata Crisis on Information Management

October 28, 2011 | Registered Commenter Jim Harris
