Jim Harris

My name is Jim Harris. I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.

Sunday, January 9, 2011

DQ-BE: Single Version of the Time

Data Quality By Example (DQ-BE) is an OCDQ regular segment that provides examples of data quality key concepts.

Photo via Flickr by: Leo Reynolds

Like truth, beauty, and singing ability, data quality is in the eyes of the beholder.

Data’s quality is determined by evaluating its fitness for the purpose of use.  However, in the vast majority of cases, data has multiple uses, and data of sufficient quality for one use may not be of sufficient quality for other uses.

Therefore, to be more accurate, data quality is in the eyes of the user.

The perspective of the user provides a relative context for data quality.  Many argue an absolute context for data quality exists, one which is independent of the often conflicting perspectives of different users.

This absolute context is often referred to as a “Single Version of the Truth.”

As one example of the challenges inherent in this data quality key concept, let’s consider if there is a “Single Version of the Time.”

 

Single Version of the Time

I am writing this blog post at 10:00 AM.  I am using time in a relative context, meaning that from my perspective it is 10 o’clock in the morning.  I live in the Central Standard time zone (CST) of the United States. 

My friend in Europe would say that I am writing this blog post at 5:00 PM.  He is also using time in a relative context, meaning that from his perspective it is 5 o’clock in the afternoon.  My friend lives in the Central European time zone (CET).

We could argue that an absolute time exists, as defined by Coordinated Universal Time (UTC).  Local times around the world can be expressed as a relative time using positive or negative offsets from UTC.  For example, my relative time is UTC-6 and my friend’s relative time is UTC+1.  Alternatively, we could use absolute time and say that I am writing this blog post at 16:00 UTC.
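The offset arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the fixed offsets named in the post (UTC-6 and UTC+1) and ignores daylight saving time, and the variable names are hypothetical, not part of the original post.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets taken from the post (assumption: no daylight saving time).
CST = timezone(timedelta(hours=-6), "CST")
CET = timezone(timedelta(hours=1), "CET")

# The author's relative time: 10:00 AM on the date of the post.
local_cst = datetime(2011, 1, 9, 10, 0, tzinfo=CST)

# The same instant, expressed in the shared (UTC) and the friend's (CET) context.
shared_utc = local_cst.astimezone(timezone.utc)
local_cet = local_cst.astimezone(CET)

print(local_cst.strftime("%H:%M %Z"))   # 10:00 CST
print(shared_utc.strftime("%H:%M %Z"))  # 16:00 UTC
print(local_cet.strftime("%H:%M %Z"))   # 17:00 CET

# Three different representations, one instant: all compare equal.
print(local_cst == shared_utc == local_cet)  # True
```

The three values print differently but compare as equal, which is the point of the example: each is a relative view of the same shared instant.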

Although using an absolute time is an absolute necessity if, for example, my friend and I wanted to schedule a time to have a telephone (or Skype) discussion, it would be confusing to use UTC when referring to events relative to our local time zone.

In other words, the relative context of the user’s perspective is valid and an absolute context independent of the perspectives of different users is also valid—especially whenever a shared perspective is necessary in order to facilitate dialogue and discussion.

Therefore, instead of calling UTC a Single Version of the Time, we could call it a Shared Version of the Time.  And when it comes to the data quality concept of a Single Version of the Truth, perhaps it’s time we started calling it a Shared Version of the Truth.

 

Related Posts

Single Version of the Truth

The Quest for the Golden Copy

Beyond a “Single Version of the Truth”

The Idea of Order in Data

DQ-BE: Data Quality Airlines

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality and the Cupertino Effect

DQ-Tip: “Data quality is primarily about context not accuracy...”


Reader Comments (7)

Jim,

Thank you so much for this post. I applaud you for bringing this topic up.

This has been one of my pet peeves for a long time. Shared version of truth or the reference version of truth is so much better, friendly and non-dictative (if such a word exists) than single version of truth.

I truly believe that starting a discussion with “Single Version of the Truth” with business stakeholders is a nonstarter. There is, and always will be, a need for a multifaceted view and possibly multiple aspects of the truth.

A very common term/example I have come across is the usage of the term revenue. Unfortunately, there is no single version of revenue across organizations (and for valid reasons). From the Sales Management perspective, they like to look at sales revenue (sales bookings), which is the business on which they are compensated; financial folks want to look at financial revenue, which is the revenue they capture in the books; and marketing possibly wants to look at marketing revenue (sales revenue before the discount), which is the revenue marketing uses to justify their budgets. So if you ever ask a group of people what the revenue of the organization is, you will get three different perspectives. And these three answers will be accurate in the context of the three different groups.

Vish Agashe

January 9, 2011 | Unregistered Commenter Vish Agashe

Nice post Jim, and to add even more diversity and confusion to the time issue:

In most places in the Central European time zone, you would say 5 o’clock in the afternoon and then write “17:00”.

Great post, Jim. I often think about this as I'm running to catch a 5:17 train and pass by a bank clock that says 5:18, while my watch says 5:15. The commuter line operates on its own version of the truth and leaves promptly at 5:17 Metra Time, which seems to be about a minute ahead of AT&T cell phone time. Each of these clocks keeps time accurately (we assume), but they operate in slightly different realms.

So who's right? If I consider my cell phone to be the absolute source of true, accurate time, I'll miss my train. And then I'll have plenty of time to sit and think about it.

January 10, 2011 | Unregistered Commenter Crysta Anderson

I think there are at least two different issues that could be recognized with respect to your post:

1) How would multiple "users" of a fact interpret that same fact (and how would they represent that same fact)? - This seems to me the issue when you are talking about different time zones. And as long as the time-zone context of the data source and the data user is available and is figured into the access, then everything is fine. Since you already discussed this issue, I'll focus on the other:

2) What is the recognized "trusted source" for that fact? - Here, there might be a number of different sources for the "current time" - and even the notion of "quality" can be context-dependent. There might be some circle of data users that need an absolute and accurate time, and they have access to an atomic clock and the means to link that time to whatever systems that need it. Within that circle, that clock is the trusted source, and nothing short of that is acceptable as quality data about the current time. Now, imagine another circle of data users that, for whatever reason, decided that "Joe's clock" was going to be their definitive and trusted source for the time. For that circle of users, only Joe's clock is a trusted source for that data, and the fact that Joe's clock is 5 minutes fast and gaining 1 second each day (relative to the earlier-mentioned atomic clock) is irrelevant to that circle of users. These two are examples of "shared versions of the 'truth'". Which is correct? Each is, in its own context, and if you try to force one group to accept the other's source, there will be pain.

I have a problem with the use of the word "truth" in these conversations, because I think it clouds the picture with philosophical ideas. Let's just assume that what we really mean is a recognized, trusted fact (and the implication is that someone or some group is responsible for determining what is recognized and trusted for that fact).

Now, the data quality question can be viewed from two perspectives:

1) From the "now" perspective, which data source is accepted as the trusted source for that data, and does the path from that source to its use introduce anything that might affect the data's validity or acceptability?

2) From the "looking back" perspective, when that "fact" has been recorded, has it been accurately recorded, and have all necessary precautions been taken to ensure that the "fact" is not taken out of its intended context?

So, let's go back to our two examples - the absolute atomic clock and Joe's clock. For the group that accepts the atomic clock as their trusted source of the time, any recorded times based on an accurate recording of the atomic clock are valid, quality data. Likewise for the Joe's-clock-group and Joe's clock. But if you do any operation that mixes the data from both, then that operation spoils the data, because it ignores the context constraints.

January 10, 2011 | Unregistered Commenter Eric Aranow

Thanks for your comments Vish, Henrik, Crysta, and Eric.


From the LinkedIn Group for the Data Governance Professionals Organization, Ian Rowlands commented:

“I think this translates into something simpler. Perspectives legitimately vary according to the task in hand. The task in hand may be of local interest only, or broader.

There are very few instances in which an absolute view of the time is required (if indeed such a thing exists ... number of agreed time units since creation of the universe ;-)). There are instances in which a UNIFORM view of the time is required. For example, the day and time which constitutes month end for the purposes of closing the books across all business units.”

January 10, 2011 | Registered Commenter Jim Harris

Jim,

Upon reading your post, I thought to myself: "I absolutely agree with everything you say."

Then I got to thinking: "I absolutely disagree with everything you say." So, as usual, the truth (pun intended) lies somewhere in the middle, much of which has percolated through the discussions on your post.

By my reckoning, there has to be a source of Truth (or if that term bothers you, a source of data that is, within reason, accurate, clean, deduped, etc.) For the purposes of shorthand, I will call this the Operational Data Store (ODS). However, the way this truth is viewed can, legitimately, vary across applications and organizations. For the purposes of shorthand, I'll call these views Data Marts.

As I've grown my data management approach, the ODS, as vitally important as it is, has become a bit esoteric (it's not quite the word I want, but it will suffice). It is something that I can discuss in detail with technical personnel and data-savvy business owners, but rarely something I go into in any great detail when consulting with less technical users. The reason for that, as your post points out, is that an ODS decouples fact and context.

That's where Data Marts come in. They take the raw data from the ODS, add the context, and apply any domain-specific business rules to create un-Shared views of the data which nonetheless are derived from the truth data.

To illustrate, we'll consider Vish's example of "revenue" and how it can mean different things to different parts of the business. Taking a step back, no matter how that revenue is treated by various departments, at some point an order was placed, money changed hands, etc. These are facts, which I would then store in the ODS along with references to their context. However, going back to my idea that an ODS is esoteric, the way I store this fact is not going to be particularly useful to my clients (be they internal or external) because my ODS doesn't care about context. It is more concerned with efficiency, scalability, accuracy, etc.

From that ODS, then I build a Finance Mart, a Sales Mart, a Marketing Mart, etc.

Same starting points, different views of the data.

In this way, I start from a Shared Version of the Truth and narrow it down into multiple, context-driven views which are unshared because of their different contexts and appeal to different audiences.

January 10, 2011 | Unregistered Commenter Chris Perrin

As I read this post I thought about the typical MDM project where data ownership is bickered over. Instead of establishing one owner maybe we just all need to learn how to share? Interesting and thought provoking, as usual!

January 11, 2011 | Unregistered Commenter William Sharp
