Saturday, 20 June 2009

The Data-Information Continuum

Data is one of the enterprise's most important assets.  Data quality is a fundamental success factor for the decision-critical information that drives the tactical and strategic initiatives essential to the enterprise's mission to survive and thrive in today's highly competitive and rapidly evolving marketplace.

When the results of these initiatives don't meet expectations, analysis often reveals that poor data quality is a root cause.  Projects are launched to understand and remediate this problem by establishing enterprise-wide data quality standards.

However, a common issue is a lack of understanding about what I refer to as the Data-Information Continuum.

 

The Data-Information Continuum

In physics, the Space-Time Continuum explains that space and time are interrelated entities forming a single continuum.  In classical mechanics, the passage of time can be considered a constant for all observers of spatial objects in motion.  In relativistic contexts, the passage of time is a variable changing for each specific observer of spatial objects in motion.

Data and information are also interrelated entities forming a single continuum.  It is crucial to understand how they are different and how they relate.  I like using the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g. customers, vendors, suppliers). 

A common data quality definition is fitness for the purpose of use.  A common challenge is that data has multiple uses, each with its own fitness requirements.  I like to view each intended use as the information that is derived from data, defining information as data in use or data in action.

Data could be considered a constant while information is a variable that redefines data for each specific use.  Data is not truly a constant since it is constantly changing.  However, information is still derived from data and many different derivations can be performed while data is in the same state (i.e. before it changes again). 

Quality within the Data-Information Continuum has both objective and subjective dimensions.

 

Objective Data Quality

Data's quality must be objectively measured separately from its many uses.  Enterprise-wide data quality standards must provide a highest common denominator for all business units to use as an objective data foundation for their specific tactical and strategic initiatives.  Raw data extracted directly from its sources must be profiled, analyzed, transformed, cleansed, documented and monitored by data quality processes designed to provide and maintain universal data sources for the enterprise's information needs.  At this phase, the manipulations of raw data by these processes must be limited to objective standards and not be customized for any subjective use.
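
As a rough illustration of what use-independent measurement could look like, here is a minimal Python sketch that profiles a handful of records for completeness, format validity and uniqueness.  The records, field names, the five-digit postal code rule and the `profile` function are all hypothetical assumptions made for this example; they do not come from any particular tool or standard.

```python
import re
from collections import Counter

# Hypothetical raw customer records; field names are illustrative only.
records = [
    {"id": 1, "name": "Acme Corp", "postal_code": "02134", "email": "ap@acme.example"},
    {"id": 2, "name": "", "postal_code": "2134", "email": "sales@globex.example"},
    {"id": 2, "name": "Globex", "postal_code": "90210", "email": "not-an-email"},
]

# Assumed format rules: five-digit postal codes and a deliberately loose email shape.
POSTAL_CODE = re.compile(r"\d{5}")
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def profile(rows):
    """Return use-independent quality measures for a list of records."""
    total = len(rows)
    id_counts = Counter(row["id"] for row in rows)
    return {
        # Share of records with a non-blank name.
        "name_completeness": sum(1 for r in rows if r["name"].strip()) / total,
        # Share of records whose postal code matches the assumed format.
        "postal_code_validity": sum(
            1 for r in rows if POSTAL_CODE.fullmatch(r["postal_code"])
        ) / total,
        # Share of records whose email matches the assumed shape.
        "email_validity": sum(1 for r in rows if EMAIL.fullmatch(r["email"])) / total,
        # Identifiers that appear more than once.
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
    }


report = profile(records)
print(report)
```

None of these measures asks what the data will be used for; they only describe the data against fixed rules, which is the sense of "objective" intended here.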

 

Subjective Information Quality

Information's quality can only be subjectively measured according to its specific use.  Information quality standards are not enterprise-wide; they are customized to a specific business unit or initiative.  However, all business units and initiatives must begin defining their information quality standards by using the enterprise-wide data quality standards as a foundation.  This approach allows leveraging a consistent enterprise understanding of data while also deriving the information necessary for the day-to-day operation of each business unit and initiative.
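
To make the idea of use-specific fitness concrete, here is a minimal Python sketch in which one shared data set is scored against several different uses.  The records, the three uses and their rules, and the `fitness_for_use` function are hypothetical assumptions chosen only for illustration.

```python
# Hypothetical customer data, assumed to already meet enterprise-wide standards.
customers = [
    {"id": 1, "postal_code": "02134", "email": "ap@acme.example", "phone": "555-0100"},
    {"id": 2, "postal_code": "90210", "email": "sales@globex.example", "phone": None},
    {"id": 3, "postal_code": "10001", "email": None, "phone": None},
    {"id": 4, "postal_code": None, "email": None, "phone": None},
]

# Each use supplies its own fitness rule over the same underlying data.
fitness_rules = {
    "direct_mail": lambda c: c["postal_code"] is not None,
    "email_campaign": lambda c: c["email"] is not None,
    "call_center": lambda c: c["phone"] is not None,
}


def fitness_for_use(data, rules):
    """Score one data set against each use-specific rule (share of usable records)."""
    return {
        use: sum(1 for row in data if rule(row)) / len(data)
        for use, rule in rules.items()
    }


scores = fitness_for_use(customers, fitness_rules)
print(scores)
# {'direct_mail': 0.75, 'email_campaign': 0.5, 'call_center': 0.25}
```

The data never changes between the three scores; only the use does, which is why the same records can be fit for one purpose and unfit for another.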

 

A “Single Version of the Truth” or the “One Lie Strategy”

A common objection to separating quality standards into objective data quality and subjective information quality is the enterprise's significant interest in creating what is commonly referred to as a single version of the truth.

However, in his excellent book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman explains:

“A fiendishly attractive concept is...'a single version of the truth'...the logic is compelling...unfortunately, there is no single version of the truth. 

For all important data, there are...too many uses, too many viewpoints, and too much nuance for a single version to have any hope of success. 

This does not imply malfeasance on anyone's part; it is simply a fact of life. 

Getting everyone to work from a single version of the truth may be a noble goal, but it is better to call this the 'one lie strategy' than anything resembling truth.”

Conclusion

There is a significant difference between data and information and therefore a significant difference between data quality and information quality.  Many data quality projects are in fact implementations of information quality customized to the specific business unit or initiative that is funding the project.  Although these projects can achieve some initial success, they encounter failures in later iterations and phases when information quality standards try to act as enterprise-wide data quality standards. 

Significant time and money can be wasted by not understanding the Data-Information Continuum.


Reader Comments (3)

I agree with the views in this article.

Many times we lose perspective of Information Quality (Subjective) vs. Data Quality (Objective).

I would extend discussion around data quality to really get into and address process issues which manifest as data quality issues. In my experience, a lot of data quality issues stem from lack of business process enforcement (or lack of business process in first place). Addressing data quality initiatives should include addressing business process enforcement (and definition) issues as well.

Without a strategy to implement or enforce business process as part of a data quality initiative, investments in data quality projects will not go far enough.

June 21, 2009 | Unregistered Commenter VishAgashe

This article is intriguing. I would add more still.

A most significant quote (4th para. Section 'Data-Information Continuum'): "Data could be considered a constant while Information is a variable that redefines data for each specific use."

This tells us that Information draws from a snapshot of a Data store. I would state further that the very Information [specification] is - in itself - a snapshot.

The earlier quote continues: "Data is not truly a constant since it is constantly changing."

Similarly, it is a business reality that "Information is not truly a constant since it is constantly changing."

The article points out that 'The Data-Information Continuum' implies a many-to-many relationship between the two. This is a sensible CONCEPTUAL model.

Enterprise Architecture is concerned as well with its responsibility for application quality in service to each Business Unit/Initiative.

For example, in the interest of quality design in Application Architecture, an additional LOGICAL model must be maintained between a then-current Information requirement and the particular Data (snapshots) from which it draws. [Snapshot: generally understood as captured and frozen - and uneditable - at a particular point in time.] Simply put, Information Snapshots have a PARENT RELATIONSHIP to the Data Snapshots from which they draw.

Analyzing this further, refer to this further piece of quoted wisdom (1st para. Section 'Subjective Information Quality'): "...business units and initiatives must begin defining their Information...by using...Data...as a foundation...necessary for the day-to-day operation of each business unit and initiative."

From Logically-related snapshots of Information to the Data from which it draws, we can see from this quote that yet another PARENT/CHILD relationship exists ... that from Business Unit/Initiative Snapshots to the Information Snapshots that implement whatever goals are the order of the day. But days change.

If it is true that "Data is not truly a constant since it is constantly changing," and if we can agree that Information is not truly a constant either, then we can agree to take a rational and profitable leap to the truth that neither is a Business Unit/Initiative... since these undergo change as well, though they represent more slow-changing dimensions.

Enterprises have an increasing responsibility for regulatory/compliance/archival systems that will qualitatively reproduce the ENTIRE snapshot of a particular operational transaction at any given point in time.

Thus, the Enterprise Architecture function has before it a daunting task: i.e. to devise a holistic process that can SEAMLESSLY model the correct relationship of snapshots between Data (grandchild), Information (parent) and Business Unit/Initiative (grandparent).

There need be no conversion programs or redundant, throw-away data structures contrived to bridge the present gap. The ability to capture the activities resulting from the undeniable point-in-time hierarchy among these entities is where tremendous opportunities lie.

June 24, 2009 | Unregistered Commenter Diane Neville

Great article. The distinction between Data Quality (objective) and Information Quality (subjective) is particularly well-stated. I'd like to expand on that distinction.

Data have no implicit business value without some context within which to relate them to the environment in which the business operates. Out of context, we can measure data quality only by objective criteria. Add relevant context, and you get information. In information systems, such context commonly resides in metadata.

Context domains (known as Dimensions in data warehousing) are manageable slices of the business environment. Determining which context domains are relevant to each fact is where the subjective criteria come in; this is a major component of data modeling and metadata management.

The business value of data (not the same as data quality!) is proportional to the richness of context available for the data. Some metrics for this are the number of context domains (dimensions) available, the depth of structure (navigation) available in each context domain, and the number of relations (potential and actual, implicit and explicit) available between context domains. These are all quantifiable, but they measure aspects of the data model that are developed based on subjective criteria.

The business value of metadata is proportional to the shareability of the metadata. Some aspects of this are broad relevance (usefulness to multiple audiences), optimal granularity (level of detail), and commonly understood representation (specification languages). These are all hard to quantify because they're completely subjective.

June 26, 2009 | Unregistered Commenter Dean Groves
