Data is one of the enterprise's most important assets. Data quality is a fundamental success factor for the decision-critical information that drives the tactical and strategic initiatives essential to the enterprise's mission to survive and thrive in today's highly competitive and rapidly evolving marketplace.
When the results of these initiatives don't meet expectations, analysis often reveals that poor data quality is a root cause. Projects are then launched to understand and remediate this problem by establishing enterprise-wide data quality standards.
However, these projects commonly suffer from a lack of understanding of what I refer to as the Data-Information Continuum.
The Data-Information Continuum
In physics, the Space-Time Continuum explains that space and time are interrelated entities forming a single continuum. In classical mechanics, the passage of time can be considered a constant for all observers of spatial objects in motion. In relativistic contexts, the passage of time is a variable that changes for each specific observer of spatial objects in motion.
Data and information are also interrelated entities forming a single continuum. It is crucial to understand how they are different and how they relate. I like using the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g. customers, vendors, suppliers).
A common data quality definition is fitness for the purpose of use. A common challenge is that data has multiple uses, each with its own fitness requirements. I like to view each intended use as the information that is derived from data, defining information as data in use or data in action.
Data could be considered a constant, while information is a variable that redefines data for each specific use. Strictly speaking, data is not a constant, since it is always changing. However, information is still derived from data, and many different derivations can be performed while data is in the same state (i.e. before it changes again).
Quality within the Data-Information Continuum has both objective and subjective dimensions.
Objective Data Quality
Data's quality must be objectively measured separately from its many uses. Enterprise-wide data quality standards must provide a highest common denominator for all business units to use as an objective data foundation for their specific tactical and strategic initiatives. Raw data extracted directly from its sources must be profiled, analyzed, transformed, cleansed, documented and monitored by data quality processes designed to provide and maintain universal data sources for the enterprise's information needs. At this phase, these processes must limit their manipulation of raw data to objective standards and must not customize it for any subjective use.
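As a minimal sketch of what use-independent measurement might look like, the following Python profiles a handful of hypothetical customer records for completeness and format validity. The field names, the sample records, and the format patterns are assumptions made for illustration, not standards taken from any particular enterprise.

```python
import re

# Hypothetical raw records; field names and values are illustrative only.
RECORDS = [
    {"customer_id": "C001", "email": "ann@example.com", "postal_code": "02134"},
    {"customer_id": "C002", "email": "", "postal_code": "9021"},
    {"customer_id": "C003", "email": "bob-at-example.com", "postal_code": "60601"},
]

# Assumed objective format rules: they describe the data itself, not any use of it.
FORMAT_RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "postal_code": re.compile(r"^\d{5}$"),
}

def profile(records, rules):
    """Return per-field completeness and validity ratios (0.0 to 1.0)."""
    report = {}
    total = len(records)
    for field, pattern in rules.items():
        # Completeness: share of records where the field is populated at all.
        populated = [r[field] for r in records if r.get(field)]
        # Validity: share of populated values that match the format rule.
        valid = [v for v in populated if pattern.match(v)]
        report[field] = {
            "completeness": len(populated) / total if total else 0.0,
            "validity": len(valid) / len(populated) if populated else 0.0,
        }
    return report

report = profile(RECORDS, FORMAT_RULES)
```

Nothing in this profile refers to how a business unit intends to use the records; it only measures the data against rules that hold regardless of use.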
Subjective Information Quality
Information's quality can only be subjectively measured according to its specific use. Information quality standards are not enterprise-wide; they are customized to a specific business unit or initiative. However, all business units and initiatives must begin defining their information quality standards by using the enterprise-wide data quality standards as a foundation. This approach allows leveraging a consistent enterprise understanding of data while also deriving the information necessary for the day-to-day operation of each business unit and initiative.
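One way to picture this layering, sketched in Python under stated assumptions: the same record is evaluated for two hypothetical uses, each of which adds its own fitness rule on top of a shared enterprise check. The use names, fields, and rules are invented for illustration.

```python
# Hypothetical record that has already passed enterprise-wide data quality processes.
record = {"customer_id": "C001", "email": "ann@example.com", "postal_code": ""}

def meets_enterprise_standard(rec):
    """Assumed objective baseline: every record must carry an identifier."""
    return bool(rec.get("customer_id"))

# Assumed subjective rules: each use defines its own fitness for purpose.
USE_RULES = {
    "email_marketing": lambda rec: bool(rec.get("email")),
    "direct_mail": lambda rec: bool(rec.get("postal_code")),
}

def fit_for_use(rec, use):
    """A record is fit for a use only if it meets the baseline and that use's rule."""
    return meets_enterprise_standard(rec) and USE_RULES[use](rec)

fitness = {use: fit_for_use(record, use) for use in USE_RULES}
```

The record itself never changes, yet it is fit for the email use and unfit for the mailing use, which is the sense in which information quality is subjective while data quality is not.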
A “Single Version of the Truth” or the “One Lie Strategy”
A common objection to separating quality standards into objective data quality and subjective information quality is the enterprise's significant interest in creating what is commonly referred to as a single version of the truth.
However, in his excellent book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman explains:
“A fiendishly attractive concept is...'a single version of the truth'...the logic is compelling...unfortunately, there is no single version of the truth.
For all important data, there are...too many uses, too many viewpoints, and too much nuance for a single version to have any hope of success.
This does not imply malfeasance on anyone's part; it is simply a fact of life.
Getting everyone to work from a single version of the truth may be a noble goal, but it is better to call this the 'one lie strategy' than anything resembling truth.”
There is a significant difference between data and information and therefore a significant difference between data quality and information quality. Many data quality projects are in fact implementations of information quality customized to the specific business unit or initiative that is funding the project. Although these projects can achieve some initial success, they encounter failures in later iterations and phases when information quality standards try to act as enterprise-wide data quality standards.
Significant time and money can be wasted by not understanding the Data-Information Continuum.