August 02, 2011

Are you turning Ugly Data into Cute Information?

August 02, 2011/ Jim Harris

The ways of the data force are sometimes difficult to understand precisely because they are difficult to see.

Daragh O Brien and I were discussing this recently on Twitter, where tweets about data quality and information quality form the midi-chlorians of the data force. Share disturbances you’ve felt in the data force using the #UglyData and #CuteInfo hashtags.

Presentation Quality

Perhaps one of the most common examples of the difference between data and information is the presentation layer created for business users. In her fantastic book Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Danette McGilvray defines Presentation Quality as “a measure of how information is presented to, and collected from, those who utilize it. Format and appearance support appropriate use of the information.”

Tom Redman emphasizes that the two most important points in the data lifecycle are when data is created and when data is used.

I describe the connection between those two points as the Data-Information Bridge. By passing over this bridge, data becomes the information used to make the business decisions that drive the tactical and strategic initiatives of the organization. Some of the most important activities of enterprise data management actually occur on the Data-Information Bridge, where preventing critical disconnects between data creation and data usage is essential to the success of the organization’s business activities.

Defect prevention and data cleansing are two of the required disciplines of an enterprise-wide data quality program. Defect prevention is focused on the moment of data creation, attempting to enforce better controls to prevent poor data quality at the source. Data cleansing can either be used to compensate for a lack of defect prevention, or it can be included in the processing that prepares data for a specific use (i.e., transforms data into information fit for the purpose of a specific business use).

The Dark Side of Data Cleansing

In a previous post, I explained that although most organizations acknowledge the importance of data quality, they don’t believe that data quality issues occur very often because the information made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

ETL processes that extract source data for a data warehouse load often perform basic data quality checks. However, a fairly standard practice for “resolving” a data quality issue is to substitute either a missing (NULL) value or a default value (e.g., a date stored in a text field in the source that cannot be converted into a valid date value is loaded as either a NULL value or the processing date).
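To see why this practice hides defects, consider a minimal Python sketch of such a load step. It assumes, purely for illustration, that source dates are stored as `YYYY-MM-DD` text; the function names `to_date` and `load_dates` are hypothetical and not taken from any ETL tool. The sketch applies the common fallback, but it also keeps a list of the values it could not convert.

```python
from datetime import date, datetime
from typing import List, Optional, Tuple


def to_date(raw: str) -> Optional[date]:
    """Parse 'YYYY-MM-DD' text; return None when the value cannot be converted."""
    try:
        return datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def load_dates(
    raw_values: List[str],
    processing_date: date,
    use_processing_date: bool = False,
) -> Tuple[List[Optional[date]], List[str]]:
    """Mimic the usual ETL fallback (NULL or processing date) while
    recording every source value the fallback papered over."""
    loaded: List[Optional[date]] = []
    defects: List[str] = []
    for raw in raw_values:
        parsed = to_date(raw)
        if parsed is None:
            defects.append(raw)  # keep the Ugly Data visible
            parsed = processing_date if use_processing_date else None
        loaded.append(parsed)
    return loaded, defects
```

With a processing date of August 2, 2011, loading `["2011-08-02", "08/02/2011", "not a date"]` with `use_processing_date=True` produces three valid-looking dates, while `defects` still shows that two of them were never valid in the source.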

When postal address validation software generates a valid mailing address, it often does so by removing what it considers to be “extraneous” information from the input address fields. That information may include valid data that was accidentally entered in the wrong field or that lacked its own input field (e.g., an e-mail address entered in an address field is deleted from the validated mailing address).
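A hedged sketch of the alternative: rather than silently deleting content it considers extraneous, a cleanser can hand that content back so it can be routed to its proper field. The function `cleanse_address_line` and its deliberately simplistic e-mail pattern are illustrative assumptions, not the behavior of any particular address validation product.

```python
import re
from typing import List, Tuple

# Simplistic e-mail pattern for illustration only (far from RFC 5322 complete).
EMAIL_PATTERN = re.compile(r"[^@\s,]+@[^@\s,]+\.[A-Za-z]{2,}")


def cleanse_address_line(address_line: str) -> Tuple[str, List[str]]:
    """Return (cleansed address, e-mail addresses removed from it).

    A cleanser that returned only the first element would silently delete
    the e-mail address; returning both lets it be salvaged.
    """
    found = EMAIL_PATTERN.findall(address_line)
    cleansed = EMAIL_PATTERN.sub("", address_line)
    # Collapse the empty comma-separated slots left behind by the removal.
    cleansed = re.sub(r"\s*,\s*(,\s*)*", ", ", cleansed).strip(" ,")
    # Collapse any runs of whitespace left behind by the removal.
    cleansed = re.sub(r"\s{2,}", " ", cleansed)
    return cleansed, found
```

For example, `cleanse_address_line("123 Main St, jim@example.com, Apt 4")` returns `("123 Main St, Apt 4", ["jim@example.com"])`, so the e-mail address survives the cleansing instead of vanishing.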

And some reporting processes intentionally filter out “bad records” or eliminate “outlier values.” This happens most frequently when preparing highly summarized reports, especially those intended for executive management.
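A minimal Python sketch of a summary that filters outliers but discloses what it filtered. The function name `summarize` and the bounds are hypothetical business rules supplied by the caller; the point is only that the counts of excluded records travel with the average.

```python
from statistics import mean
from typing import Dict, List


def summarize(values: List[float], lower: float, upper: float) -> Dict[str, float]:
    """Average the in-range values, and report how many values were dropped.

    A report showing only 'average' is the Cute Information; 'excluded'
    and 'excluded_share' keep the Ugly Data in view.
    """
    kept = [v for v in values if lower <= v <= upper]
    excluded = [v for v in values if not (lower <= v <= upper)]
    return {
        "average": mean(kept) if kept else float("nan"),
        "included": len(kept),
        "excluded": len(excluded),
        "excluded_share": len(excluded) / len(values) if values else 0.0,
    }
```

Summarizing `[100, 110, 90, -5, 10000]` with bounds of 0 and 1000 yields an average of 100 from three records, with two records (40%) excluded. An executive dashboard that shows only the average of 100 never reveals that two in five records were set aside.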

These are just a few examples of the Dark Side of Data Cleansing, which can turn Ugly Data into Cute Information.

Has your Data Quality turned to the Dark Side?

Like truth, beauty, and singing ability, data quality is in the eyes of the beholder, or since data quality is most commonly defined as fitness for the purpose of use, we could say that data quality is in the eyes of the user. But how do users know if data is truly fit for their purpose, or if they are simply being presented with information that is aesthetically pleasing for their purpose?

Has your data quality turned to the dark side by turning ugly data into cute information?

June 28, 2011

Data Governance Star Wars

June 28, 2011/ Jim Harris

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

The poll results are in from the recent Star Wars themed blog debate about one of data governance’s biggest challenges: how to balance bureaucracy and business agility. Rob Karel took the position for Bureaucracy as Darth Karel of the Empire, and I took the position for Agility as OCDQ-Wan Harris of the Rebellion.

However, this was a true debate format where Rob and I intentionally argued polar opposite positions with full knowledge that the reality is data governance success requires effectively balancing bureaucracy and business agility.

Just in case you missed the blog debate, here are the post links:

On this special, extended, and Star Wars themed episode of OCDQ Radio, I am joined by Rob Karel and Gwen Thomas to discuss this common challenge of effectively balancing bureaucracy and business agility on data governance programs.

Rob Karel is a Principal Analyst at Forrester Research, where he serves Business Process and Applications Professionals. Rob is a leading expert in how companies manage data and integrate information across the enterprise. His current research focus includes process data management, master data management, data quality management, metadata management, data governance, and data integration technologies. Rob has more than 19 years of data management experience, working in both business and IT roles to develop solutions that provide better quality, confidence in, and usability of critical enterprise data.

Gwen Thomas is the Founder and President of The Data Governance Institute, a vendor-neutral, mission-based organization with three arms: publishing free frameworks and guidance, supporting communities of practitioners, and offering training and consulting. Gwen also writes the popular blog Data Governance Matters, frequently contributes to IT and business publications, and is the author of the book Alpha Males and Data Disasters: The Case for Data Governance.

This extended episode of OCDQ Radio is 49 minutes long, and is divided into two parts, which are separated by a brief Star Wars themed intermission. In Part 1, Rob and I discuss our blog debate. In Part 2, Gwen joins us to provide her excellent insights.

Popular OCDQ Radio Episodes

Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.

Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

Gaining a Competitive Advantage with Data — Guest William McKnight discusses some of the practical, hands-on guidance provided by his book Information Management: Strategies for Gaining a Competitive Advantage with Data.

Doing Data Governance — Guest John Ladley discusses his book How to Design, Deploy and Sustain Data Governance and how to understand the difference and relationship between data governance and enterprise information management.

Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).

Measuring Data Quality for Ongoing Improvement — Guest Laura Sebastian-Coleman discusses bringing together a better understanding of what is represented in data with the expectations for use in order to improve the overall quality of data.

The Blue Box of Information Quality — Guest Daragh O Brien on why Information Quality is bigger on the inside, using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”

Good-Enough Data for Fast-Enough Decisions — Guest Julie Hunt discusses Data Quality and Business Intelligence, including the speed versus quality debate of near-real-time decision making, and the future of predictive analytics.

The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

The Art of Data Matching — Guest Henrik Liliendahl Sørensen discusses data matching concepts and practices, including different match techniques, candidate selection, presentation of match results, and business applications of data matching.

Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

June 09, 2011

Data Governance Star Wars: Balancing Bureaucracy and Agility

June 09, 2011/ Jim Harris

I was recently discussing data governance best practices with Rob Karel, the well-respected analyst at Forrester Research, and our conversation migrated to one of data governance’s biggest challenges: how to balance bureaucracy and business agility.

So Rob and I thought it would be fun to tackle this dilemma in a Star Wars themed debate across our individual blog platforms with Rob taking the position for Bureaucracy as the Empire and me taking the opposing position for Agility as the Rebellion.

(Yes, the cliché is true, conversations between self-proclaimed data geeks tend to result in Star Wars or Star Trek parallels.)

Disclaimer: Remember that this is a true debate format where Rob and I are intentionally arguing polar opposite positions with full knowledge that the reality is data governance success requires effectively balancing bureaucracy and agility.

Please take the time to read both of our blog posts, then we encourage your comments — and your votes (see the poll below).

The Force is Too Strong with This One

“Don’t give in to Bureaucracy—that is the path to the Dark Side of Data Governance.”

Data governance requires coordinating a complex combination of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

When confronted by this phantom menace of complexity, many organizations believe that the only path to success must be command and control—institute a rigid bureaucracy to dictate policies, demand compliance, and dole out punishments. This approach to data governance often makes policy compliance feel like imperial rule, and policy enforcement feel like martial law.

But beware. Bureaucracy, command, control—the Dark Side of Data Governance are they. Once you start down the dark path, forever will it dominate your destiny, consume your organization it will.

No Time to Discuss this as a Committee

“There is a great disturbance in the Data, as if millions of voices suddenly cried out for Governance but were suddenly silenced. I fear something terrible has happened. I fear another organization has started by creating a Data Governance Committee.”

Yes, it’s true—at some point, an official Data Governance Committee (or Council, or Board, or Galactic Senate) will be necessary.

However, one of the surest ways to guarantee the failure of a new data governance program is to start by creating a committee. This is often done with the best of intentions, bringing together key stakeholders from all around the organization, representatives of each business unit and business function, as well as data and technology stakeholders. But when you start by discussing data governance as a committee, you often never get data governance out of the committee (i.e., all talk, mostly arguing, no action).

Successful data governance programs often start with a small band of rebels (aka change agents) struggling to restore quality to some business-critical data, or struggling to resolve inefficiencies in a key business process. Once news of their successful pilot project spreads, more change agents will rally to the cause—because that’s what data governance truly requires, not a committee, but a cause to believe in and fight for—especially after the Empire of Bureaucracy strikes back and tries to put down the rebellion.

Collaboration is the Data Governance Force

“Collaboration is what gives a data governance program its power. Its energy binds us together. Cooperative beings are we. You must feel the Collaboration all around you, among the people, the data, the business process, the technology, everywhere.”

Many rightfully lament the misleading term “data governance” because it appears to put the emphasis on “governing data.”

Data governance actually governs the interactions among business processes, data, technology and, most important—people. It is the organization’s people, empowered by high quality data and enabled by technology, who optimize business processes for superior corporate performance. Data governance reveals how truly interconnected and interdependent the organization is, showing how everything that happens within the enterprise happens as a result of the interactions occurring among its people.

Data governance provides the framework for the communication and collaboration of business, data, and technical stakeholders, and establishes an enterprise-wide understanding of the roles and responsibilities involved, and the accountability required to support the organization’s business activities, and materialize the value of the enterprise’s data as positive business impacts.

Enforcing data governance policies with command and control is the quick and easy path—to failure. Principles, not policies, are what truly give a data governance program its power. Communication and collaboration are the two most powerful principles.

“May the Collaboration be with your Data Governance program. Always.”

Always in Motion is the Future

“Be mindful of the future, but not at the expense of the moment. Keep your concentration here and now, where it belongs.”

Perhaps the strongest case against bureaucracy in data governance is the business agility that is necessary for an organization to survive and thrive in today’s highly competitive and rapidly evolving marketplace. The organization must follow what works for as long as it works, but without being afraid to adjust as necessary when circumstances inevitably change.

Change is the only galactic constant, which is why data governance policies can never be cast in stone (or frozen in carbonite).

Will a well-implemented data governance strategy continue to be successful? Difficult to see. Always in motion is the future. And this is why, when it comes to deliberately designing a data governance program for agility: “Do or do not. There is no try.”

Be sure to also read Rob “Darth” Karel’s blog post entry in this data governance debate.

Please feel free to also post a comment below and explain your vote or simply share your opinions and experiences.

Listen to Data Governance Star Wars on OCDQ Radio: in Part 1, Rob Karel and I discuss our mock blog debate, which is followed by a brief Star Wars themed intermission, and then in Part 2, Gwen Thomas joins us to provide her excellent insights.

January 17, 2011

Occurred, a data defect has . . .

January 17, 2011/ Jim Harris

Inspired by: The 404 Error Page of Adham Dannaway

October 12, 2010

Darth Data

October 12, 2010/ Jim Harris

Darth Tater

While I was grocery shopping today, I couldn’t resist taking this picture of Darth Tater.

As the Amazon product review explains: “Be it a long time ago, in a galaxy far, far away or right here at home in the 21st century, Mr. Potato Head never fails to reinvent himself.”

I couldn’t help but think about how, although data’s quality is determined by evaluating its fitness for the purpose of business use, most data has multiple business uses, and data of sufficient quality for one use may not be of sufficient quality for other, perhaps unintended, business uses.

It is this “Reinventing data for mix and match business fun!” that often provides the context for what, in hindsight, appear to be obvious data quality issues.

It makes me wonder if it’s possible to turn high quality data to the dark side of the Force by misusing it for a business purpose for which it has no applicability, resulting in bad, albeit data-driven, business decisions.

Please post a comment and let me know if you think it is possible to turn Data-kin Quality-walker into Darth Data.

May the Data Quality be with you, always.

June 05, 2010

The Point of View Paradox

June 05, 2010/ Jim Harris

One of my all-time favorite non-fiction books is The 7 Habits of Highly Effective People by Stephen Covey.

One of the book’s key points is that we need to carefully examine our point of view, the way we “see” the world—not in terms of our sense of sight, but instead in terms of the way we perceive, interpret, and ultimately understand the world around us.

As Covey explains early in the book, our point of view can be divided into two main categories: the way things are (realities) and the way things should be (values). We interpret our experiences from these two perspectives, rarely questioning their accuracy.

In other words, we simply assume that the way we see things is the way they really are or the way they should be. Our attitudes and behaviors are based on these assumptions. Therefore, our point of view influences the way we think and the way we act.

A famous experiment that Covey shares in the book, which he first encountered at the Harvard Business School, is intended to demonstrate how two people can see the same thing, disagree—and yet both be right. Although not logical, it is psychological.

This experiment is reproduced below using the illustrations that I scanned from the book. Please scroll down slowly.

Illustrations of a Young Woman

Look closely at the following illustrations, focusing first on the one on the left—and then slowly shift over to the one on the right:

Can you see the young woman with the petite nose, wearing a necklace, and looking away from you in both illustrations?

Illustrations of an Old Lady

Look closely at the following illustrations, focusing first on the one on the left—and then slowly shift over to the one on the right:

Can you see the old lady with the large nose, sad smile, and looking down in both illustrations?

Illustrations of a Paradox

Look closely at the following illustrations, focusing first on the one on the far left—and then on the one in the middle—and then shift your focus to the one on the far right—and then back to the one in the middle:

Can you now see both the young woman and the old lady in the middle illustration?

The Point of View Paradox

The above experiment is usually performed without using the secondary illustration (the one shown on the right of the first two and in the middle of the final one). Typically in a classroom setting, half of the room has their perception “seeded” utilizing the illustration of the young woman, and the other half with the illustration of the old lady. When the secondary illustration is then revealed to the entire classroom, arguments commence over whether a young woman or an old lady is being represented.

This experiment demonstrates how our point of view powerfully conditions us and affects the way we interact with other people.

In the world of data quality and its related disciplines, the point of view paradox often negatively impacts the communication and collaboration necessary for success.

Business and technical perspectives often appear diametrically opposed. Objective and subjective definitions of data quality seemingly contradict one another. And of course, the deeply polarized camps contrasting the reactive and proactive approaches to data quality often can’t even agree to disagree.

However, as Data Quality Expert and Jedi Master Obi-Wan Kenobi taught me a long time ago:

“You’re going to find that many of the truths we cling to depend greatly on our own point of view.”

March 17, 2010

Wordless Wednesday: March 17, 2010

March 17, 2010/ Jim Harris

Photo via Flickr (Creative Commons License) by: Stéfan

November 12, 2009

Beyond a “Single Version of the Truth”

November 12, 2009/ Jim Harris

This post is involved in a good-natured contest (i.e., a blog-bout) with two additional bloggers: Henrik Liliendahl Sørensen and Charles Blyth. Our contest is a Blogging Olympics of sorts, with the United States, Denmark, and England competing for the Gold, Silver, and Bronze medals in an event we are calling “Three Single Versions of a Shared Version of the Truth.”

Please take the time to read all three posts and then vote for who you think has won the debate (see poll below). Thanks!

The “Point of View” Paradox

In the early 20th century, within his Special Theory of Relativity, Albert Einstein introduced the concept that space and time are interrelated entities forming a single continuum, and therefore the passage of time can be a variable that could change for each individual observer.

One of the many brilliant insights of special relativity was that it could explain why different observers can make validly different observations – it was a scientifically justifiable matter of perspective.

It was Einstein's apprentice, Obi-Wan Kenobi (to whom Albert explained “Gravity will be with you, always”), who stated:

“You're going to find that many of the truths we cling to depend greatly on our own point of view.”

The Data-Information Continuum

In the early 21st century, within his popular blog post The Data-Information Continuum, Jim Harris introduced the concept that data and information are interrelated entities forming a single continuum, and that speaking of oneself in the third person is the path to the dark side.

I use the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g., customers, vendors, suppliers).

Although a common definition for data quality is fitness for the purpose of use, the common challenge is that data has multiple uses – each with its own fitness requirements. Viewing each intended use as the information that is derived from data, I define information as data in use or data in action.

Quality within the Data-Information Continuum has both objective and subjective dimensions. Data's quality is objectively measured separate from its many uses, while information's quality is subjectively measured according to its specific use.

Objective Data Quality

Data quality standards provide a highest common denominator to be used by all business units throughout the enterprise as an objective data foundation for their operational, tactical, and strategic initiatives.

In order to lay this foundation, raw data is extracted directly from its sources, profiled, analyzed, transformed, cleansed, documented and monitored by data quality processes designed to provide and maintain universal data sources for the enterprise's information needs.

At this phase of the architecture, the manipulations of raw data must be limited to objective standards and not be customized for any subjective use. From this perspective, data is now fit to serve (as at least the basis for) each and every purpose.

Subjective Information Quality

Information quality standards (starting from the objective data foundation) are customized to meet the subjective needs of each business unit and initiative. This approach leverages a consistent enterprise understanding of data while also providing the information necessary for day-to-day operations.

But please understand: customization should not be performed simply for the sake of it. You must always define your information quality standards by using the enterprise-wide data quality standards as your initial framework.

Whenever possible, enterprise-wide standards should be enforced without customization. The key word within the phrase “subjective information quality standards” is standards — as opposed to subjective, which can quite often be misinterpreted as “you can do whatever you want.” Yes you can – just as long as you have justifiable business reasons for doing so.

This approach to implementing information quality standards has three primary advantages. First, it reinforces a consistent understanding and usage of data throughout the enterprise. Second, it requires each business unit and initiative to clearly explain exactly how they are using data differently from the rest of your organization, and more important, justify why. Finally, all deviations from enterprise-wide data quality standards will be fully documented.
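These three advantages can be illustrated with a small, hypothetical registry in Python: business units start from the enterprise-wide standards, any customization must carry a justification, and every deviation is recorded. The class, method, and rule names are assumptions made for this sketch, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QualityStandards:
    """Hypothetical registry of enterprise-wide data quality standards plus
    documented, justified deviations for individual business units."""

    enterprise: Dict[str, str]
    deviations: List[Dict[str, str]] = field(default_factory=list)

    def customize(self, unit: str, rule: str, value: str, justification: str) -> None:
        # Customization must start from an existing enterprise standard...
        if rule not in self.enterprise:
            raise KeyError(f"{rule!r} is not an enterprise-wide standard")
        # ...and must explain why the unit needs to deviate from it.
        if not justification.strip():
            raise ValueError("a deviation requires a business justification")
        # Every deviation is documented.
        self.deviations.append(
            {"unit": unit, "rule": rule, "value": value, "justification": justification}
        )

    def standards_for(self, unit: str) -> Dict[str, str]:
        # Begin with the objective enterprise framework, then apply only this
        # unit's documented deviations.
        rules = dict(self.enterprise)
        for d in self.deviations:
            if d["unit"] == unit:
                rules[d["rule"]] = d["value"]
        return rules
```

For instance, after `customize("Marketing", "date_format", "MM/DD/YYYY", "Legacy campaign tool accepts only US dates")`, Marketing sees `MM/DD/YYYY` while every other unit still sees the enterprise `YYYY-MM-DD`, and the deviation and its reason remain on record.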

The “One Lie Strategy”

A common objection to separating quality standards into objective data quality and subjective information quality is the enterprise's significant interest in creating what is commonly referred to as a “Single Version of the Truth.”

However, in his excellent book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman explains:

“A fiendishly attractive concept is...'a single version of the truth'...the logic is compelling...unfortunately, there is no single version of the truth.

For all important data, there are...too many uses, too many viewpoints, and too much nuance for a single version to have any hope of success.

This does not imply malfeasance on anyone's part; it is simply a fact of life.

Getting everyone to work from a single version of the truth may be a noble goal, but it is better to call this the 'one lie strategy' than anything resembling truth.”

Beyond a “Single Version of the Truth”

In the classic 1985 film Mad Max Beyond Thunderdome, the title character arrives in Bartertown, ruled by the evil Auntie Entity, where people living in the post-apocalyptic Australian outback go to trade for food, water, weapons, and supplies. Auntie Entity forces Mad Max to fight her rival Master Blaster to the death within a gladiator-like arena known as Thunderdome, which is governed by one simple rule:

“Two men enter, one man leaves.”

I have always struggled with the concept of creating a “Single Version of the Truth.” I imagine all of the key stakeholders from throughout the enterprise arriving in Corporatetown, ruled by the Machiavellian CEO known only as Veritas, where all business units and initiatives must go to request funding, staffing, and continued employment. Veritas forces all of them to fight their Master Data Management rivals within a gladiator-like arena known as Meetingdome, which is governed by one simple rule:

“Many versions of the truth enter, a Single Version of the Truth leaves.”

For any attempted “version of the truth” to truly be successfully implemented within your organization, it must take into account both the objective and subjective dimensions of quality within the Data-Information Continuum.

Both aspects of this shared perspective of quality must be incorporated into a “Shared Version of the Truth” that enforces a consistent enterprise understanding of data, but that also provides the information necessary to support day-to-day operations.

The Data-Information Continuum is governed by one simple rule:

“All validly different points of view must be allowed to enter,

In order for an all encompassing Shared Version of the Truth to be achieved.”

You are the Judge

Please take the time to read all three posts and then vote for who you think has won the debate. A link to the same poll is provided on all three blogs. Therefore, wherever you choose to cast your vote, you will be able to view an accurate tally of the current totals.

The poll will remain open for one week, closing at midnight on November 19 so that the “medal ceremony” can be conducted via Twitter on Friday, November 20. Additionally, please share your thoughts and perspectives on this debate by posting a comment below. Your comment may be copied (with full attribution) into the comments section of all of the blogs involved in this debate.

August 26, 2009

The Only Thing Necessary for Poor Data Quality

August 26, 2009/ Jim Harris

“Demonstrate projected defects and business impacts if the business fails to act,” explains Dylan Jones of Data Quality Pro in his recent and remarkable post How To Deliver A Compelling Data Quality Business Case:

“Presenting a future without data quality management...leaves a simple take-away message – do nothing and the situation will deteriorate.”

I cannot help but be reminded of the famous quote often attributed to the 18th century philosopher Edmund Burke:

“The only thing necessary for the triumph of evil, is for good men to do nothing.”

Or the even more famous quote often attributed to the long time ago Jedi Master Yoda:

“Poor data quality is the path to the dark side. Poor data quality leads to bad business decisions.

Bad business decisions lead to lost revenue. Lost revenue leads to suffering.”

When you present the business case for your data quality initiative to executive management and other corporate stakeholders, demonstrate that poor data quality is not a theoretical problem – it is a real business problem that negatively impacts the quality of decision-critical enterprise information.

Preventing poor data quality is mission-critical. Poor data quality will undermine the tactical and strategic initiatives essential to the enterprise's mission to survive and thrive in today's highly competitive and rapidly evolving marketplace.

“The only thing necessary for Poor Data Quality – is for good businesses to Do Nothing.”

August 12, 2009

The General Theory of Data Quality

August 12, 2009/ Jim Harris

In On the Electrodynamics of Moving Bodies, one of his famous 1905 Annus Mirabilis papers, Albert Einstein published what would later become known as his Special Theory of Relativity.

This theory introduced the concept that space and time are interrelated entities forming a single continuum and that the passage of time can be a variable that could change for each specific observer.

As Einstein's Padawan Obi-Wan Kenobi would later explain in his remarkable 1983 “paper” Return of the Jedi:

“You're going to find that many of the truths we cling to depend greatly on our own point of view.”

Although the Special Theory of Relativity could explain the different perspectives of different observers, it could not explain the shared perspective of all observers. Special relativity ignored a foundational force in classical physics – gravity. So in 1916, Einstein used the force to incorporate a new perspective on gravity into what he called his General Theory of Relativity.

The Data-Information Continuum

In my popular post The Data-Information Continuum, I explained that data and information are also interrelated entities forming a single continuum. I used the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g. customers, vendors, suppliers).

I explained that although a common definition for data quality is fitness for the purpose of use, the common challenge is that data has multiple uses – each with its own fitness requirements. Viewing each intended use as the information that is derived from data, I defined information as data in use or data in action.

I went on to explain that data's quality must be objectively measured separately from its many uses, and that information's quality can only be subjectively measured according to its specific use.
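To make the distinction concrete, here is a minimal Python sketch. It is not taken from any original post, and the record fields, rule names, and business uses (direct_mail, email_campaign) are all hypothetical, as is the five-digit postal code rule, which assumes US ZIP codes. Data quality rules are checked once, independent of any use; fitness rules are grouped by intended use, so the same record can be fit for one use and unfit for another.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CustomerRecord:
    # Hypothetical fields, chosen only for illustration.
    customer_id: str
    name: str
    postal_code: Optional[str]
    email: Optional[str]


Rule = Callable[[CustomerRecord], bool]

# Objective data quality: hypothetical enterprise-wide rules applied regardless of use.
DATA_RULES: dict[str, Rule] = {
    "has_id": lambda r: bool(r.customer_id.strip()),
    "has_name": lambda r: bool(r.name.strip()),
    # Assumes US ZIP codes; a missing value is allowed here, a malformed one is not.
    "postal_code_valid_if_present": lambda r: r.postal_code is None
    or (len(r.postal_code) == 5 and r.postal_code.isdigit()),
}

# Subjective information quality: hypothetical fitness rules defined per business use.
USE_RULES: dict[str, dict[str, Rule]] = {
    "direct_mail": {"has_postal_code": lambda r: r.postal_code is not None},
    "email_campaign": {"has_email": lambda r: r.email is not None and "@" in r.email},
}


def data_quality_failures(record: CustomerRecord) -> list[str]:
    """Names of enterprise data rules the record violates, independent of any use."""
    return [name for name, rule in DATA_RULES.items() if not rule(record)]


def fitness_failures(record: CustomerRecord, use: str) -> list[str]:
    """Names of fitness rules the record violates for one specific business use."""
    return [name for name, rule in USE_RULES[use].items() if not rule(record)]


record = CustomerRecord("C-001", "Ada Lovelace", "02134", None)
print(data_quality_failures(record))               # []
print(fitness_failures(record, "direct_mail"))     # []
print(fitness_failures(record, "email_campaign"))  # ['has_email']
```

The record passes every enterprise data rule yet fails the email campaign's fitness check. Moving has_email into DATA_RULES would amount to enforcing an information quality standard as if it were an enterprise data quality standard.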

The Special Theory of Data Quality

The majority of data quality initiatives are reactive projects launched in the aftermath of an event when poor data quality negatively impacted decision-critical information.

Many of these projects end in failure. Some fail because of lofty expectations or unmanaged scope creep. Most fail because they are based on the flawed perspective that data quality problems can be permanently “fixed” by a one-time project as opposed to needing a sustained program.

Whenever an organization approaches data quality as a one-time project and not as a sustained program, they are accepting what I refer to as the Special Theory of Data Quality.

However, just as special relativity is accurate for solving a narrowly defined problem, applications of the Special Theory of Data Quality can sometimes yield successful results – from a certain point of view.

Tactical initiatives often have a necessarily narrow focus. Reactive data quality projects are sometimes driven by business triage of the most critical data problems, which require near-term prioritization and simply can't wait for the effects of a proactive strategic initiative (i.e., one that might have prevented the problems from happening in the first place).

One of the worst things that can happen to an organization is a successful data quality project – because it is almost always an implementation of information quality customized to the needs of the tactical initiative that provided its funding.

Ultimately, this misperceived success simply delays an actual failure when one of the following happens:

When the project is over, the team returns to their previous activities only to be forced into triage once again when the next inevitable crisis occurs where poor data quality negatively impacts decision-critical information.
When a new project (or a later phase of the same project) attempts to enforce the information quality standards throughout the organization as if they were enterprise data quality standards.

The General Theory of Data Quality

True data quality standards are enterprise-wide standards providing an objective data foundation. True information quality standards must always be customized to meet the subjective needs of a specific business process and/or initiative.

Both aspects of this shared perspective of quality must be incorporated into a single sustained program that enforces a consistent enterprise understanding of data, but that also provides the information necessary to support day-to-day operations.

Whenever an organization approaches data quality as a sustained program and not as a one-time project, they are accepting what I refer to as the General Theory of Data Quality.

Data governance provides the framework for crossing the special to general theoretical threshold necessary to evolve data quality from a project to a sustained program. However, in this post, I want to remain focused on which theory an organization accepts because if you don't accept the General Theory of Data Quality, you likely also don't accept the crucial role that data governance plays in a data quality initiative – and in all fairness, data governance obviously involves much more than just data quality.

Theory vs. Practice

Even though I am an advocate for the General Theory of Data Quality, I also realize that no one works at a company called Perfect, Incorporated. I would be lying if I said that I had not worked on more projects than programs, implemented more reactive data cleansing than proactive defect prevention, or that I had never championed a “single version of the truth.”

Therefore, my career has more often exemplified the Special Theory of Data Quality. Or perhaps my career has exemplified what could be referred to as the General Practice of Data Quality?

What theory of data quality does your organization accept? Which one do you personally accept?

More importantly, what does your organization actually practice when it comes to data quality?

April 22, 2009

All I Really Need To Know About Data Quality I Learned In Kindergarten

April 22, 2009/ Jim Harris

Robert Fulghum's excellent book All I Really Need to Know I Learned in Kindergarten dominated the New York Times Bestseller List for all of 1989 and much of 1990. The 15th Anniversary Edition, which was published in 2003, revised and expanded on the original inspirational essays.

A far less noteworthy achievement of the book is that it also inspired me to write about how:

All I Really Need To Know About Data Quality I Learned in Kindergarten

Show And Tell

I loved show and tell. An opportunity to deliver an interactive presentation that encouraged audience participation. No PowerPoint slides. No podium. No power suit. Just me wearing the dorky clothes my parents bought me, standing right in front of the class, waving my Millennium Falcon over my head and explaining that "traveling through hyper-space ain't like dustin' crops, boy" while my classmates (and my teacher) were laughing so hard many of them fell out of their seats. My show and tell made it clear that if you came over to my house after school to play, then you knew exactly what to expect: a geek who loved Star Wars, perhaps a little too much.

When you present the business case for your data quality initiative to executive management and other corporate stakeholders, remember the lessons of show and tell. Poor data quality is not a theoretical problem; it is a real business problem that negatively impacts the quality of decision-critical enterprise information. Your presentation should make it clear that if the data quality initiative doesn't get approved, then everyone will know exactly what to expect:

"Poor data quality is the path to the dark side.

Poor data quality leads to bad business decisions.

Bad business decisions lead to lost revenue.

Lost revenue leads to suffering."

The Five Second Rule

If you drop your snack on the floor, then as long as you pick it up within five seconds you can safely eat it. When you have poor quality data in your enterprise systems, you do have more than five seconds to do something about it. However, the longer poor quality data goes without remediation, the more likely it will negatively impact critical business decisions. Don't let your data become the "smelly kid" in class. No one likes to share their snacks with the smelly kid. And no one trusts information derived from "smelly data."

When You Make A Mistake, Say You're Sorry

Nobody's perfect. We all have bad days. We all occasionally say and do stupid things. When you make a mistake, own up to it and apologize for it. You don't want to have to wear the dunce cap or stand in the corner for a time-out. And don't be too hard on your friend who had to wear the dunce cap today. It was simply their turn to make a mistake. It will probably be your turn tomorrow. They had to say they were sorry. You also have to forgive them. Who else is going to share their cookies with you when your mom once again packs carrots as your snack?

Learn Something New Every Day

We didn't stop learning after we "graduated" from kindergarten, did we? We are all proud of our education, knowledge, understanding, and experience. It may be true that experience is the path that separates knowledge from wisdom. However, we must remain open to learning new things. Socrates taught us that "the only true wisdom consists in knowing that you know nothing." I bet Socrates headlined the story time circuit in the kindergartens of Ancient Greece.

Hold Hands And Stick Together

I remember going on numerous field trips in kindergarten. We would visit museums, zoos and amusement parks. Wherever we went, our teacher would always have us form an interconnected group by holding the hand of the person in front of us and the person behind us. We were told to stick together and look out for one another. This important lesson is also applicable to data quality initiatives. Teamwork and collaboration are essential for success. Remember that you are all in this together.

What did you learn about data quality in kindergarten?
