The Data Quality Wager
Jim Harris in Books, Data Quality, Debates tagged Best of 2011, Business Benefits, ROI
Tuesday, April 12, 2011 at 3:00AM
Gordon Hamilton recently emailed me with an excellent recommended topic for a data quality blog post:
“It always seems crazy to me that few executives base their ‘corporate wagers’ on the statistical research touted by data quality authors such as Tom Redman, Jack Olson and Larry English that shows that 15-45% of the operating expense of virtually all organizations is WASTED due to data quality issues.
So, if every organization is leaving 15-45% on the table each year, why don’t they do something about it? Philip Crosby says that quality is free, so why do the executives allow the waste to go on and on and on? It seems that if the shareholders actually think about the Data Quality Wager they might wonder why their executives are wasting their shares’ value. A large portion of that 15-45% could all go to the bottom line without a capital investment.
I’m maybe sounding a little vitriolic because I’ve been re-reading Deming’s Out of the Crisis and he has a low regard for North American industry because they won’t move beyond their short-term goals to build a quality organization, let alone implement Deming’s 14 principles or Larry English’s paraphrasing of them in a data quality context.”
The Data Quality Wager
Gordon Hamilton explained in his email that his reference to the Data Quality Wager was an allusion to Pascal’s Wager, but what follows is my rendering of it in a data quality context (i.e., if you don’t like what follows, please yell at me, not Gordon).
Although I agree with Gordon, I also acknowledge that convincing your organization to invest in data quality initiatives can be a hard sell. A common mistake is not framing the investment in data quality initiatives using business language such as mitigated risks, reduced costs, or increased revenue. I also acknowledge the reality of the fiscal calendar effect and how most initiatives increase short-term costs based on the long-term potential of eventually mitigating risks, reducing costs, or increasing revenue.
Short-term increased costs of a data quality initiative can include the purchase of data quality software and its maintenance fees, as well as the professional services needed for training and consulting for installation, configuration, application development, testing, and production implementation. And there are often additional short-term increased costs, both external and internal.
Please note that I am talking about the costs of proactively investing in a data quality initiative before any data quality issues have manifested that would prompt reactively investing in a data cleansing project. The short-term increased costs are the same either way; I am simply acknowledging the reality that a reactive project always gets funding more easily than a proactive program does, and this is obviously not only true for data quality initiatives.
Therefore, the organization has to evaluate the possible outcomes of proactively investing in data quality initiatives while also considering the possible existence of data quality issues (i.e., the existence of tangible business-impacting data quality issues):
1. Invest in data quality initiatives + Data quality issues exist = Decreased risks and (eventually) decreased costs
2. Invest in data quality initiatives + Data quality issues do not exist = Only increased costs, no ROI
3. Do not invest in data quality initiatives + Data quality issues exist = Increased risks and (eventually) increased costs
4. Do not invest in data quality initiatives + Data quality issues do not exist = No increased costs and no increased risks
Data quality professionals, vendors, and industry analysts all strongly advocate #1 — and all strongly criticize #3. (Additionally, since we believe data quality issues exist, most “orthodox” data quality folks generally refuse to even acknowledge #2 and #4.)
Unfortunately, when advocating #1, we often don’t effectively sell the business benefits of data quality, and when criticizing #3, we often focus too much on the negative aspects of not investing in data quality.
Only #4 “guarantees” neither increased costs nor increased risks by gambling on not investing in data quality initiatives based on the belief that data quality issues do not exist—and, by default, this is how many organizations make the Data Quality Wager.
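As with Pascal's Wager, the four outcomes can be weighed by how likely you believe it is that data quality issues exist. What follows is only a minimal sketch of that arithmetic, not a costing method from this post: the names (`WagerInputs`, `mitigated_fraction`, and so on) and every figure are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class WagerInputs:
    """Hypothetical inputs; none of these figures come from the post."""
    initiative_cost: float     # short-term cost of investing in data quality
    issue_cost: float          # eventual cost if issues exist and go unaddressed
    mitigated_fraction: float  # assumed share of issue_cost an initiative avoids


def outcome_costs(w: WagerInputs) -> Dict[Tuple[bool, bool], float]:
    """Map (invest, issues_exist) to a total cost for the four outcomes."""
    return {
        (True, True): w.initiative_cost + w.issue_cost * (1 - w.mitigated_fraction),
        (True, False): w.initiative_cost,  # only increased costs, no ROI
        (False, True): w.issue_cost,       # increased risks and costs
        (False, False): 0.0,               # no increased costs or risks
    }


def expected_cost(w: WagerInputs, invest: bool, p_issues: float) -> float:
    """Expected cost of a choice, given a believed probability that issues exist."""
    costs = outcome_costs(w)
    return p_issues * costs[(invest, True)] + (1 - p_issues) * costs[(invest, False)]


def break_even_probability(w: WagerInputs) -> float:
    """Probability of issues at which investing and not investing cost the same."""
    avoided = w.issue_cost * w.mitigated_fraction
    if avoided == 0:
        return float("inf")  # investing never pays off under these assumptions
    return w.initiative_cost / avoided


if __name__ == "__main__":
    # Purely illustrative numbers.
    wager = WagerInputs(initiative_cost=100.0, issue_cost=500.0, mitigated_fraction=0.75)
    for p in (0.1, 0.5, 0.9):
        print(p, expected_cost(wager, True, p), expected_cost(wager, False, p))
    print("break-even:", break_even_probability(wager))
```

Under these assumptions, betting on outcome 4 only looks cheaper while the believed probability of issues stays below the break-even point, which is why the estimate of that probability matters more than the matrix itself.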
How is your organization making the Data Quality Wager?
Reader Comments (4)
Erm ... so why exactly is it US who should be convincing our organizations to invest in data quality initiatives? Why should this be always a push from below? The executives are (over)paid to run their organizations, and ignoring data quality issues shows they are not running them properly.
Me? I have no patience with the jackanapes who are incapable of doing what they're paid for and will walk away rather than spend my energy persuading them to get on with it.
Right? Or am I just an old hippie who expects better?
Very nicely written, Jim. I love the matrix.
Great write up Jim. And kudos to the hip Graham for the jackanapes reference. :)
Deming puts a lot of energy into his arguments in Out of the Crisis that the short-term mindset of the executives, and by extension the directors, is a large part of the problem. Jackanapes, a lovely under-used term, might be a bit strong when the executives are really just doing what they are paid for. In North America we get what the directors measure!
In fact, one quandary is that a proactive executive who invests in data quality is building the long-term value of their company but is also setting it up to be acquired by somebody who recognizes that the "under the radar" improvements are making the prize valuable. Deming says on p.100 "Fear of unfriendly takeover may be the single most important obstacle to constancy of purpose. There is also, besides the unfriendly takeover, the equally devastating leveraged buyout. Either way, the conqueror demands dividends, with vicious consequences on the vanquished."
As always, thanks everyone for contributing your commendable comments.
@Graham: I agree that it can be frustrating to have to convince our organizations to invest in data quality initiatives when this business need should be obvious to the CjOs. Instead of using CxOs as a generic reference to C-level executives, where x can stand for (E)xecutive, (I)nformation, (T)echnology, etc., I propose CjOs, where j stands for Jackanape :-)
@William: Thanks, I borrowed the decision matrix concept from Pascal’s Wager :-)
@Gordon: Thanks again for recommending the topic for this blog post, and for contributing more Deming insights :-)
From the LinkedIn Group for the IAIDQ, Richard Ordowich commented:
“Let’s begin with the statement: that shows that 15-45% of the operating expense of virtually all organizations is WASTED due to data quality issues. I think the wager should be on whether this statement is conjecture or fact. We see these kinds of statistics regurgitated by most articles referring to data quality. Are these statistics defensible?
The references to manufacturing quality are themselves flawed. Except for the theoretical aspects, the practical considerations show that data are not at all similar to products. The ability to measure and monitor data quality is significantly different from that for measuring product quality. As a result the capabilities and impacts of data quality are limited.
The reason companies do not spend money on data quality is that their experiences indicate that the costs and impacts (if they can be measured) of errors are significantly less than the costs to establish a comprehensive data quality program. A comprehensive data quality program requires changes in behavior across the organization, changes to database design practices and SDLC processes and the establishment of metadata management and semantic capabilities. Without these, data quality is a “fix it” shop. The cost of data quality tools pales in comparison to these recurring costs.
Another reason that they do not invest in data quality is that the savings are fleeting. Following the initial discovery of instances of data anomalies, the nature of the errors becomes esoteric. Data quality is a reactive practice. Current data quality practices have little in the way of predictive capabilities.
I have yet to see a defensible ROI for data quality except after the fact; after the errors occurred. No one has presented a case that shows that data quality can predict and prevent errors. I don’t think this is a wager organizations are making. I think these organizations don’t accept the propaganda as fact.”
And I responded:
I definitely agree that manufacturing quality and data quality are very different disciplines, and although there is much to be learned from studying the theories behind manufacturing quality, brute forcing the practical applications of those theories onto data quality is fundamentally flawed.
However, I do not agree with labeling data quality as a reactive practice.
Even though it is impossible to truly prevent every problem before it happens, proactive defect prevention is a highly recommended best practice because the more control enforced where data originates, the better the overall quality will be for enterprise data.
However, when poor data quality negatively impacts business performance, organizations legitimately prioritize a reactive short-term response, where the only remediation will be fixing the immediate problems (i.e., data cleansing).
Balancing the demands of this data triage mentality with the best practice of implementing defect prevention wherever possible is where theory and practice must merge by combining proactive defect prevention and reactive data cleansing into a hybrid data quality discipline.
P.S. This comment started discussion/debate about proactive/reactive data quality in the LinkedIn Group for the IAIDQ