There is No Such Thing as a Root Cause
Jim Harris in Books, Data Quality, Debates, Vendors; tagged Best of 2011, Business Intelligence, Data Governance, DataFlux, Philosophy
Monday, December 5, 2011 at 3:00AM
Root cause analysis. Most people within the industry, myself included, often discuss the importance of determining the root cause of data governance and data quality issues. However, because the cause and effect relationships underlying an issue are complex, when an issue is encountered you are often seeing only one of the numerous effects of its root cause (or causes).
In my post The Root! The Root! The Root Cause is on Fire!, I poked fun at those resistant to root cause analysis with the lyrics:
The Root! The Root! The Root Cause is on Fire!
We don’t want to determine why, just let the Root Cause burn.
Burn, Root Cause, Burn!
However, I think that the time is long overdue for even me to admit the truth — There is No Such Thing as a Root Cause.
Before you charge at me with torches and pitchforks for having an Abby Normal brain, please allow me to explain.
Defect Prevention, Mouse Traps, and Spam Filters
Some advocates of defect prevention claim that zero defects is not only a useful motivation, but also an attainable goal. In my post The Asymptote of Data Quality, I quoted Daniel Pink’s book Drive: The Surprising Truth About What Motivates Us:
“Mastery is an asymptote. You can approach it. You can home in on it. You can get really, really, really close to it. But you can never touch it. Mastery is impossible to realize fully.
The mastery asymptote is a source of frustration. Why reach for something you can never fully attain?
But it’s also a source of allure. Why not reach for it? The joy is in the pursuit more than the realization.
In the end, mastery attracts precisely because mastery eludes.”
The mastery of defect prevention is sometimes distorted into a belief in data perfection, into a belief that we can not just build a better mousetrap, but we can build a mousetrap that could catch all the mice, or that by placing a mousetrap in our garage, which prevents mice from entering via the garage, we somehow also prevent mice from finding another way into our house.
Obviously, we can’t catch all the mice. However, that doesn’t mean we should let the mice be like Pinky and the Brain:
Pinky: “Gee, Brain, what do you want to do tonight?”
The Brain: “The same thing we do every night, Pinky — Try to take over the world!”
My point is that defect prevention is not the same thing as defect elimination. Defects evolve. An excellent example of this is spam. Even conservative estimates indicate almost 80% of all e-mail sent world-wide is spam. A similar percentage of blog comments are spam, and spam-generating bots are quite prevalent on Twitter and other micro-blogging and social networking services. The inconvenient truth is that as we build better and better spam filters, spammers create better and better spam.
Just as mousetraps don’t eliminate mice and spam filters don’t eliminate spam, defect prevention doesn’t eliminate defects.
However, mousetraps, spam filters, and defect prevention are essential proactive best practices.
There are No Lines of Causation — Only Loops of Correlation
There are no root causes, only strong correlations. And correlations are strengthened by continuous monitoring. Believing there are root causes means believing continuous monitoring, and by extension, continuous improvement, has an end point. I call this the defect elimination fallacy, which I parodied in song in my post Imagining the Future of Data Quality.
Knowing there are only strong correlations means knowing continuous improvement is an infinite feedback loop. A practical example of this reality comes from data-driven decision making, where:
1. Better Business Performance is often correlated with
2. Better Decisions, which, in turn, are often correlated with
3. Better Data, which is precisely why Better Decisions with Better Data is foundational to Business Success. However . . .
This does not mean that we can draw straight lines of causation between (3) and (1), (3) and (2), or (2) and (1).
Despite our preference for simplicity over complexity, if bad data were the root cause of bad decisions and/or bad business performance, no organization would ever be profitable, and if good data were the root cause of good decisions and/or good business performance, every organization could always be profitable. Even if good data were a root cause, not just a correlation, and even when data perfection is temporarily achieved, the effects would still be ephemeral because not only do defects evolve, but so does the business world. This evolution requires an endless revolution of continuous monitoring and improvement.
Many organizations implement data quality thresholds to close the feedback loop evaluating the effectiveness of their data management and data governance, but few implement decision quality thresholds to close the feedback loop evaluating the effectiveness of their data-driven decision making.
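One way to picture closing both feedback loops is a periodic review that checks each decision against a data quality threshold and, separately, against a decision quality threshold. The following Python sketch is a hypothetical illustration, not the author's method or any DataFlux product: the `Decision` record, its fields, the default threshold values, and the `review` function are all assumptions made for the example. It reports the two kinds of shortfall independently and does not attribute one to the other, in keeping with the argument that the relationship is a correlation rather than a causation.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """Hypothetical record of one data-driven decision for a monitoring period."""
    name: str
    data_quality: float     # assumed: share of required data meeting its quality criteria (0.0-1.0)
    business_result: float  # assumed: observed result relative to expected (1.0 = as expected)


def review(decisions, data_threshold=0.95, decision_threshold=0.90):
    """Return (decision name, finding) pairs for every threshold that was missed.

    The two loops are checked independently: good data does not excuse a poor
    business result, and a good business result does not excuse poor data.
    """
    findings = []
    for d in decisions:
        if d.data_quality < data_threshold:
            findings.append((d.name, "data quality below threshold"))
        if d.business_result < decision_threshold:
            findings.append((d.name, "decision quality below threshold"))
    return findings


if __name__ == "__main__":
    # Illustrative values only; these are not real measurements.
    period = [
        Decision("restock", data_quality=0.98, business_result=0.85),
        Decision("pricing", data_quality=0.91, business_result=1.02),
    ]
    for name, finding in review(period):
        print(f"{name}: {finding}")
```

Running the review every period, rather than once, is what makes it a loop: the thresholds, and the decisions themselves, would be revisited as the business changes. The thresholds could also be set per decision, since the same data may be adequate for one decision and inadequate for another.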
The quality of a decision is determined by the business results it produces, not the person who made the decision, the quality of the data used to support the decision, or even the decision-making technique. Of course, the reality is that business results are often not immediate and may sometimes be contingent upon the complex interplay of multiple decisions.
Even though evaluating decision quality only establishes a correlation, and not a causation, between the decision execution and its business results, it is still essential to continuously monitor data-driven decision making.
Although the business world will never be totally predictable, we cannot turn a blind eye to the need for data-driven decision-making best practices, or to the reality that no best practice can eliminate the potential for poor data quality and decision quality, nor the potential for poor business results even with better data quality and decision quality. Central to continuous improvement is closing the feedback loops that make data-driven decisions more transparent through better monitoring, allowing the organization to learn from its decision-making mistakes and make adjustments when necessary.
We need to connect the dots of better business performance, better decisions, and better data by drawing loops of correlation.
Decision-Data Feedback Loop

Continuous improvement enables better decisions with better data, which drives better business performance — as long as you never stop looping the Decision-Data Feedback Loop, and start accepting that there is no such thing as a root cause.
I discuss this, and other aspects of data-driven decision making, in my DataFlux white paper, which is available for download (registration required) using the following link: Decision-Driven Data Management
Related Posts
The Root! The Root! The Root Cause is on Fire!
Bayesian Data-Driven Decision Making
The Role of Data Quality Monitoring in Data Governance
The Dichotomy Paradox, Data Quality and Zero Defects
Imagining the Future of Data Quality
What going to the Dentist taught me about Data Quality



Reader Comments (3)
Thought-provoking post as always, Jim.
It is disquieting to think that an organization can never achieve perfect data but I wonder, does telling people this then cause people to stop rooting around for the causes of the imperfections? That is, do people see the message "You can never achieve Mastery" as "Don't even bother, you can't achieve it anyway" or is that just me?
I expect your message is received at different levels depending on the reader's depth of understanding of the complexities of data quality. Redman and English might nod knowingly as they read your entry and think, yes, of course continuous improvement means that this is a lifetime commitment.
Whereas folks typing away at a cold keyboard in the frozen north might think "Should I really try to find out whether the cause of our bloated inventory is partially caused by poor data quality, or should I let our super-efficient delivery system continue to overcome this problem on our annual Christmas Eve delivery frenzy?"
Another interesting aspect of this continuous improvement situation might be to view it through Goldratt’s Theory of Constraints and solve the data quality problems if they are, or are contributing to, the critical constraint of our organization.
If so, then paraphrasing Goldratt’s method, you concentrate on that critical constraint, correct the most important causes of that problem, subordinate efforts at fixing other constraints to this current effort and finally, when you have wrestled that constraint down to a less critical level, then like Pareto's 80/20 rule, the next most critical constraint becomes the most critical constraint. This method is similar to your discussion above, and possibly adds value by mindfully focussing on one constraint at a time, and correcting the contributing and correlated factors until the problem is no longer the most critical; and then moving on to the next.
Thanks again, Jim.
Cheers, Gordon
Thanks for your comment, thought-provoking as always, Gordon.
To your first point, I would respectfully contend that telling people that data perfection is impossible is far less disabling than telling them that data perfection is their goal. One reason is the demoralizing effect that continued defects (i.e., data quality issues) can have when people are browbeaten to believe that they have to be perfect.
As I stated in my post, believing in data perfection distorts defect prevention into believing in the possibility of defect elimination. However disquieting the reality may be, the inconvenient truth is that data perfection is impossible — because, again as I stated in my post, even when data perfection is temporarily achieved, the effects would be ephemeral because not only do defects evolve, but so does the business world. This evolution requires an endless revolution of looping the feedback loops of continuous monitoring and continuous improvement.
I definitely agree with your excellent point about the Theory of Constraints, which I think flows well into the discussion that I had via Twitter with Ronald Damhof, who remarked that “thinking in root causes (mind the plural due to the non-linearity of cause and effect) leads to exploratory thinking, which is a good thing.”
After I agreed by saying that the non-linearity of causation (many interrelated causes and effects) is precisely why I advocate correlation loops, Ronald responded that “linear thinking is a nasty and destructive illness that is hard to recover from,” to which I agreed by adding that linear thinking reflects a preference for simplicity, but oversimplifying complexity leads to what I call Occam’s Razor Burn.
Please note, Gordon, that I am not suggesting that you are trying to oversimplify complexity. Your counterpoints are valid and vitally needed in this discussion. So, once again, thank you for sharing your insight.
Best Regards,
Jim
From the LinkedIn Group for Data Governance & Stewardship, Grant Sutton commented:
“Fine, but we can’t let the best be the enemy of the good. There is much to be gained by making things better by moving forward with sound analysis and not being paralyzed by a perfection or bust mentality.”
And John Adler commented:
“Philosophically I agree. At the end of the day you seem to be arguing for evolving approaches to data quality. Practically, most organizations would be happy to be able to address something approximating root cause and then moving on to isolating and controlling the next high value data quality item on their list.”
And I responded:
Thanks for your comments, Grant and John.
@Grant — I definitely agree that a perfection or bust mentality is counter-productive, if not entirely self-destructive, since it becomes demoralizing when it is inevitably realized that perfection is impossible. My point was that the endless feedback loops of continuous improvement make data-driven decisions more transparent through better monitoring, allowing the organization to learn from its decision-making mistakes, and make adjustments when necessary.
@John — Yes, I am definitely arguing in favor of evolution through the endless revolution of looping the feedback loops of continuous monitoring and continuous improvement. My key argument against causation is that it is non-linear in nature. Causation is actually a complex network of many interrelated causes and effects. In other words, some of what appear to be the effects of the root cause you have isolated may, in fact, be the effects of other causes, and if your root cause is not truly the root, it could itself be an effect of one or more other causes. Correlation loops do not try to oversimplify this complexity, but instead acknowledge it and monitor it, so that the non-linear network of causation can be more effectively understood through repeated observations of the inner workings of data-driven decision making.
From the LinkedIn Group for Data Governance & Data Quality, Jeffrey Tyzzer commented:
“Your post is, as usual, thoughtful and informed. From the standpoint of the post hoc ergo propter hoc fallacy, I can see where you're coming from WRT correlation and causality; I also appreciate the perils inherent in confusing symptoms with problems. But with due respect to you (and Rock Master Scott and The Dynamic Three), people are daily isolating and remedying root causes of defects. Granted, 100% defect prevention is a fool's errand (as you say, chasing after those extra sigmas yields diminishing returns), but is there no end point not because there are no root causes but rather because new, different causes keep creeping in?”
And I responded:
Excellent point about the distinction between no root causes and the constant creation of new root causes.
Resolving the root causes underlying data quality issues is kind of like an endless game of Whac-A-Mole. No matter how good you get at smacking down those moles, more keep popping up — sometimes in different places, but sometimes in the same places. A continuous improvement mindset is required in data quality, such that we realize even when we identify and eliminate a true root cause, the effects are ephemeral because there will always be more moles to whack (i.e., data quality issues to resolve).
We need to keep looping the feedback loops of continuous monitoring and continuous improvement.
From the LinkedIn Group for Data Quality Pro.com, Lisa Marie Martinez commented:
“Okay Jim, I’m struggling on this point. Surprised myself, I always like your articles.
Let’s try to see if what you're saying about defects or poor data being about poor decisions. Not in my mind; you also wrote something to this effect. There are times when one person's bad data = another person's predictable information about changes that warrant improvement to prepare in advance. Changing the bad to good: these are symptoms, not root causes. These are one of the reasons you have variance thresholds: to monitor the places in a process where the expectation has become a restart or do-over, go back and start again. Those are not logical flows; the expectation isn't bad data, rather stale workflows. Changes aren't always bad. How you execute, or what comes first, the chicken or the egg, is bad.
Changes in process without the performance measures included in the change, bad.
Changes in the measures without the process, worse.
I’m probably only getting a fraction of your point. These are my points of interest.”
And I responded:
Thanks, as always, for your comment, Lisa Marie.
First of all, I was not implying that bad data is caused by bad decisions, or vice versa. In fact, I don’t believe that either can be the cause for the other, even though they are sometimes correlated.
To your points of interest — if I understand them correctly — I am certainly advocating building more feedback loops into data-driven decision-making processes, so that we have a way to measure performance, not in some general sense, but in a decision-specific way. As you remarked, what might be bad data for one person’s decision might be great data for another person’s decision. Each decision needs to have its decision criteria and data requirements defined so that decision-specific quality thresholds can be implemented for more transparent performance measurement, enabling us to make, and monitor the results of, any changes.
And Lisa Marie Martinez responded:
“Okay, I’m with you again. You had me nervous for a moment. I’ve gotten rather comfortable knowing you’ll have some great article to support what I thought I knew. :)”
And I responded:
Thanks, Lisa Marie.
The reality is that comments from knowledgeable people, such as yourself, help me better appreciate the difference between what I know and what I only think I know. So, I am very happy to hear that we are on the same page again :-)
And Martin Doyle commented:
“As a single stone causes concentric ripples in a pond, there will always be one root cause event creating the Data Quality wave. I agree there may be interference after the root cause event which may look like a root cause, creating eddies of side effects and confusion.
I believe there will always be one root cause; work backwards from the Data Quality side effects to the root cause and the Data Quality ripples will be eliminated.”
And I responded:
Thanks, as always, for your comment, Martin.
I must respectfully disagree with you.
Let’s begin with the single stone causing the concentric ripples in a pond. Is the stone really the root cause? Who threw the stone? Why did that person choose to throw that particular stone? How did the stone come to be alongside the pond? Which path did the stone-thrower take to the pond? What happened to the stone-thrower earlier in the day that made them want to go to the pond, and once there, pick up a stone and throw it in the pond?
My point is that only simple systems have root causes, and we neither live nor work within simple systems.
Data cleansing can eliminate the unwanted current quality ripples in our data ponds, and defect prevention can minimize the probability (but not eliminate the possibility) of unwanted future quality ripples in our data ponds.
Believing you have found and eliminated the root causes of data quality problems is like believing you can stop the stone-throwers by removing all of the stones from the area surrounding the pond, and by building a high stone-deflecting wall around the pond with sentries posted to vigilantly keep on the lookout for stones and stone-throwers.
Although you should take these precautions, the inconvenient truth is that the world will always have stones and stone-throwers who will be sufficiently motivated to find a way to throw a stone in your pond.
This doesn’t mean that we simply accept the chaos. On the contrary, we do battle with the chaos on a daily basis, but without ever deluding ourselves into thinking that the Stone Wars will ever end.
There are no root causes, only strong correlations. And correlations are strengthened by continuous monitoring. Data Quality is an Infinite Feedback Loop of Correlations and Continuous Improvements.
Best Regards,
Jim
And Martin Doyle responded:
“Jim, Love your deep thinking and I concur, however, I am referring to the point where the ripple starts in any system under our control, like a database or application. Not the genesis of the Cosmos.
Applying your well reasoned argument for the root of all root cause analysis, we'll be back to a creationist or evolutionist debate, which may be OK as we can lay the blame for all Data Quality problems on the Big Bang, God, or any other higher power your belief system might acknowledge.
Like any good DQ initiative, let's be pragmatic, progress to perfection is what we're after; we can only control what we can control. Perhaps if we agree that the entry point into our specific data eco-system is the root cause we should focus on, then we'll be on the same page?
Keep the good stuff coming, always thought-provoking.
Best Regards,
Martin”
And I responded:
Let those who have achieved a lasting data perfection cast the first stone . . .
(Sorry, I couldn't prevent my defective self from making that joke :-) )
Again, I must respectfully disagree with you.
“Let's be pragmatic, progress to perfection is what we're after. . .”
Yes, let's be pragmatic, progress toward better support of our organization's business activities is what we're after.
And if you want to counter that data perfection is necessary to achieve successful business activities, then I submit as evidence every organization in the entire history of the business world that, despite neither possessing nor pursuing data perfection, still managed to attain various degrees and durations of business success.
Furthermore, since when is it logical to equate pragmatism with perfectionism?
I am reminded of the words of Inigo Montoya from the movie The Princess Bride:
“You keep using that word. I do not think it means what you think it means.”
Claiming, as so many within the data quality and information quality profession claim, that perfection is a motivational goal, is, in my opinion: “Inconceivable!”
Perfection is a de-motivational goal. It is like telling your kids that they must get an A+ (the highest possible grade in the most common grading system used in American schools) on every exam in every subject. Is this motivational, or is it soul-crushing when they bring home their report card and it shows that they only got an A- or B+ in every subject?
Furthermore, telling kids that the goal of education is to be perfect in every subject is like telling business people that every decision has the same business impacts and business risks, which is obviously not true.
However, the decisions that have the greatest impact (i.e., business success correlated with, not caused by, making a good decision) and risk (i.e., business failure correlated with, not caused by, making a bad decision) should be supported with the best possible data.
Saying that data perfection is the goal places the focus on data — and not on business needs.
As I concluded my poem To Our Data Perfectionists:
Before our organization’s limited money and time are devoured,
Let us make sure that our critical business decisions are empowered.
Let us also realize that since change is the only universal constant,
Real best practices are not cast in stone, but written on parchment.
Because the business uses for our data, as well as our business itself, continue to evolve,
Our data strategy must be adaptation, allowing our dynamic business problems to be solved.
Thus, although it is true that we can never achieve Data Perfection,
We can deliver Business Insight, which always is our true direction.
Please forgive me for using you as a stand-in and sounding board for my pet peeves about our industry. Your counterpoints are valid and vitally needed in this discussion. So, once again, thank you for sharing your insight.
Best Regards,
Jim
And Martin Doyle responded:
“I'm loving the banter Jim. However, I note you didn't comment on my (tongue-in-cheek) hypothesis that if we follow your arguments, the ultimate root cause of all DQ ails can be traced to the Creation of the Cosmos by "some God" or perhaps "The Big Bang"? Perhaps that’s because it relates to the past and is totally out of our control.
We can only control the now! So, going back to the stone metaphor, I agree it is wasted energy, time and cost to try and predict every preceding possibility. As pragmatists, though, we can recognize when a stone is thrown and validate its size, weight and composition. If correct and “Fit for use” albeit not necessarily “Perfect” we can allow it into the system.
In terms of your points, “progress toward better support of our organization's business activities is what we're after.” Whilst supporting business activities is good, it is tactical and the automation or repetition of the wrong activities is certainly not good for the business.
In DQ terms, we need to support and be aligned with the Strategic goals of the business. Only when the right activities are carried out in the right way will businesses avoid toxic data polluting the insight they crave. Success comes from better data, not better analysis.
I’m not an advocate of the false goal of perfection as a mantra; it’s no accident our company strapline is “making your data fit for business” rather than say “perfecting your data”!
I believe we are on the same page; perhaps we should debate offline the merits of DQ motivation, i.e. the strong ‘away from pain’ drivers to correct bad data as opposed to the weak ‘towards’ motivation of creating, maintaining and using good data.
Thanks again for your insights. We share the same pet peeves and I look forward to your ongoing views and thought leadership.”
And I responded:
Continued thanks, Martin. I too am loving the banter, and I agree that we are much closer to agreement than our banter might otherwise appear to others.
To your point about my stone-throwing sending us back to an attempted discovery of the Aristotelian Prime Mover: I was actually poking fun at the overuse of the technique known as The Five Whys, where you continue to ask why (and more than five times, if necessary) in order to identify the fundamental (and often complex) cause and effect relationships underlying an issue.
It reminds me of Deep Thought answering The Ultimate Question of Life, the Universe, and Everything (42, obviously), but without determining what the Ultimate Question itself was, which turns out to be “How many Whys must we ask before arriving at the root cause of a data quality problem?”
(My apologies, Mr. Douglas Adams. Wherever you are now, Please Don’t Panic and Thanks for All the Fish :-) )
My key argument against causation is that it is non-linear in nature. Causation is actually a complex network of many interrelated causes and effects. In other words, some of what appear to be the effects of the root cause you have isolated may, in fact, be the effects of other causes, and if your root cause is not truly the root, it could itself be an effect of one or more other causes. Correlation loops do not try to oversimplify this complexity, but instead acknowledge it and monitor it, so that the non-linear network of causation can be more effectively understood through repeated observations of the inner workings of data-driven decision making.
Instead of a Heart of Gold equipped with an Infinite Improbability Drive, we need a Heart of Continuous Improvement equipped with Infinite Correlation Loops.
Thanks again for sharing your truly deep thoughts.
Perhaps in early 2012, we could continue our banter on an episode of OCDQ Radio? (That was a shameless plug for my podcast show, which returns on December 13: http://www.ocdqblog.com/podcast )