There is No Such Thing as a Root Cause

Root cause analysis.  Most people within the industry, myself included, often discuss the importance of determining the root cause of data governance and data quality issues.  However, the complex cause and effect relationships underlying an issue mean that when an issue is encountered, you are often seeing only one of the numerous effects of its root cause (or causes).

In my post The Root! The Root! The Root Cause is on Fire!, I poked fun at those resistant to root cause analysis with the lyrics:

The Root! The Root! The Root Cause is on Fire!
We don’t want to determine why, just let the Root Cause burn.
Burn, Root Cause, Burn!

However, I think that the time is long overdue for even me to admit the truth — There is No Such Thing as a Root Cause.

Before you charge at me with torches and pitchforks for having an Abby Normal brain, please allow me to explain.

 

Defect Prevention, Mouse Traps, and Spam Filters

Some advocates of defect prevention claim that zero defects is not only a useful motivation, but also an attainable goal.  In my post The Asymptote of Data Quality, I quoted Daniel Pink’s book Drive: The Surprising Truth About What Motivates Us:

“Mastery is an asymptote.  You can approach it.  You can home in on it.  You can get really, really, really close to it.  But you can never touch it.  Mastery is impossible to realize fully.

The mastery asymptote is a source of frustration.  Why reach for something you can never fully attain?

But it’s also a source of allure.  Why not reach for it?  The joy is in the pursuit more than the realization.

In the end, mastery attracts precisely because mastery eludes.”

The mastery of defect prevention is sometimes distorted into a belief in data perfection, a belief that we can build not just a better mousetrap, but a mousetrap that catches all the mice, or that placing a mousetrap in our garage, which prevents mice from entering via the garage, somehow also prevents mice from finding another way into our house.

Obviously, we can’t catch all the mice.  However, that doesn’t mean we should let the mice be like Pinky and the Brain:

Pinky: “Gee, Brain, what do you want to do tonight?”

The Brain: “The same thing we do every night, Pinky — Try to take over the world!”

My point is that defect prevention is not the same thing as defect elimination.  Defects evolve.  An excellent example of this is spam.  Even conservative estimates indicate almost 80% of all e-mail sent worldwide is spam.  A similar percentage of blog comments are spam, and spam-generating bots are quite prevalent on Twitter and other micro-blogging and social networking services.  The inconvenient truth is that as we build better and better spam filters, spammers create better and better spam.

Just as mousetraps don’t eliminate mice and spam filters don’t eliminate spam, defect prevention doesn’t eliminate defects.

However, mousetraps, spam filters, and defect prevention are essential proactive best practices.

 

There are No Lines of Causation — Only Loops of Correlation

There are no root causes, only strong correlations.  And correlations are strengthened by continuous monitoring.  Believing there are root causes means believing that continuous monitoring and, by extension, continuous improvement have an end point.  I call this the defect elimination fallacy, which I parodied in song in my post Imagining the Future of Data Quality.

Knowing there are only strong correlations means knowing continuous improvement is an infinite feedback loop.  A practical example of this reality comes from data-driven decision making, where:

  1. Better Business Performance is often correlated with
  2. Better Decisions, which, in turn, are often correlated with
  3. Better Data, which is precisely why Better Decisions with Better Data is foundational to Business Success — however . . .

This does not mean that we can draw straight lines of causation between (3) and (1), (3) and (2), or (2) and (1).

Despite our preference for simplicity over complexity, if bad data were the root cause of bad decisions and/or bad business performance, then no organization would ever be profitable, and if good data were the root cause of good decisions and/or good business performance, then every organization would always be profitable.  Even if good data were a root cause, not just a correlation, and even when data perfection is temporarily achieved, the effects would still be ephemeral because not only do defects evolve, but so does the business world.  This evolution requires an endless revolution of continuous monitoring and improvement.
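
To make the correlation point slightly more concrete, here is a minimal sketch (my own hypothetical illustration, not something from any of the posts referenced above) of how a correlation might be re-evaluated with every monitoring cycle:

```python
# Minimal sketch: each monitoring cycle adds another observation, so the
# strength of the correlation keeps being re-evaluated.  All values are hypothetical.
from statistics import correlation  # available in Python 3.10 and later

data_quality_scores  = [0.91, 0.93, 0.92, 0.95, 0.96, 0.97]
business_performance = [0.62, 0.66, 0.61, 0.70, 0.74, 0.78]

r = correlation(data_quality_scores, business_performance)
print(f"Correlation observed so far: {r:.2f}")

# A strong correlation is not a straight line of causation: the next
# monitoring cycle may strengthen or weaken it, which is why the loop never ends.
```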

Many organizations implement data quality thresholds to close the feedback loop evaluating the effectiveness of their data management and data governance, but few implement decision quality thresholds to close the feedback loop evaluating the effectiveness of their data-driven decision making.

The quality of a decision is determined by the business results it produces, not by the person who made the decision, the quality of the data used to support the decision, or even the decision-making technique.  Of course, the reality is that business results are often not immediate and may sometimes be contingent upon the complex interplay of multiple decisions.

Even though evaluating decision quality only establishes a correlation, and not a causation, between the decision execution and its business results, it is still essential to continuously monitor data-driven decision making.

Although the business world will never be totally predictable, we cannot turn a blind eye to the need for data-driven decision-making best practices, nor to the reality that no best practice can eliminate the potential for poor data quality and decision quality, or the potential for poor business results despite better data quality and decision quality.  Central to continuous improvement is closing the feedback loops that make data-driven decisions more transparent through better monitoring, allowing the organization to learn from its decision-making mistakes and make adjustments when necessary.
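
As a rough sketch of what closing both feedback loops might look like, consider the following (the metric definitions, threshold values, and the monitor function are hypothetical illustrations, not a reference implementation):

```python
# Minimal sketch of closing both feedback loops: data quality thresholds
# and decision quality thresholds.  All names and numbers are hypothetical.

DATA_QUALITY_THRESHOLD = 0.95      # e.g., share of records passing validation
DECISION_QUALITY_THRESHOLD = 0.80  # e.g., share of decisions meeting their business target

def monitor(data_quality_score: float, decision_outcomes: list[bool]) -> list[str]:
    """Return follow-up actions triggered by either feedback loop."""
    actions = []

    # Feedback loop 1: effectiveness of data management and data governance
    if data_quality_score < DATA_QUALITY_THRESHOLD:
        actions.append("investigate data quality defects and their correlated causes")

    # Feedback loop 2: effectiveness of data-driven decision making
    if decision_outcomes:
        decision_quality_score = sum(decision_outcomes) / len(decision_outcomes)
        if decision_quality_score < DECISION_QUALITY_THRESHOLD:
            actions.append("review recent decisions and the data that supported them")

    return actions

# Example: good data quality does not guarantee good decision quality,
# which is why both loops need monitoring.
print(monitor(0.97, [True, False, False, True, False]))
```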

We need to connect the dots of better business performance, better decisions, and better data by drawing loops of correlation.

 

Decision-Data Feedback Loop

Continuous improvement enables better decisions with better data, which drives better business performance — as long as you never stop looping the Decision-Data Feedback Loop, and start accepting that there is no such thing as a root cause.

I discuss this, and other aspects of data-driven decision making, in my DataFlux white paper, which is available for download (registration required) using the following link: Decision-Driven Data Management

 

Related Posts

The Root! The Root! The Root Cause is on Fire!

Bayesian Data-Driven Decision Making

The Role of Data Quality Monitoring in Data Governance

The Circle of Quality

Oughtn’t you audit?

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Imagining the Future of Data Quality

What going to the Dentist taught me about Data Quality

DQ-Tip: “There is No Such Thing as Data Accuracy...”

The HedgeFoxian Hypothesis

Bayesian Data-Driven Decision Making

In his book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman recounts the story of economist John Maynard Keynes, who, when asked what he did when presented with new data that did not support his earlier decision, responded: “I change my opinion.  What do you do?”

“This is the way good decision makers behave,” Redman explained.  “They know that a newly made decision is but the first step in its execution.  They regularly and systematically evaluate how well a decision is proving itself in practice by acquiring new data.  They are not afraid to modify their decisions, even admitting they are wrong and reversing course if the facts demand it.”

Since he has a PhD in statistics, it’s not surprising that Redman explained effective data-driven decision making using Bayesian statistics, which is “an important branch of statistics that differs from classic statistics in the way it makes inferences based on data.  One of its advantages is that it provides an explicit means to quantify uncertainty, both a priori, that is, in advance of the data, and a posteriori, in light of the data.”

Good decision makers, Redman explained, follow at least three Bayesian principles:

  1. They bring as much of their prior experience as possible to bear in formulating their initial decision spaces and determining the sorts of data they will consider in making the decision.
  2. For big, important decisions, they adopt decision criteria that minimize the maximum risk.
  3. They constantly evaluate new data to determine how well a decision is working out, and they do not hesitate to modify the decision as needed.
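
To make the a priori and a posteriori idea concrete, here is a minimal numerical sketch using a Beta-Binomial model (my own illustrative choice, not a model taken from Redman’s book), in which a decision maker’s prior confidence in a decision is revised in light of new outcome data:

```python
# Minimal Beta-Binomial sketch of updating a belief as new data arrives.
# The prior and the outcome data are hypothetical.

# A priori: prior belief that a given decision produces a good outcome,
# expressed as a Beta(alpha, beta) distribution (mean = alpha / (alpha + beta)).
alpha, beta = 8, 2          # roughly "80% confident, based on prior experience"

# New data: results observed after executing the decision.
good_outcomes, bad_outcomes = 3, 7

# A posteriori: conjugate update in light of the data.
alpha_post = alpha + good_outcomes
beta_post = beta + bad_outcomes

prior_mean = alpha / (alpha + beta)
posterior_mean = alpha_post / (alpha_post + beta_post)

print(f"Prior belief in a good outcome:  {prior_mean:.0%}")      # 80%
print(f"Posterior belief after new data: {posterior_mean:.0%}")  # 55%
# Like Keynes, the good decision maker changes their opinion when the data demands it.
```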

A key concept of statistical process control and continuous improvement is the importance of closing the feedback loop that allows a process to monitor itself, learn from its mistakes, and adjust when necessary.

The importance of building feedback loops into data-driven decision making is too often ignored.

I discuss this, and other aspects of data-driven decision making, in my DataFlux white paper, which is available for download (registration required) using the following link: Decision-Driven Data Management

 

Related Posts

Decision-Driven Data Management

The Speed of Decision

The Big Data Collider

A Decision Needle in a Data Haystack

The Data-Decision Symphony

Thaler’s Apples and Data Quality Oranges

Satisficing Data Quality

Data Confabulation in Business Intelligence

The Data that Supported the Decision

Data Psychedelicatessen

OCDQ Radio - Big Data and Big Analytics

OCDQ Radio - Good-Enough Data for Fast-Enough Decisions

The Circle of Quality

A Farscape Analogy for Data Quality

OCDQ Radio - Organizing for Data Quality

The 2010 Data Quality Blogging All-Stars

The 2010 Major League Baseball (MLB) All-Star Game is being held tonight (July 13) at Angel Stadium in Anaheim, California.

For those readers who are not baseball fans, the All-Star Game is an annual exhibition held in mid-July that showcases the players with (for the most part) the best statistical performances during the first half of the MLB season.

Last summer, I began my own annual exhibition of showcasing the bloggers whose posts I have personally most enjoyed reading during the first half of the data quality blogging season. 

Therefore, this post provides links to stellar data quality blog posts that were published between January 1 and June 30 of 2010.  My definition of a “data quality blog post” also includes Data Governance, Master Data Management, and Business Intelligence. 

Please Note: There is no implied ranking in the order that bloggers or blogs are listed, other than that Individual Blog All-Stars are listed first, followed by Vendor Blog All-Stars, and the blog posts are listed in reverse chronological order by publication date.

 

Henrik Liliendahl Sørensen

From Liliendahl on Data Quality:

 

Dylan Jones

From Data Quality Pro:

 

Julian Schwarzenbach

From Data and Process Advantage Blog:

 

Rich Murnane

From Rich Murnane's Blog:

 

Phil Wright

From Data Factotum:

 

Initiate – an IBM Company

From Mastering Data Management:

 

Baseline Consulting

From their three blogs: Inside the Biz with Jill Dyché, Inside IT with Evan Levy, and In the Field with our Experts:

 

DataFlux – a SAS Company

From Community of Experts:

 

Related Posts

Recently Read: May 15, 2010

Recently Read: March 22, 2010

Recently Read: March 6, 2010

Recently Read: January 23, 2010

The 2009 Data Quality Blogging All-Stars

 

Additional Resources

From the IAIDQ, read the 2010 issues of the Blog Carnival for Information/Data Quality:

Enterprise Data World 2009

Formerly known as the DAMA International Symposium and Wilshire MetaData Conference, Enterprise Data World 2009 was held April 5-9 in Tampa, Florida at the Tampa Convention Center.

 

Enterprise Data World is the business world’s most comprehensive vendor-neutral educational event about data and information management.  This year’s program was bigger than ever before, with more sessions, more case studies, and more can’t-miss content.  With 200 hours of in-depth tutorials, hands-on workshops, practical sessions and insightful keynotes, the conference was a tremendous success.  Congratulations and thanks to Tony Shaw, Maya Stosskopf and the entire Wilshire staff.

 

I attended Enterprise Data World 2009 as a member of the Iowa Chapter of DAMA and as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the sessions that I was attending.

I wish that I could have attended every session, but here are some highlights from ten of my favorites:

 

8 Ways Data is Changing Everything

Keynote by Stephen Baker from BusinessWeek

His article Math Will Rock Your World inspired his excellent book The Numerati.  Additionally, check out his blog: Blogspotting.

Quotes from the keynote:

  • "Data is changing how we understand ourselves and how we understand our world"
  • "Predictive data mining is about the mathematical modeling of humanity"
  • "Anthropologists are looking at social networking (e.g. Twitter, Facebook) to understand the science of friendship"

 

Master Data Management: Proven Architectures, Products and Best Practices

Tutorial by David Loshin from Knowledge Integrity.

Included material from his excellent book Master Data Management.  Additionally, check out his blog: David Loshin.

Quotes from the tutorial:

  • "Master Data are the core business objects used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies"
  • "Master Data Management (MDM) provides a unified view of core data subject areas (e.g. Customers, Products)"
  • "With MDM, it is important not to over-invest and under-implement - invest in and implement only what you need"

 

Master Data Management: Ignore the Hype and Keep the Focus on Data

Case Study by Tony Fisher from DataFlux and Jeff Grayson from Equinox Fitness.

Quotes from the case study:

  • "The most important thing about Master Data Management (MDM) is improving business processes"
  • "80% of any enterprise implementation should be the testing phase"
  • "MDM Data Quality (DQ) Challenge: Any % wrong means you’re 100% certain you’re not always right"
  • "MDM DQ Solution: Re-design applications to ensure the ‘front-door’ protects data quality"
  • "Technology is critical, however thinking through the operational processes is more important"

 

A Case of Usage: Working with Use Cases on Data-Centric Projects

Case Study by Susan Burk from IBM.

Quotes from the case study:

  • "Use Case is a sequence of actions performed to yield a result of observable business value"
  • "The primary focus of data-centric projects is data structure, data delivery and data quality"
  • "Don’t like use cases? – ok, call them business acceptance criteria – because that’s what a use case is"

 

Crowdsourcing: People are Smart, When Computers are Not

Session by Sharon Chiarella from Amazon Web Services.

Quotes from the session:

  • "Crowdsourcing is outsourcing a task typically performed by employees to a general community of people"
  • "Crowdsourcing eliminates over-staffing, lowers costs and reduces work turnaround time"
  • "An excellent example of crowdsourcing is open source software development (e.g. Linux)"

 

Improving Information Quality using Lean Six Sigma Methodology

Session by Atul Borkar and Guillermo Rueda from Intel.

Quotes from the session:

  • "Information Quality requires a structured methodology in order to be successful"
  • Lean Six Sigma Framework: DMAIC – Define, Measure, Analyze, Improve, Control:
    • Define = Describe the challenge, goal, process and customer requirements
    • Measure = Gather data about the challenge and the process
    • Analyze = Use hypothesis and data to find root causes
    • Improve = Develop, implement and refine solutions
    • Control = Plan for stability and measurement

 

Universal Data Quality: The Key to Deriving Business Value from Corporate Data

Session by Stefanos Damianakis from Netrics.

Quotes from the session:

  • "The information stored in databases is NEVER perfect, consistent and complete – and it never can be!"
  • "Gartner reports that 25% of critical data within large businesses is somehow inaccurate or incomplete"
  • "Gartner reports that 50% of implementations fail due to lack of attention to data quality issues"
  • "A powerful approach to data matching is the mathematical modeling of human decision making"
  • "The greatest advantage of mathematical modeling is that there are no data matching rules to build and maintain"

 

Defining a Balanced Scorecard for Data Management

Seminar by C. Lwanga Yonke, a founding member of the International Association for Information and Data Quality (IAIDQ).

Quotes from the seminar:

  • "Entering the same data multiple times is like paying the same invoice multiple times"
  • "Good metrics help start conversations and turn strategy into action"
  • Good metrics have the following characteristics:
    • Business Relevance
    • Clarity of Definition
    • Trending Capability (i.e. metric can be tracked over time)
    • Easy to aggregate and roll-up to a summary
    • Easy to drill-down to the details that comprised the measurement

 

Closing Panel: Data Management’s Next Big Thing!

Quotes from Panelist Peter Aiken from Data Blueprint:

  • Capability Maturity Levels:
    1. Initial
    2. Repeatable
    3. Defined
    4. Managed
    5. Optimized
  • "Most companies are at a capability maturity level of (1) Initial or (2) Repeatable"
  • "Data should be treated as a durable asset"

Quotes from Panelist Noreen Kendle from Burton Group:

  • "A new age for data and data management is on horizon – a perfect storm is coming"
  • "The perfect storm is being caused by massive data growth and software as a service (i.e. cloud computing)"
  • "Always remember that you can make lemonade from lemons – the bad in life can be turned into something good"

Quotes from Panelist Karen Lopez from InfoAdvisors:

  • "If you keep using the same recipe, then you keep getting the same results"
  • "Our biggest problem is not technical in nature - we simply need to share our knowledge"
  • "Don’t be a dinosaur! Adopt a ‘go with what is’ philosophy and embrace the future!"

Quotes from Panelist Eric Miller from Zepheira:

  • "Applications should not be ON The Web, but OF The Web"
  • "New Acronym: LED – Linked Enterprise Data"
  • "Semantic Web is the HTML of DATA"

Quotes from Panelist Daniel Moody from University of Twente:

  • "Unified Modeling Language (UML) was the last big thing in software engineering"
  • "The next big thing will be ArchiMate, which is a unified language for enterprise architecture modeling"

 

Mark Your Calendar

Enterprise Data World 2010 will take place in San Francisco, California at the Hilton San Francisco on March 14-18, 2010.