The Stone Wars of Root Cause Analysis


“As a single stone causes concentric ripples in a pond,” Martin Doyle commented on my blog post There is No Such Thing as a Root Cause, “there will always be one root cause event creating the data quality wave.  There may be interference after the root cause event which may look like a root cause, creating eddies of side effects and confusion, but I believe there will always be one root cause.  Work backwards from the data quality side effects to the root cause and the data quality ripples will be eliminated.”

Martin Doyle and I continued our congenial blog comment banter on my podcast episode The Johari Window of Data Quality, but in this blog post I wanted to focus on the stone-throwing metaphor for root cause analysis.

Let’s begin with the concept of a single stone causing the concentric ripples in a pond.  Is the stone really the root cause?  Who threw the stone?  Why did that particular person choose to throw that specific stone?  How did the stone come to be alongside the pond?  Which path did the stone-thrower take to get to the pond?  What happened to the stone-thrower earlier in the day that made them want to go to the pond, and once there, pick up a stone and throw it in the pond?

My point is that while root cause analysis is important to data quality improvement, too often we can get carried away riding the ripples of what we believe to be the root cause of poor data quality.  Adding to the complexity is the fact that there’s hardly ever just one stone.  Many stones get thrown into our data ponds, and trying to un-ripple their poor quality effects can lead us to false conclusions because causation is non-linear: it is a complex network of many interrelated causes and effects, so some of what appear to be the effects of the root cause you have isolated may, in fact, be the effects of other causes.
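
To make that concrete, here is a minimal sketch in Python (all of the cause and effect names are hypothetical) that models causation as a directed graph rather than a single chain.  Walking backwards from one data quality symptom can surface several root causes, not just one stone:

    # Hypothetical causal graph: each cause maps to its downstream effects.
    effects = {
        "no-validation-on-entry": ["duplicate-customer-records"],
        "legacy-system-migration": ["duplicate-customer-records", "missing-postal-codes"],
        "rushed-quarter-end-load": ["missing-postal-codes"],
        "duplicate-customer-records": ["inflated-customer-counts"],
        "missing-postal-codes": ["failed-mail-campaign"],
    }

    def root_causes(symptom):
        """Walk backwards from a symptom to every cause with nothing upstream of it."""
        parents = {}
        for cause, downstream in effects.items():
            for effect in downstream:
                parents.setdefault(effect, []).append(cause)
        roots, stack, seen = set(), [symptom], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            upstream = parents.get(node, [])
            if not upstream and node != symptom:
                roots.add(node)  # nothing upstream of this node: it is a root cause
            stack.extend(upstream)
        return roots

    # Finds two root causes for one symptom: no-validation-on-entry
    # and legacy-system-migration.  Two stones, one set of ripples.
    print(root_causes("duplicate-customer-records"))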

As Laura Sebastian-Coleman explains, data quality assessments are often “a quest to find a single criminal—The Root Cause—rather than to understand the process that creates the data and the factors that contribute to data issues and discrepancies.”  Those approaching data quality this way, “start hunting for the one thing that will explain all the problems.  Their goal is to slay the root cause and live happily ever after.  Their intentions are good.  And slaying root causes—such as poor process design—can bring about improvement.  But many data problems are symptoms of a lack of knowledge about the data and the processes that create it.  You cannot slay a lack of knowledge.  The only way to solve a knowledge problem is to build knowledge of the data.”

Believing that you have found and eliminated the root cause of all your data quality problems is like believing that after you have removed the stones from your pond (i.e., data cleansing), you can stop the stone-throwers by building a high stone-deflecting wall around your pond (i.e., defect prevention).  However, there will always be stones (i.e., data quality issues) and there will always be stone-throwers (i.e., people and processes) that will find a way to throw a stone in your pond.

In our recent podcast Measuring Data Quality for Ongoing Improvement, Laura Sebastian-Coleman and I discussed how, although root cause is used as a singular noun (just as data is used as a singular noun), we should talk about root causes, since root cause analysis is no more the analysis of a single root cause than data analysis is the analysis of a single datum.

The bottom line, or, if you prefer, the ripple at the bottom of the pond, is that the Stone Wars of Root Cause Analysis will never end because data quality is a journey, not a destination.  After all, that’s why it’s called ongoing data quality improvement.

Measuring Data Quality for Ongoing Improvement

OCDQ Radio is an audio podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Listen to Laura Sebastian-Coleman, author of the book Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework, and me discuss bringing together a better understanding of what is represented in data, and how it is represented, with the expectations for use in order to improve the overall quality of data.  Our discussion also includes avoiding two common mistakes made when starting a data quality project, and defining five dimensions of data quality.

Laura Sebastian-Coleman has worked on data quality in large health care data warehouses since 2003.  She has implemented data quality metrics and reporting, launched and facilitated a data quality community, contributed to data consumer training programs, and has led efforts to establish data standards and to manage metadata.  In 2009, she led a group of analysts in developing the original Data Quality Assessment Framework (DQAF), which is the basis for her book.

Laura Sebastian-Coleman has delivered papers at MIT’s Information Quality Conferences and at conferences sponsored by the International Association for Information and Data Quality (IAIDQ) and the Data Governance Organization (DGO).  She holds the IQCP (Information Quality Certified Professional) designation from IAIDQ, a Certificate in Information Quality from MIT, a B.A. in English and History from Franklin & Marshall College, and a Ph.D. in English Literature from the University of Rochester.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Data Profiling Early and Often — Guest James Standen discusses data profiling concepts and practices, and how bad data is often misunderstood and can be coaxed away from the dark side if you know how to approach it.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

The Symbiotic Relationship of Cloud and Mobile

“Although many people are calling for a cloud revolution in which everyone simultaneously migrates their systems to the cloud,” David Linthicum recently blogged, “that’s not going to happen.  While there will be no mass migration, there will be many one-off cloud migration projects that improve the functionality of systems, as well as cloud-based deployments of new systems.”

“This means that,” Linthicum predicted, “cloud computing’s growth will follow the same patterns of adoption we saw for the PC and the Web.  We won’t notice many of the changes as they occur, but the changes will indeed come.”  Perhaps the biggest driver of the cloud-based changes to come is the way many of us are using the cloud today — as a way to synchronize data across our multiple devices, the vast majority of which nowadays are mobile devices.

John Mason recently blogged about the symbiotic relationship between the cloud and mobile devices, which “not only expands the reach of small and midsize businesses, it levels the playing field too, helping them compete in a quickly changing business environment.  Cloud-based applications help businesses stay mobile, agile, and responsive without sacrificing security or reliability, and even the smallest of companies can provide their customers with fast, around-the-clock access to important data.”

The age of the mobile device is upon us, thanks mainly to the cloud-based applications floating above us.  This mobile-app-portal-to-the-cloud computing model is well supported by the widespread availability of high-speed network connectivity since, no matter where we are, it seems like a Wi-Fi or mobile broadband network is always available.

As more and more small and midsize businesses continue to leverage the symbiotic relationship between the cloud and mobile to build relationships with customers and rethink how work works, they are enabling the future of the collaborative economy.


Caffeinated Thoughts on Technology for Midsize Businesses

If you are having trouble viewing this video, then you can watch it on Vimeo via this link: vimeo.com/71338997

The following links are to the resources featured in or related to the content of this video:

  • Get Bold with Your Social Media: http://goo.gl/PCQ11 (Sandy Carter Book Review by Debbie Laskey)

DQ-BE: The Time Traveling Gift Card

Data Quality By Example (DQ-BE) is an OCDQ regular segment that provides examples of key data quality concepts.

As an avid reader, I tend to redeem most of my American Express Membership Rewards points for Barnes & Noble gift cards to buy new books for my Nook.  As a data quality expert, I tend to notice when something is amiss with data.  As shown above, for example, my recent gift card was apparently issued on — and only available for use until — January 1, 1900.

At first, I thought I might have encountered the time traveling gift card.  However, I doubted the gift card would be accepted as legal tender in 1900.  Then I thought my gift card was actually worth $1,410 (what $50 in 1900 would be worth today), which would allow me to buy a lot more books — as long as Barnes & Noble would overlook the fact that the gift card expired 113 years ago.
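
My best guess at the real culprit (and it is only a guess) is an unpopulated date field.  Many systems follow the spreadsheet convention in which date serial number 1 corresponds to January 1, 1900, so a missing or zeroed-out date often surfaces as a date in 1900.  A minimal sketch of that assumption in Python:

    from datetime import date, timedelta

    # Spreadsheet-style convention (assumed here): serial 1 renders as 1900-01-01,
    # so a date field left at or near zero time-travels back to the year 1900.
    DAY_ONE = date(1900, 1, 1)

    def serial_to_date(serial):
        return DAY_ONE + timedelta(days=serial - 1)

    print(serial_to_date(1))  # 1900-01-01, the "issue date" printed on my gift card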

Fortunately, I was able to use the gift card to purchase $50 worth of books in 2013.

So, I guess the moral of this story is that sometimes poor data quality does pay.  However, it probably never pays to display your poor data quality to someone who runs an obsessive-compulsive data quality blog with a series about data quality by example.


What examples (good or poor) of data quality have you encountered in your time travels?


Related Posts

DQ-BE: Invitation to Duplication

DQ-BE: Dear Valued Customer

DQ-BE: Single Version of the Time

DQ-BE: Data Quality Airlines

Retroactive Data Quality

Sometimes Worse Data Quality is Better

Data Quality, 50023

DQ-IRL (Data Quality in Real Life)

The Seven Year Glitch

When Poor Data Quality Calls

Data has an Expiration Date

Sometimes it’s Okay to be Shallow

A Big Data Platform for Midsize Businesses

If you’re having trouble viewing this video, watch it on Vimeo via this link: A Big Data Platform for Midsize Businesses

The following links are to the infographics featured in this video, as well as links to other related resources:

  • Webcast Replay: Why Big Data Matters to the Midmarket: http://goo.gl/A1WYZ (No Registration Required)
  • IBM’s 2012 Big Data Study with Feedback from People who saw Results: http://goo.gl/MmRAv (Registration Required)
  • Participate in IBM’s 2013 Business Value Survey on Analytics and Big Data: http://goo.gl/zKSPM (Registration Required)

Too Big to Ignore

OCDQ Radio is an audio podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Phil Simon shares his sage advice for getting started with big data, including the importance of having a data-oriented mindset, letting ambitious long-term goals give way to more reasonable and attainable short-term objectives, and always remembering that big data is just another means of solving business problems.

Phil Simon is a sought-after speaker and the author of five management books, most recently Too Big to Ignore: The Business Case for Big Data.  A recognized technology expert, he advises companies on how to optimize their use of technology.  His contributions have been featured on NBC, CNBC, ABC News, Inc. magazine, BusinessWeek, Huffington Post, Globe and Mail, Fast Company, Forbes, the New York Times, ReadWriteWeb, and many other sites.


Cloud Benefits for Midsize Businesses

If you’re having trouble viewing this video, watch it on Vimeo via this link: Cloud Benefits for Midsize Businesses on Vimeo

The following links are to the infographics and eBook featured in this video, as well as other related resources:


DQ-Tip: “An information centric organization...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“An information centric organization is an organization driven from high-quality, complete, and timely information that is relevant to its goals.”

This DQ-Tip is from the new book Patterns of Information Management by Mandy Chessell and Harald Smith.

“An organization exists for a purpose,” Chessell and Smith explained.  “It has targets to achieve and long-term aspirations.  An organization needs to make good use of its information to achieve its goals.”  In order to do this, they recommend that you define an information strategy that lays out why, what, and how your organization will manage its information (a rough sketch in code follows the list):

  • Why — The business imperatives that drive the need to be information centric, which helps focus information management efforts on the activities that deliver value to the organization.
  • What — The type of information that you must manage to deliver on those business imperatives, which includes the subject areas to cover, the attributes within each subject area that need to be managed, the valid values for those attributes, and the information management policies (such as retention and protection) that the organization wants to implement.
  • How — The information management principles that provide the general rules for how information is to be managed by the information systems and the people using them, along with how information flows between them.
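
To picture the why, what, and how together, here is a rough sketch in Python (the business imperatives, subject areas, and policies are entirely hypothetical) of an information strategy captured as a simple data structure:

    # Hypothetical sketch of an information strategy along why/what/how lines.
    information_strategy = {
        "why": ["reduce customer churn", "meet retention regulations"],
        "what": {
            "customer": {  # one subject area
                "attributes": ["name", "email", "postal_code"],
                "valid_values": {"postal_code": "5-digit US ZIP code"},
                "policies": {"retention": "7 years", "protection": "treat as PII"},
            },
        },
        "how": [
            "validate attributes at the point of entry",
            "document how information flows between systems",
        ],
    }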

Developing an information strategy, according to Chessell and Smith, “creates a set of objectives for the organization, which guides the investment in information management technology and related solutions that support the business.  Starting with the business imperatives ensures the information management strategy is aligned with the needs of the organization, making it easier to demonstrate its relevance and value.”

Chessell and Smith also noted that “technology alone is not sufficient to ensure the quality, consistency, and flexibility of an organization’s information.  Classify the people connected to the organization according to their information needs and skills, provide common channels of communication and knowledge sharing about information, and user interfaces and reports through which they can access the information as appropriate.”

Chessell and Smith explained that the attitudes and skills of the organization’s people will be what enables the right behaviors in everyday operations, which is a major determinant of the success of an information management program.


Related Posts

DQ-Tip: “The quality of information is directly related to...”

DQ-Tip: “Undisputable fact about the value and use of data...”

DQ-Tip: “Data quality tools do not solve data quality problems...”

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “There is no point in monitoring data quality...”

DQ-Tip: “Don't pass bad data on to the next person...”

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is about more than just improving your data...”

DQ-Tip: “Start where you are...”

Sometimes Worse Data Quality is Better

Continuing a theme from three previous posts, which discussed when it’s okay to call data quality as good as it needs to get, the occasional times when perfect data quality is necessary, and the costs and profits of poor data quality, in this blog post I want to provide three examples of when the world of consumer electronics proved that sometimes worse data quality is better.


When the Betamax Bet on Video Busted

While it seems like a long time ago in a galaxy far, far away, during the 1970s and 1980s a videotape format war raged between Betamax and VHS.  Betamax was widely believed to provide superior video data quality.

But a blank Betamax tape allowed users to record up to two hours of high-quality video, whereas a VHS tape allowed users to record up to four hours of slightly lower quality video.  Consumers consistently chose quantity over quality, especially since lower quality also meant a lower price.  Betamax tapes and machines remained more expensive, a bet that consumers would be willing to pay a premium for higher-quality video.

The VHS victory demonstrated how people often choose quantity over quality, so it doesn’t always pay to have better data quality.


When Lossless Lost to Lossy Audio

Much to the dismay of those working in the data quality profession, most people do not care about the quality of their data unless it becomes bad enough for them to pay attention to — and complain about.

An excellent example is bitrate, which refers to the number of bits — or the amount of data — that are processed over a certain amount of time.  In his article Does Bitrate Really Make a Difference In My Music?, Whitson Gordon examined the common debate about lossless versus lossy audio formats.

Using the example of ripping a track from a CD to a hard drive, a lossless format means the track is not compressed to the point where any of its data is lost, retaining, for all intents and purposes, the same audio data quality as the original CD track.

By contrast, a lossy format compresses the track so that it takes up less space by intentionally deleting some of its data, reducing audio data quality.  Audiophiles often claim anything other than vinyl records sound lousy because they are so lossy.
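
The arithmetic behind the trade-off is straightforward, since file size is simply bitrate multiplied by duration.  Here is a rough sketch in Python (the track length and bitrates are only illustrative figures):

    # Rough sketch: file size = bitrate x duration.  Figures are illustrative.
    def track_size_mb(bitrate_kbps, minutes):
        bits = bitrate_kbps * 1000 * minutes * 60
        return bits / 8 / 1_000_000  # convert bits to megabytes

    CD_KBPS = 44_100 * 16 * 2 / 1000  # uncompressed CD audio: ~1,411 kbps
    for label, kbps in [("Uncompressed CD", CD_KBPS), ("MP3 320 kbps", 320), ("MP3 128 kbps", 128)]:
        print(f"{label}: {track_size_mb(kbps, 4):.1f} MB for a 4-minute track")
    # Uncompressed CD: 42.3 MB; MP3 320 kbps: 9.6 MB; MP3 128 kbps: 3.8 MB

Most listeners hear little difference between those files, but they immediately notice the roughly tenfold difference in storage, which is the quantity-over-quality choice all over again.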

However, like truth, beauty, and art, data quality can be said to be in the eyes — or the ears — of the beholder.  So, if your favorite music sounds fine to you in MP3 format, then not only do you no longer need vinyl records, audio tapes, or CDs, but you also will not pay more attention to (or pay more money for) audio data quality.


When Digital Killed the Photograph Star

The Eastman Kodak Company, commonly known as Kodak, which was founded by George Eastman in 1888 and dominated the photography industry for most of the 20th century, filed for bankruptcy in January 2012.  The primary reason was that Kodak, which had previously pioneered innovations like celluloid film and color photography, failed to embrace the industry’s transition to digital photography, despite the fact that Kodak invented some of the core technology used in current digital cameras.

Why?  Because Kodak believed that the data quality of digital photographs would be generally unacceptable to consumers as a replacement for film photographs.  In much the same way that Betamax assumed consumers wanted higher-quality video, Kodak assumed consumers would always want to use higher-quality photographs to capture their “Kodak moments.”

In fairness to Kodak, mobile devices are causing a massive — and rapid — disruption to many well-established business models, creating a brave new digital world, and obviously not just for photography.  However, when digital killed the photograph star, it proved, once again, that sometimes worse data quality is better.


Related Posts

Data Quality and the OK Plateau

When Poor Data Quality Kills

The Costs and Profits of Poor Data Quality

Promoting Poor Data Quality

Data Quality and the Cupertino Effect

The Data Quality Wager

How Data Cleansing Saves Lives

The Dichotomy Paradox, Data Quality and Zero Defects

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

The Seventh Law of Data Quality

A Tale of Two Q’s

Paleolithic Rhythm and Data Quality

Groundhog Data Quality Day

Data Quality and The Middle Way

Stop Poor Data Quality STOP

When Poor Data Quality Calls

Freudian Data Quality

Predictably Poor Data Quality

Satisficing Data Quality

i blog of Data glad and big

I recently blogged about the need to balance the hype of big data with some anti-hype.  My hope was that, like a collision of matter and anti-matter, the hype and anti-hype would cancel each other out, transitioning our energy into a more productive discussion about big data.  But, of course, few things in human discourse ever reach such an equilibrium, or can maintain it for very long.

For example, Quentin Hardy recently blogged about six big data myths based on a conference presentation by Kate Crawford, who herself also recently blogged about the hidden biases in big data.  “I call B.S. on all of it,” Derrick Harris blogged in his response to the backlash against big data.  “It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair.  That’s because how companies and people benefit from big data, data science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in.  Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple — because no one should think it’s magic to begin with.”

In their new book Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schonberger and Kenneth Cukier explained that “like so many new technologies, big data will surely become a victim of Silicon Valley’s notorious hype cycle: after being feted on the cover of magazines and at industry conferences, the trend will be dismissed and many of the data-smitten startups will flounder.  But both the infatuation and the damnation profoundly misunderstand the importance of what is taking place.  Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.  The real revolution is not in the machines that calculate data, but in data itself and how we use it.”

Although there have been numerous critical technology factors making the era of big data possible, such as increases in the amount of computing power, decreases in the cost of data storage, increased network bandwidth, parallel processing frameworks (e.g., Hadoop), scalable and distributed models (e.g., cloud computing), and other techniques (e.g., in-memory computing), Mayer-Schonberger and Cukier argued that “something more important changed too, something subtle.  There was a shift in mindset about how data could be used.  Data was no longer regarded as static and stale, whose usefulness was finished once the purpose for which it was collected was achieved.  Rather, data became a raw material of business, a vital economic input, used to create a new form of economic value.”

“In fact, with the right mindset, data can be cleverly used to become a fountain of innovation and new services.  The data can reveal secrets to those with the humility, the willingness, and the tools to listen.”

Pondering this big data war of words reminded me of the E. E. Cummings poem i sing of Olaf glad and big, which sings of Olaf, a conscientious objector forced into military service, who passively endures brutal torture inflicted upon him by training officers, while calmly responding (pardon the profanity): “I will not kiss your fucking flag” and “there is some shit I will not eat.”

Without question, big data has both positive and negative aspects, but the seeming unwillingness of either side in the big data war of words to “kiss each other’s flag,” so to speak, concerns me less than the conscientious objection to big data and data science expanding into realms where people and businesses are not used to enduring their influence.  For example, some will feel that data-driven audits of their decision-making are like brutal torture inflicted upon their less-than data-driven intuition.

E. E. Cummings sang the praises of Olaf “because unless statistics lie, he was more brave than me.”  i blog of Data glad and big, but I fear that, regardless of how big it is, “there is some data I will not believe” will be a common refrain by people who will lack the humility and willingness to listen to data, and who will not be brave enough to admit that statistics don’t always lie.


Related Posts

The Need for Data Philosophers

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

The Laugh-In Effect of Big Data

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Will Big Data be Blinded by Data Science?

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

Our Increasingly Data-Constructed World

The Wisdom of Crowds, Friends, and Experts

Data Separates Science from Superstition

Headaches, Data Analysis, and Negativity Bias

Why Data Science Storytelling Needs a Good Editor

Predictive Analytics, the Data Effect, and Jed Clampett

Rage against the Machines Learning

The Flying Monkeys of Big Data

Cargo Cult Data Science

Speed Up Your Data to Slow Down Your Decisions

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Business Analytics for Midsize Businesses

As this growing list of definitions for big data attests, big data evangelist and IBM thought leader James Kobielus rightfully warns that big data is in danger of definitional overkill.  But most midsize business owners are less concerned about defining big data than they are about, as Laurie McCabe recently blogged, determining whether big data is relevant for their business.

“The fact of the matter is, big is a relative term,” McCabe explained, “relative to the amount of information that your organization needs to sift through to find the insights you need to operate the business more proactively and profitably.”

McCabe also noted that this is not just a problem for big businesses, since getting better insights from the data you already have is a challenge for businesses of all sizes.  Midsize businesses “may not be dealing with terabytes of data,” McCabe explained, “but many are finding that tools that used to suffice—such as Excel spreadsheets—fall short even when it comes to analyzing internal transactional databases.”  McCabe also provided recommendations for how midsize businesses can put big data to work.

The recent IBM study The Case for Business Analytics in Midsize Firms lists big data as one of the trends making a compelling case for the growing importance of business analytics for midsize businesses.  The study also noted that important functional data continues to live in departmental spreadsheets, and state-of-the-art business analytics solutions are needed to make it easy to pull all that data, along with data from other sources, together in a meaningful way.  Despite the common misconception that such solutions are too expensive for midsize businesses, solutions are now available that can deliver analytics capabilities to help overcome big data challenges without requiring a big upfront investment in hardware or software.

Phil Simon, author of Too Big to Ignore: The Business Case for Big Data, recently blogged about reporting versus analytics, explaining that the essence of analytics is that it goes beyond the what and the where provided by reporting and tries to explain the why.

Big data isn’t the only reason why analytics is becoming more of a necessity.  But with the barriers of cost and deployment becoming easier to overcome, business analytics is becoming more commonplace in midsize businesses.
