Data Quality and Chicken Little Syndrome

“The sky is falling!” exclaimed Chicken Little after an acorn fell on his head, causing him to undertake a journey to tell the King that the world is coming to an end.  So says the folk tale that became an allegory for people accused of being unreasonably afraid, or people trying to incite an unreasonable fear in those around them, sometimes referred to as Chicken Little Syndrome.

The sales pitches for data quality solutions often suffer from Chicken Little Syndrome: instead of selling the business benefits of data quality, vendors and consultants focus too much on the negative consequences of not investing in it, trying to scare people into prioritizing data quality initiatives by exclaiming “your company is failing because your data quality is bad!”

The Chicken Littles of Data Quality use sound bites like “data quality problems cost businesses more than $600 billion a year!” or “poor data quality costs organizations 35% of their revenue!”  However, the most common characteristic of these fear-mongering estimates about the costs of poor data quality is that, upon closer examination, most of them either rely on anecdotal evidence or hide behind the curtain of an allegedly proprietary case study, the details of which conveniently can’t be publicly disclosed.

Lacking a tangible estimate for the cost of poor data quality often complicates building the business case for data quality.  Even though a data quality initiative has the long-term potential of reducing the costs, and mitigating the risks, associated with poor data quality, its initial costs are very tangible.  For example, the short-term increased costs of a data quality initiative can include the purchase of data quality software, and the professional services needed for training and consulting to support installation, configuration, application development, testing, and production implementation.  When considering these short-term costs, and especially when lacking a tangible estimate for the cost of poor data quality, many organizations understandably conclude that it’s less risky to gamble on not investing in a data quality initiative and hope things are just not as bad as Chicken Little claims.

“The sky isn’t falling on us.”

Furthermore, citing specific examples of poor data quality (e.g., IQTrainwrecks.com) also doesn’t work very well, and not just because of the lack of a verifiable estimate for the associated business costs.  Another significant contributing factor is that people naturally dismiss the possibility that something bad that happened to someone else could also happen to them.

So, when Chicken Little undertakes a journey to tell the CEO that the organization is coming to an end due to poor data quality, exclaiming that “the sky is falling!” while citing one of those data quality disaster stories that befell another organization, should we really be surprised when the CEO looks up, scratches their head, and declares, “The sky isn’t falling on us”?

Sometimes, denying the existence of data quality issues is a natural self-defense mechanism for the people responsible for the business processes and technology surrounding data since nobody wants to be blamed for causing, or failing to fix, data quality issues.  Other times, people suffer from the illusion-of-quality effect caused by the dark side of data cleansing.  In other words, they don’t believe that data quality issues occur very often because the data made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

Can we stop Playing Chicken with Data Quality?

Most of the time, advocating for data quality feels like playing chicken with executive sponsors and business stakeholders, as if we were driving toward them at full speed on a collision course, armed with fear-mongering and disaster stories, hoping that they swerve in the direction of approving a data quality initiative.  But there has to be a better way to advocate for data quality than constantly exclaiming that “the sky is falling!”  (Don’t cry fowl: I realize that I just mixed my chicken metaphors.)

The Stone Wars of Root Cause Analysis

“As a single stone causes concentric ripples in a pond,” Martin Doyle commented on my blog post There is No Such Thing as a Root Cause, “there will always be one root cause event creating the data quality wave.  There may be interference after the root cause event which may look like a root cause, creating eddies of side effects and confusion, but I believe there will always be one root cause.  Work backwards from the data quality side effects to the root cause and the data quality ripples will be eliminated.”

Martin Doyle and I continued our congenial blog comment banter on my podcast episode The Johari Window of Data Quality, but in this blog post I wanted to focus on the stone-throwing metaphor for root cause analysis.

Let’s begin with the concept of a single stone causing the concentric ripples in a pond.  Is the stone really the root cause?  Who threw the stone?  Why did that particular person choose to throw that specific stone?  How did the stone come to be alongside the pond?  Which path did the stone-thrower take to get to the pond?  What happened to the stone-thrower earlier in the day that made them want to go to the pond, and once there, pick up a stone and throw it in the pond?

My point is that while root cause analysis is important to data quality improvement, too often we get carried away riding the ripples of what we believe to be the root cause of poor data quality.  Adding to the complexity is the fact that there’s hardly ever just one stone.  Many stones get thrown into our data ponds, and trying to un-ripple their poor quality effects can lead us to false conclusions because causation is non-linear in nature.  Causation is a complex network of many interrelated causes and effects, so some of what appear to be the effects of the root cause you have isolated may, in fact, be the effects of other causes.
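One way to make the “many stones” point concrete is to model causation as a network rather than a chain.  The following is a minimal sketch, assuming a hypothetical causal map in which each effect lists the causes that feed it; every node name is invented for illustration.  Walking upstream from a single symptom can surface more than one root cause, and a single cause can feed several symptoms.

```python
from collections import deque

# Hypothetical causal network (illustrative only): each effect maps to the
# causes that feed into it. A cause with no entry has no recorded cause.
CAUSES = {
    "duplicate customer records": ["manual data entry", "no matching at account creation"],
    "manual data entry": ["legacy form without validation", "understaffed call center"],
    "no matching at account creation": ["legacy form without validation"],
    "invalid postal codes": ["legacy form without validation", "third-party address feed"],
}


def root_causes(symptom: str, causes: dict[str, list[str]] = CAUSES) -> set[str]:
    """Walk upstream (breadth-first) from a symptom and return every cause
    reached that has no recorded cause of its own."""
    roots: set[str] = set()
    seen = {symptom}
    queue = deque([symptom])
    while queue:
        node = queue.popleft()
        parents = causes.get(node, [])
        if not parents and node != symptom:
            roots.add(node)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


if __name__ == "__main__":
    for symptom in ("duplicate customer records", "invalid postal codes"):
        print(symptom, "->", sorted(root_causes(symptom)))
```

Note that a “root” in this sketch only means a node with no recorded cause, so where the walk stops says as much about the limits of our knowledge of the data and its processes as it does about the data itself.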

As Laura Sebastian-Coleman explains, data quality assessments are often “a quest to find a single criminal—The Root Cause—rather than to understand the process that creates the data and the factors that contribute to data issues and discrepancies.”  Those approaching data quality this way, “start hunting for the one thing that will explain all the problems.  Their goal is to slay the root cause and live happily ever after.  Their intentions are good.  And slaying root causes—such as poor process design—can bring about improvement.  But many data problems are symptoms of a lack of knowledge about the data and the processes that create it.  You cannot slay a lack of knowledge.  The only way to solve a knowledge problem is to build knowledge of the data.”

Believing that you have found and eliminated the root cause of all your data quality problems is like believing that after you have removed the stones from your pond (i.e., data cleansing), you can stop the stone-throwers by building a high stone-deflecting wall around your pond (i.e., defect prevention).  However, there will always be stones (i.e., data quality issues) and there will always be stone-throwers (i.e., people and processes) that will find a way to throw a stone in your pond.

In our recent podcast Measuring Data Quality for Ongoing Improvement, Laura Sebastian-Coleman and I discussed how, although root cause is used as a singular noun (just as data is used as a singular noun), we should talk about root causes since, just as data analysis is not analysis of a single datum, root cause analysis should not be viewed as analysis of a single root cause.

The bottom line, or, if you prefer, the ripple at the bottom of the pond, is that the Stone Wars of Root Cause Analysis will never end because data quality is a journey, not a destination.  After all, that’s why it’s called ongoing data quality improvement.

 

Related Posts

There is No Such Thing as a Root Cause

DQ-View: Occam’s Razor Burn

Data Quality: Quo Vadimus?

Finding Data Quality

Why isn’t our data quality worse?

Data and Process Transparency

Days Without A Data Quality Issue

The Dichotomy Paradox, Data Quality and Zero Defects

Data and its Relationships with Quality

The Role of Data Quality Monitoring in Data Governance

Data Quality and Miracle Exceptions

Data Myopia and Business Relativity

Expectation and Data Quality

Is your data accurate, but useless to your business?

DQ-Tip: “Data quality is primarily about context...”

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality and the Cupertino Effect

You Need to Know Your Data Quality Reference Points

Why You Need Data Quality Standards

Adventures in Data Profiling

Sometimes Worse Data Quality is Better

Continuing a theme from three previous posts, which discussed when it’s okay to call data quality as good as it needs to get, the occasional times when perfect data quality is necessary, and the costs and profits of poor data quality, in this blog post I want to provide three examples of when the world of consumer electronics proved that sometimes worse data quality is better.

 

When the Betamax Bet on Video Busted

While it seems like a long time ago in a galaxy far, far away, during the 1970s and 1980s a videotape format war was waged between Betamax and VHS.  Betamax was widely believed to provide superior video data quality.

But a blank Betamax tape allowed users to record up to two hours of high-quality video, whereas a VHS tape allowed users to record up to four hours of slightly lower quality video.  Consumers consistently chose quantity over quality, especially since lower quality also meant a lower price.  Betamax tapes and machines remained more expensive, a bet that consumers would be willing to pay a premium for higher-quality video.

The VHS victory demonstrated how people often choose quantity over quality, so it doesn’t always pay to have better data quality.

 

When Lossless Lost to Lossy Audio

Much to the dismay of those working in the data quality profession, most people do not care about the quality of their data unless it becomes bad enough for them to pay attention to — and complain about.

An excellent example is bitrate, which refers to the number of bits (i.e., the amount of data) processed per unit of time, typically expressed in kilobits per second.  In his article Does Bitrate Really Make a Difference In My Music?, Whitson Gordon examined the common debate about lossless versus lossy audio formats.

Using the example of ripping a track from a CD to a hard drive, a lossless format means the track is not compressed to the point where any of its data is lost, retaining, for all intents and purposes, the same audio data quality as the original CD track.

By contrast, a lossy format compresses the track so that it takes up less space by intentionally discarding some of its data, reducing audio data quality.  Audiophiles often claim that anything other than vinyl records sounds lousy because it is so lossy.
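The size difference follows directly from bitrate arithmetic.  The sketch below assumes standard CD audio (44,100 samples per second, 16 bits per sample, two channels), an example lossy bitrate of 128 kbps, and a hypothetical four-minute track; real file sizes depend on the format, encoder, and settings.

```python
def bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def size_megabytes(bits_per_second: int, seconds: int) -> float:
    """File size in decimal megabytes (1 MB = 1,000,000 bytes)."""
    return bits_per_second * seconds / 8 / 1_000_000


CD_BITRATE = bitrate_bps(44_100, 16, 2)  # 1,411,200 bits per second
LOSSY_BITRATE = 128_000                  # assumed example MP3 bitrate
TRACK_SECONDS = 4 * 60                   # assumed four-minute track

if __name__ == "__main__":
    print(f"CD audio: {CD_BITRATE:,} bps, {size_megabytes(CD_BITRATE, TRACK_SECONDS):.2f} MB")
    print(f"128 kbps: {size_megabytes(LOSSY_BITRATE, TRACK_SECONDS):.2f} MB")
    print(f"CD/lossy bitrate ratio: {CD_BITRATE / LOSSY_BITRATE:.1f}")
```

Under these assumptions, the uncompressed CD track works out to about 42.3 MB versus about 3.8 MB for the lossy file, roughly an elevenfold reduction in data.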

However, like truth, beauty, and art, data quality can be said to be in the eyes (or the ears) of the beholder.  So, if your favorite music sounds fine to you in MP3 format, then not only do you no longer need vinyl records, audio tapes, and CDs, but you also will not pay more attention to (or more money for) audio data quality.

 

When Digital Killed the Photograph Star

The Eastman Kodak Company, commonly known as Kodak, which was founded by George Eastman in 1888 and dominated the photography industry for most of the 20th century, filed for bankruptcy in January 2012.  The primary reason was that Kodak, which had previously pioneered innovations like celluloid film and color photography, failed to embrace the industry’s transition to digital photography, despite the fact that Kodak invented some of the core technology used in current digital cameras.

Why?  Because Kodak believed that the data quality of digital photographs would be generally unacceptable to consumers as a replacement for film photographs.  In much the same way that Betamax assumed consumers wanted higher-quality video, Kodak assumed consumers would always want to use higher-quality photographs to capture their “Kodak moments.”

In fairness to Kodak, mobile devices are causing a massive — and rapid — disruption to many well-established business models, creating a brave new digital world, and obviously not just for photography.  However, when digital killed the photograph star, it proved, once again, that sometimes worse data quality is better.

  

Related Posts

Data Quality and the OK Plateau

When Poor Data Quality Kills

The Costs and Profits of Poor Data Quality

Promoting Poor Data Quality

Data Quality and the Cupertino Effect

The Data Quality Wager

How Data Cleansing Saves Lives

The Dichotomy Paradox, Data Quality and Zero Defects

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

The Seventh Law of Data Quality

A Tale of Two Q’s

Paleolithic Rhythm and Data Quality

Groundhog Data Quality Day

Data Quality and The Middle Way

Stop Poor Data Quality STOP

When Poor Data Quality Calls

Freudian Data Quality

Predictably Poor Data Quality

Satisficing Data Quality

i blog of Data glad and big

I recently blogged about the need to balance the hype of big data with some anti-hype.  My hope was, like a collision of matter and anti-matter, the hype and anti-hype would cancel each other out, transitioning our energy into a more productive discussion about big data.  But, of course, few things in human discourse ever reach such an equilibrium, or can maintain it for very long.

For example, Quentin Hardy recently blogged about six big data myths based on a conference presentation by Kate Crawford, who herself also recently blogged about the hidden biases in big data.  “I call B.S. on all of it,” Derrick Harris blogged in his response to the backlash against big data.  “It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair.  That’s because how companies and people benefit from big data, data science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in.  Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple — because no one should think it’s magic to begin with.”

In their new book Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier explained that “like so many new technologies, big data will surely become a victim of Silicon Valley’s notorious hype cycle: after being feted on the cover of magazines and at industry conferences, the trend will be dismissed and many of the data-smitten startups will flounder.  But both the infatuation and the damnation profoundly misunderstand the importance of what is taking place.  Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.  The real revolution is not in the machines that calculate data, but in data itself and how we use it.”

Although there have been numerous critical technology factors making the era of big data possible, such as increases in the amount of computing power, decreases in the cost of data storage, increased network bandwidth, parallel processing frameworks (e.g., Hadoop), scalable and distributed models (e.g., cloud computing), and other techniques (e.g., in-memory computing), Mayer-Schönberger and Cukier argued that “something more important changed too, something subtle.  There was a shift in mindset about how data could be used.  Data was no longer regarded as static and stale, whose usefulness was finished once the purpose for which it was collected was achieved.  Rather, data became a raw material of business, a vital economic input, used to create a new form of economic value.”

“In fact, with the right mindset, data can be cleverly used to become a fountain of innovation and new services.  The data can reveal secrets to those with the humility, the willingness, and the tools to listen.”

Pondering this big data war of words reminded me of the E. E. Cummings poem i sing of Olaf glad and big, which sings of Olaf, a conscientious objector forced into military service, who passively endures brutal torture inflicted upon him by training officers, while calmly responding (pardon the profanity): “I will not kiss your fucking flag” and “there is some shit I will not eat.”

Without question, big data has both positive and negative aspects, but the seeming unwillingness of either side in the big data war of words to “kiss each other’s flag,” so to speak, is not as concerning to me as is the conscientious objection to big data and data science expanding into realms where people and businesses were not used to enduring its influence.  For example, some will feel that data-driven audits of their decision-making are like brutal torture inflicted upon their less-than-data-driven intuition.

E. E. Cummings sang the praises of Olaf “because unless statistics lie, he was more brave than me.”  i blog of Data glad and big, but I fear that, regardless of how big it is, “there is some data I will not believe” will be a common refrain among people who lack the humility and willingness to listen to data, and who will not be brave enough to admit that statistics don’t always lie.

 

Related Posts

The Need for Data Philosophers

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

The Laugh-In Effect of Big Data

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Will Big Data be Blinded by Data Science?

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

Our Increasingly Data-Constructed World

The Wisdom of Crowds, Friends, and Experts

Data Separates Science from Superstition

Headaches, Data Analysis, and Negativity Bias

Why Data Science Storytelling Needs a Good Editor

Predictive Analytics, the Data Effect, and Jed Clampett

Rage against the Machines Learning

The Flying Monkeys of Big Data

Cargo Cult Data Science

Speed Up Your Data to Slow Down Your Decisions

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

The Need for Data Philosophers

In my post On Philosophy, Science, and Data, I explained that although some argue philosophy only reigns in the absence of data while science reigns in the analysis of data, a conceptual bridge still remains between analysis and insight, the crossing of which is itself a philosophical exercise.  Therefore, I argued that an endless oscillation persists between science and philosophy, which is why, despite the fact that all we hear about is the need for data scientists, there’s also a need for data philosophers.

Of course, the debate between science and philosophy is a very old one, as is the argument that we need both.  In my previous post, I slightly paraphrased Immanuel Kant (“perception without conception is blind and conception without perception is empty”) by saying that science without philosophy is blind and philosophy without science is empty.

In his book Cosmic Apprentice: Dispatches from the Edges of Science, Dorion Sagan explained that science and philosophy hang “in a kind of odd balance, watching each other, holding hands.  Science’s eye for detail, buttressed by philosophy’s broad view, makes for a kind of alembic, an antidote to both.  Although philosophy isn’t fiction, it can be more personal, creative and open, a kind of counterbalance for science even as it argues that science, with its emphasis on a kind of impersonal materialism, provides a crucial reality check for philosophy and a tendency to over-theorize that’s inimical to the scientific spirit.  Ideally, in the search for truth, science and philosophy, the impersonal and autobiographical, can keep each other honest in a kind of open circuit.”

“Science’s spirit is philosophical,” Sagan concluded.  “It is the spirit of questioning, of curiosity, of critical inquiry combined with fact-checking.  It is the spirit of being able to admit you’re wrong, of appealing to data, not authority.”

“Science,” as his father Carl Sagan said, “is a way of thinking much more than it is a body of knowledge.”  By extension, we could say that data science is about a way of thinking much more than it is about big data or about being data-driven.

I have previously blogged that science has always been about bigger questions, not bigger data.  As Claude Lévi-Strauss said, “the scientist is not a person who gives the right answers, but one who asks the right questions.”  As far as data science goes, what are the right questions?  Data scientist Melinda Thielbar proposes three key questions (Actionable? Verifiable? Repeatable?).

Here again we see the interdependence of science and philosophy.  “Philosophy,” Marilyn McCord Adams said, “is thinking really hard about the most important questions and trying to bring analytic clarity both to the questions and the answers.”

“Philosophy is critical thinking,” Don Cupitt said. “Trying to become aware of how one’s own thinking works, of all the things one takes for granted, of the way in which one’s own thinking shapes the things one’s thinking about.”  Yes, even a data scientist’s own thinking could shape the things they are thinking scientifically about.  Big data evangelist James Kobielus recently blogged about five biases that may crop up in a data scientist’s work (Cognitive, Selection, Sampling, Modeling, Funding).

“Data science has a bright future ahead,” explained Hilary Mason in a recent interview.  “There will only be more data, and more of a need for people who can find meaning and value in that data.  We’re also starting to see a greater need for data engineers, people to build infrastructure around data and algorithms, and data artists, people who can visualize the data.”

I agree with Mason, and I would add that we are also starting to see a greater need for data philosophers, people who can, borrowing the words that Anthony Kenny used to define philosophy, “think as clearly as possible about the most fundamental concepts that reach through all the disciplines.”

 

Related Posts

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

The Laugh-In Effect of Big Data

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

The Wisdom of Crowds, Friends, and Experts

Why Data Science Storytelling Needs a Good Editor

Predictive Analytics, the Data Effect, and Jed Clampett

Bigger Questions, not Bigger Data

The Flying Monkeys of Big Data

Cargo Cult Data Science

Speed Up Your Data to Slow Down Your Decisions

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Keep Looking Up Insights in Data

In a previous post, I used the history of the Hubble Space Telescope to explain how data cleansing saves lives, based on a true story I read in the book Space Chronicles: Facing the Ultimate Frontier by Neil deGrasse Tyson.  In this post, Hubble and Tyson once again provide the inspiration for an insightful metaphor about data quality.

Hubble is one of dozens of space telescopes of assorted sizes and shapes orbiting the Earth.  “Each one,” Tyson explained, “provides a view of the cosmos that is unobstructed, unblemished, and undiminished by Earth’s turbulent and murky atmosphere.  They are designed to detect bands of light invisible to the human eye, some of which never penetrate Earth’s atmosphere.  Hubble is the first and only space telescope to observe the universe using primarily visible light.  Its stunningly crisp, colorful, and detailed images of the cosmos make Hubble a kind of supreme version of the human eye in space.”

This is how we’d like the quality of data to be when we’re looking for business insights.  High-quality data provides stunningly crisp, colorful, and detailed images of the business cosmos, acting as a kind of supreme version of the human eye in data.

However, despite their less-than-perfect vision, Earth-based telescopes still facilitated significant scientific breakthroughs long before Hubble was launched in 1990.

In 1609, when the Italian physicist and astronomer Galileo Galilei turned a telescope of his own design to the sky, as Tyson explained, he “heralded a new era of technology-aided discovery, whereby the capacities of the human senses could be extended, revealing the natural world in unprecedented, even heretical ways.  The fact that Galileo revealed the Sun to have spots, the planet Jupiter to have satellites [its four moons: Callisto, Ganymede, Europa, Io], and Earth not to be the center of all celestial motion was enough to unsettle centuries of Aristotelian teachings by the Catholic Church and to put Galileo under house arrest.”

And in 1964, another Earth-based telescope, this one operated by the American astronomers Arno Penzias and Robert Wilson at AT&T Bell Labs, was responsible for what is widely considered the most important single discovery in astrophysics: what’s now known as cosmic microwave background radiation, for which Penzias and Wilson won the 1978 Nobel Prize in Physics.

Recently, I’ve blogged about how there are times when perfect data quality is necessary, when we need the equivalent of a space telescope, and times when okay data quality is good enough, when the equivalent of an Earth-based telescope will do.

What I would like you to take away from this post is that perfect data quality is not a prerequisite for the discovery of new business insights.  Even when data doesn’t provide a perfect view of the business cosmos, even when it’s partially obstructed, blemished, or diminished by the turbulent and murky atmosphere of poor quality, data can still provide business insights.

This doesn’t mean that you should settle for poor data quality, just that you shouldn’t demand perfection before using data.

Tyson ends each episode of his StarTalk Radio program by saying “keep looking up,” so I will end this blog post by saying, even when its quality isn’t perfect, keep looking up insights in data.

 

Related Posts

Data Quality and the OK Plateau

When Poor Data Quality Kills

How Data Cleansing Saves Lives

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

The Seventh Law of Data Quality

A Tale of Two Q’s

Data Quality and The Middle Way

Stop Poor Data Quality STOP

Freudian Data Quality

Predictably Poor Data Quality

This isn’t Jeopardy

Satisficing Data Quality

The Costs and Profits of Poor Data Quality

Continuing the theme of my two previous posts, which discussed when it’s okay to call data quality as good as it needs to get and when perfect data quality is necessary, in this post I want to briefly discuss the costs — and profits — of poor data quality.

Loraine Lawson interviewed Ted Friedman of Gartner Research about How to Measure the Cost of Data Quality Problems, such as the costs associated with reduced productivity, redundancies, business processes breaking down because of data quality issues, regulatory compliance risks, and lost business opportunities.  David Loshin blogged about the challenge of estimating the cost of poor data quality, noting that many estimates, upon close examination, seem to rely exclusively on anecdotal evidence.

A recent Mental Floss article recounted 10 Very Costly Typos.  These include the 1962 missing hyphen in the programming code that led to the destruction of the Mariner 1 spacecraft, at a cost of $80 million; the 2007 promotion by a Roswell, New Mexico car dealership in which, instead of 1 out of 50,000 scratch lottery tickets revealing a $1,000 cash grand prize, all of the tickets were printed as grand-prize winners, which would have meant a $50 million payout ($250,000 in Walmart gift certificates were given out instead); and, more recently, the March 2013 typographical error in the price of pay-per-ride cards on 160,000 maps and posters that cost New York City’s Metropolitan Transportation Authority approximately $500,000.

Although we often think only about the costs of poor data quality, the article also shared some 2010 research performed by Harvard University claiming that Google profits an estimated $497 million a year from people mistyping the names of popular websites and landing on typosquatter sites, which just happen to be conveniently littered with Google ads.

Poor data quality has also long played an important role in improving Google Search: the misspelled search terms that users enter (not just the output of a spellchecker program) are leveraged by the algorithm that provides the Did you mean, Including results for, and Search instead for help text displayed at the top of the first page of Google Search results.
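How users’ own misspellings can become useful data is easy to sketch, though what follows is not Google’s actual algorithm, only an illustration under stated assumptions.  It assumes a hypothetical log of query reformulations (a user types one query, then immediately types another) and suggests, for each typed query, the follow-up that most users switched to, provided enough users did so.  The log entries and the `min_count` cutoff are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical session log: (query the user typed, query they typed next).
REFORMULATIONS = [
    ("recieve", "receive"),
    ("recieve", "receive"),
    ("recieve", "recipe"),
    ("definately", "definitely"),
]


def build_suggestions(pairs, min_count: int = 2) -> dict[str, str]:
    """Map each typed query to its most common reformulation, keeping only
    suggestions backed by at least `min_count` observations."""
    followups: dict[str, Counter] = defaultdict(Counter)
    for typed, retyped in pairs:
        if typed != retyped:
            followups[typed][retyped] += 1
    suggestions = {}
    for typed, counts in followups.items():
        best, count = counts.most_common(1)[0]
        if count >= min_count:
            suggestions[typed] = best
    return suggestions


if __name__ == "__main__":
    for typed, suggestion in build_suggestions(REFORMULATIONS).items():
        print(f"{typed}: Did you mean {suggestion}?")
```

The design choice worth noticing is that the misspelling itself is the signal: without users typing “recieve” and then correcting it, there would be nothing to learn from.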

What examples (or calculation methods) can you provide about either the costs or profits associated with poor data quality?

 

Related Posts

Promoting Poor Data Quality

Data Quality and the Cupertino Effect

The Data Quality Wager

Data Quality and the OK Plateau

When Poor Data Quality Kills

How Data Cleansing Saves Lives

The Dichotomy Paradox, Data Quality and Zero Defects

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Data and its Relationships with Quality

The Seventh Law of Data Quality

A Tale of Two Q’s

Paleolithic Rhythm and Data Quality

Groundhog Data Quality Day

Data Quality and The Middle Way

Stop Poor Data Quality STOP

When Poor Data Quality Calls

Freudian Data Quality

Predictably Poor Data Quality

Satisficing Data Quality

When Poor Data Quality Kills

In my previous post, I made the argument that many times it’s okay to call data quality as good as it needs to get, as opposed to demanding data perfection.  However, a balanced perspective demands acknowledging that there are times when nothing less than perfect data quality is necessary.  In fact, there are times when poor data quality can have deadly consequences.

In his book The Information: A History, a Theory, a Flood, James Gleick explained “pharmaceutical names are a special case: a subindustry has emerged to coin them, research them, and vet them.  In the United States, the Food and Drug Administration reviews proposed drug names for possible collisions, and this process is complex and uncertain.  Mistakes cause death.”

“Methadone, for opiate dependence, has been administered in place of Metadate, for attention-deficit disorder, and Taxol, a cancer drug, for Taxotere, a different cancer drug, with fatal results.  Doctors fear both look-alike errors and sound-alike errors: Zantac/Xanax; Verelan/Virilon.  Linguists devise scientific measures of the distance between names.  But Lamictal and Lamisil and Ludiomil and Lomotil are all approved drug names.”

All data matching techniques, such as edit distance functions, phonetic comparisons, and more complex algorithms, provide a way to represent the likelihood that two non-exactly matching data items are the same (e.g., as numeric probabilities, weighted percentages, or odds ratios).  No matter what data quality software vendors tell you, all data matching techniques are susceptible to false negatives (data that did not match, but should have) and false positives (data that matched, but should not have).
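To make the false positive risk concrete, the sketch below applies two classic techniques of the kind mentioned above, Levenshtein edit distance and American Soundex, to drug names from Gleick’s examples.  The normalization of edit distance into a 0-to-1 similarity score and the 0.6 match threshold are arbitrary assumptions chosen for illustration; real matching engines and drug-name review processes use far more elaborate measures.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (case-insensitive)."""
    a, b = a.lower(), b.lower()
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to 0..1, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Soundex digit for each coded consonant; vowels, y, h, and w get no digit.
_SOUNDEX = {ch: digit
            for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                   ("l", "4"), ("mn", "5"), ("r", "6"))
            for ch in letters}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits for consonant sounds."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit:
            if digit != previous:
                code += digit
            previous = digit
        elif ch not in "hw":  # vowels and y separate repeated codes; h and w do not
            previous = ""
    return (code + "000")[:4]


THRESHOLD = 0.6  # assumed cutoff: pairs at or above it are treated as "the same"

if __name__ == "__main__":
    pairs = [("Lamictal", "Lamictl"),   # a typo of the same drug
             ("Lamictal", "Lamisil"),   # two different approved drugs
             ("Methadone", "Metadate"),
             ("Zantac", "Xanax"),
             ("Verelan", "Virilon")]
    for a, b in pairs:
        score = similarity(a, b)
        print(f"{a:<9} {b:<9} similarity={score:.3f} "
              f"match={score >= THRESHOLD} soundex={soundex(a)}/{soundex(b)}")
```

Under these assumptions, the threshold that correctly matches the Lamictl typo to Lamictal also matches Lamictal to Lamisil and Methadone to Metadate, two false positives that in a pharmacy could mean dispensing the wrong drug.  Soundex, meanwhile, codes Verelan and Virilon identically (V645) but codes the sound-alike Zantac and Xanax differently (Z532 versus X520), simply because their first letters differ.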

This pharmaceutical example is one case where a false positive could be deadly, a time when poor data quality kills.  Admittedly, this is an extreme example.  What other examples can you offer where perfect data quality is actually a necessity?

 

Related Posts

Data Quality and the OK Plateau

How Data Cleansing Saves Lives

What going to the dentist taught me about data quality

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Data and its Relationships with Quality

DQ-Tip: “There is no such thing as data accuracy...”

The Seventh Law of Data Quality

A Tale of Two Q’s

Paleolithic Rhythm and Data Quality

Groundhog Data Quality Day

Data Quality and The Middle Way

Stop Poor Data Quality STOP

When Poor Data Quality Calls

Freudian Data Quality

Predictably Poor Data Quality

Satisficing Data Quality