Data Quality and Chicken Little Syndrome

“The sky is falling!” exclaimed Chicken Little after an acorn fell on his head, causing him to undertake a journey to tell the King that the world is coming to an end.  So says the folk tale that became an allegory for people accused of being unreasonably afraid, or people trying to incite an unreasonable fear in those around them, sometimes referred to as Chicken Little Syndrome.

The sales pitches for data quality solutions often suffer from Chicken Little Syndrome, when vendors and consultants, instead of trying to sell the business benefits of data quality, focus too much on the negative aspects of not investing in data quality, and try scaring people into prioritizing data quality initiatives by exclaiming “your company is failing because your data quality is bad!”

The Chicken Littles of Data Quality use sound bites like “data quality problems cost businesses more than $600 billion a year!” or “poor data quality costs organizations 35% of their revenue!”  However, the most common characteristic of these fear mongering estimates about the costs of poor data quality is that, upon closer examination, most of them either rely on anecdotal evidence, or hide behind the curtain of an allegedly proprietary case study, the details of which conveniently can’t be publicly disclosed.

Lacking a tangible estimate for the cost of poor data quality often complicates building the business case for data quality.  Even though a data quality initiative has the long-term potential of reducing the costs, and mitigating the risks, associated with poor data quality, its initial costs are very tangible.  For example, the short-term increased costs of a data quality initiative can include the purchase of data quality software, and the professional services needed for training and consulting to support installation, configuration, application development, testing, and production implementation.  When considering these short-term costs, and especially when lacking a tangible estimate for the cost of poor data quality, many organizations understandably conclude that it’s less risky to gamble on not investing in a data quality initiative and hope things are just not as bad as Chicken Little claims.

“The sky isn’t falling on us.”

Furthermore, the reason that citing specific examples of poor data quality (e.g., IQTrainwrecks.com) also doesn’t work very well is not just because of the lack of a verifiable estimate for the associated business costs.  Another significant contributing factor is that people naturally dismiss the possibility that something bad that happened to someone else could also happen to them.

So, when Chicken Little undertakes a journey to tell the CEO that the organization is coming to an end due to poor data quality, exclaiming that “the sky is falling!” while citing one of those data quality disaster stories that befell another organization, should we really be surprised when the CEO looks up, scratches their head, and declares that “the sky isn’t falling on us”?

Sometimes, denying the existence of data quality issues is a natural self-defense mechanism for the people responsible for the business processes and technology surrounding data since nobody wants to be blamed for causing, or failing to fix, data quality issues.  Other times, people suffer from the illusion-of-quality effect caused by the dark side of data cleansing.  In other words, they don’t believe that data quality issues occur very often because the data made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

Can we stop Playing Chicken with Data Quality?

Most of the time, advocating for data quality feels like we are playing chicken with executive sponsors and business stakeholders, as if we were driving toward them at full speed on a collision course, armed with fear mongering and disaster stories, hoping that they swerve in the direction of approving a data quality initiative.  But there has to be a better way to advocate for data quality other than constantly exclaiming that “the sky is falling!”  (Don’t cry fowl — I realize that I just mixed my chicken metaphors.)

Beware the Data Governance Ides of March

Morte di Cesare (Death of Caesar) by Vincenzo Camuccini, 1798

Today is the Ides of March (March 15), which back in 44 BC was definitely not a good day to be Julius Caesar, who was literally stabbed in the back by members of the Roman Senate during his assassination in the Theatre of Pompey (as depicted above).  The assassination was spearheaded by Brutus and Cassius in a failed attempt to restore the Roman Republic.  Instead, it resulted in a series of civil wars that ultimately led to the establishment of the Roman Empire by Caesar’s heir Octavian (aka Caesar Augustus).

“Beware the Ides of March” is the famously dramatized warning from William Shakespeare’s play Julius Caesar, which has me pondering whether a data governance program implementation has an Ides of March (albeit a less dramatic one—hopefully).

Hybrid Approach (starting Top-Down) is currently leading my unscientific poll about the best way to approach data governance, acknowledging executive sponsorship and a data governance board will be required for the top-down-driven activities of funding, policy making and enforcement, decision rights, and arbitration of conflicting business priorities as well as organizational politics.

The definition of data governance policies illustrates the intersection of business, data, and technical knowledge spread throughout the organization, revealing how interconnected and interdependent the organization is.  The policies provide a framework for the communication and collaboration of business, data, and technical stakeholders, and establish an enterprise-wide understanding of the roles and responsibilities involved, and the accountability required to support the organization’s daily business activities.

The process of defining data governance policies resembles the communication and collaboration of the Roman Republic, but the process of implementing and enforcing data governance policies resembles the command and control of the Roman Empire.

During this transition of power, from policy definition to policy implementation and enforcement, lies the greatest challenge for a data governance program.  Even though no executive sponsor is the Data Governance Emperor (not even Caesar CEO) and the data governance board is not the Data Governance Senate, a heavy-handed top-down approach to data governance can make policy compliance feel like imperial rule and policy enforcement feel like martial law.  Although a series of enterprise civil wars is unlikely to result, the data governance program is likely to fail without the support of a strong and stable bottom-up foundation.

The enforcement of data governance policies is often confused with traditional management notions of command and control, but the enduring success of data governance requires an organizational culture that embodies communication and collaboration, which is mostly facilitated by bottom-up-driven activities led by the example of data stewards and other peer-level change agents.

“Beware the Data Governance Ides of March” is my dramatized warning about relying too much on the top-down approach to implementing data governance—and especially if your organization has any data stewards named Brutus or Cassius.

Plato’s Data

Plato’s Cave is a famous allegory from philosophy that describes a fictional scenario where people mistake an illusion for reality.

The allegory describes a group of people who have lived their whole lives as prisoners chained motionless in a dark cave, forced to face a blank wall.  Behind the prisoners is a large fire.  In front of the fire are puppeteers that project shadows onto the cave wall, acting out little plays, which include mimicking voices and sound effects that echo off the cave walls.  These shadows and echoes are only projections, partial reflections of a reality created by the puppeteers.  However, this illusion represents the only reality the prisoners have ever known, and so to them the shadows are real sights and the echoes are real sounds.

When one of the prisoners is freed and permitted to turn around and see the source of the shadows and echoes, he rejects reality as an illusion.  The prisoner is then dragged out of the cave into the sunlight, out into the bright, painful light of the real world, which he also rejects as an illusion.  How could these sights and sounds be real to him when all he has ever known is the cave?

But eventually the prisoner acclimates to the real world, realizing that the real illusion was the shadows and echoes in the cave.

Unfortunately, this is when he’s returned to his imprisonment in the cave.  Can you imagine how painful the rest of his life will be, once again being forced to watch the shadows and listen to the echoes, except now he knows that they are not real?

Plato’s Cinema

A modern update on the allegory is something we could call Plato’s Cinema, where a group of people live their whole lives as prisoners chained motionless in a dark cinema, forced to face a blank screen.  Behind the audience is a large movie projector.

Please stop reading for a moment and try to imagine if everything you ever knew was based entirely on the movies you watched.

Now imagine you are one of the prisoners, and you did not get to choose the movies, but instead were forced to watch whatever the projectionist chooses to show you.  Although the fictional characters and stories of these movies are only projections, partial reflections of a reality created by the movie producers, since this illusion would represent the only reality you have ever known, to you the characters would be real people and the stories would be real events.

If you were freed from this cinema prison, permitted to turn around and see the projector, wouldn’t you reject it as an illusion?  If you were dragged out of the cinema into the sunlight, out into the bright, painful light of the real world, wouldn’t you also reject reality as an illusion?  How could these sights and sounds be real to you when all you have ever known is the cinema?

Let’s say that you eventually acclimated to the real world, realizing that the real illusion was the projections on the movie screen.

However, now let’s imagine that you are then returned to your imprisonment in the dark cinema.  Can you imagine how painful the rest of your life would be, once again being forced to watch the movies, except now you know that they are not real?

Plato’s Data

Whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality — let’s call this the allegory of Plato’s Data.

We often act as if we are being forced to face our computer screen, upon which data tells us a story about the real world that is just as enticing as the flickering shadows on the wall of Plato’s Cave, or the mesmerizing movies projected in Plato’s Cinema.

Data shapes our perception of the real world, but sometimes we forget that data is only a partial reflection of reality.

I am sure that it sounds silly to point out something so obvious, but imagine if, before you were freed, the other prisoners, in either the cave or the cinema, tried to convince you that the shadows or the movies weren’t real.  Or imagine you’re the prisoner returning to either the cave or the cinema.  How would you convince other prisoners that you’ve seen the true nature of reality?

A common question about Plato’s Cave is whether it’s crueler to show the prisoner the real world, or to return the prisoner to the cave after he has seen it.  Much like the illusions of the cave and the cinema, data makes more sense the more we believe it is real.

However, with data, neither breaking the illusion nor returning ourselves to it is cruel, but is instead a necessary practice because it’s important to occasionally remind ourselves that data and the real world are not the same thing.

Data Governance Frameworks are like Jigsaw Puzzles

In a recent interview, Jill Dyché addressed a common misconception, namely that a data governance framework is a strategy.  It is not.  “Unlike other strategic initiatives that involve IT,” Jill explained, “data governance needs to be designed.  The cultural factors, the workflow factors, the organizational structure, the ownership, the political factors, all need to be accounted for when you are designing a data governance roadmap.”

“People need a mental model, that is why everybody loves frameworks,” Jill continued.  “But they are not enough and I think the mistake that people make is that once they see a framework, rather than understanding its relevance to their organization, they will just adapt it and plaster it up on the whiteboard and show executives without any kind of context.  So they are already defeating the purpose of data governance, which is to make it work within the context of your business problems, not just have some kind of mental model that everybody can agree on, but is not really the basis for execution.”

“So it’s a really, really dangerous trend,” Jill cautioned, “that we see where people equate strategy with framework because strategy is really a series of collected actions that result in some execution — and that is exactly what data governance is.”

And in her excellent article Data Governance Next Practices: The 5 + 2 Model, Jill explained that data governance requires a deliberate design so that the entire organization can buy into a realistic execution plan, not just a sound bite.  As usual, I agree with Jill, since, in my experience, many people expect a data governance framework to provide eureka-like moments of insight.

In The Myths of Innovation, Scott Berkun debunked the myth of the eureka moment using the metaphor of a jigsaw puzzle.

“When you put the last piece into place, is there anything special about that last piece or what you were wearing when you put it in?” Berkun asked.  “The only reason that last piece is significant is because of the other pieces you’d already put into place.  If you jumbled up the pieces a second time, any one of them could turn out to be the last, magical piece.”

“The magic feeling at the moment of insight, when the last piece falls into place,” Berkun explained, “is the reward for many hours (or years) of investment coming together.  In comparison to the simple action of fitting the puzzle piece into place, we feel the larger collective payoff of hundreds of pieces’ worth of work.”

Perhaps the myth of the data governance framework could also be debunked using the metaphor of a jigsaw puzzle.

Data governance requires the coordination of a complex combination of a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, change management — and many other puzzle pieces.

How could a data governance framework possibly predict how you will assemble the puzzle pieces?  Or how the puzzle pieces will fit together within your unique corporate culture?  Or which of the many aspects of data governance will turn out to be the last (or even the first) piece of the puzzle to fall into place in your organization?  And, of course, there is truly no last piece of the puzzle, since data governance is an ongoing program because the business world constantly gets jumbled up by change.

So, data governance frameworks are useful, but only if you realize that data governance frameworks are like jigsaw puzzles.

Data Quality in Six Verbs

Once upon a time, when asked on Twitter to identify a list of critical topics for data quality practitioners, my response was pithy (with only 140 characters in a tweet, pithy is as good as it gets).  Since I prefer emphasizing the need to take action, I proposed six critical verbs: Investigate, Communicate, Collaborate, Remediate, Inebriate, and Reiterate.

Lest my pith be misunderstood aplenty, this blog post provides more detail, plus links to related posts, about what I meant.

1 — Investigate

Data quality is not exactly a riddle wrapped in a mystery inside an enigma.  However, understanding your data is essential to using it effectively and improving its quality.  Therefore, the first thing you must do is investigate.

So, grab your favorite (preferably highly caffeinated) beverage, get settled into your comfy chair, roll up your sleeves, and start analyzing that data.  Data profiling tools can be very helpful with raw data analysis.

However, data profiling is elementary, my dear reader.  In order for you to make sense of those data elements, you require business context.  This means you must also go talk with data’s best friends—its stewards, analysts, and subject matter experts.

2 — Communicate

After you have completed your preliminary investigation, the next thing you must do is communicate your findings, which helps improve everyone’s understanding of how data is being used, verify data’s business relevancy, and prioritize critical issues.

Keep in mind that communication is mostly about listening.  Also, be prepared to face “data denial” whenever data quality is discussed.  This is a natural self-defense mechanism for the people responsible for business processes, technology, and data, which is understandable because nobody likes to be blamed (or feel blamed) for causing or failing to fix data quality problems.

No matter how uncomfortable these discussions may be at times, they are essential to evaluating the potential ROI of data quality improvements, defining data quality standards, and most importantly, providing a working definition of success.

3 — Collaborate

After you have investigated and communicated, now you must rally the team that will work together to improve the quality of your data.  A cross-disciplinary team will be needed because data quality is neither a business nor a technical issue—it is both.

Therefore, you will need the collaborative effort of business and technical folks.  The business folks usually own the data, or at least the business processes that create it, so they understand its meaning and daily use.  The technical folks usually own the hardware and software comprising your data architecture.  Both sets of folks must realize they are all “one company folk” that must collaborate in order to be successful.

No, you don’t need a folk singer, but you may need an executive sponsor.  The need for collaboration might sound rather simple, but as one of my favorite folk singers taught me, sometimes the hardest thing to learn is the least complicated.

4 — Remediate

Resolving data quality issues requires a combination of data cleansing and defect prevention.  Data cleansing is reactive and its common (and deserved) criticism is that it essentially treats the symptoms without curing the disease. 

Defect prevention is proactive and, through root cause analysis and process improvements, it essentially is the cure for the quality ills that ail your data.  However, a data governance framework is often necessary for defect prevention to be successful, as are patience and understanding, since it will require a strategic organizational transformation that doesn’t happen overnight.

The unavoidable reality is that data cleansing is used to correct today’s problems while defect prevention is busy building a better tomorrow for your organization.  Fundamentally, data quality requires a hybrid discipline that combines data cleansing and defect prevention into an enterprise-wide best practice.

5 — Inebriate

I am not necessarily advocating that kind of inebriation.  Instead, think Emily Dickinson (i.e., “Inebriate of air am I” – it’s a line from a poem about happiness that, yes, also happens to make a good drinking song). 

My point is that you must not only celebrate your successes, but celebrate them quite publicly.  Channel yet another poet (Walt Whitman) and sound your barbaric yawp over the cubicles of your company: “We just improved the quality of our data!”

Of course, you will need to be more specific.  Declare success using words illustrating the business impact of your achievements, such as mitigated risks, reduced costs, or increased revenues — those three are always guaranteed executive crowd pleasers.

6 — Reiterate

Like the legend of the phoenix, the end is also a new beginning.  Therefore, don’t get too inebriated, since you are not celebrating the end of your efforts.  Your data quality journey has only just begun.  Your continuous monitoring must continue and your ongoing improvements must remain ongoing.  Which is why, despite the tension this reality, and this bad grammatical pun, might cause you, always remember that the tense of all six of these verbs is future continuous.

What Say You?

Please let me know what you think, pithy or otherwise, by posting a comment below.  And feel free to use more than six verbs.

Finding Data Quality

Have you ever experienced that sinking feeling, where you sense if you don’t find data quality, then data quality will find you?

In the spring of 2003, Pixar Animation Studios released one of my all-time favorite Walt Disney Pictures films: Finding Nemo.

This blog post is an homage not only to the film, but also to the critically important role in which data quality is cast within all of your enterprise information initiatives, including business intelligence, master data management, and data governance.

I hope that you enjoy reading this blog post, but most important, I hope you always remember: “Data are friends, not food.”

Data Silos

“Mine!  Mine!  Mine!  Mine!  Mine!”

That’s the Data Silo Mantra, and it is also the bane of successful enterprise information management.  Many organizations persist in their reliance on vertical data silos, where each and every business unit acts as the custodian of its own private data, thereby maintaining its own version of the truth.

Impressive business growth can cause an organization to become a victim of its own success.  Significant collateral damage can be caused by this success, and most notably to the organization’s burgeoning information architecture.

Earlier in its history, an organization usually has fewer systems and easily manageable volumes of data.  This makes managing data quality, and effectively delivering the critical information required to make informed business decisions every day, a relatively easy task where technology can serve business needs well, especially when the business and its needs are small.

However, as the organization grows, it trades effectiveness for efficiency, prioritizing short-term tactics over long-term strategy, and by seeing power in the hoarding of data, not in the sharing of information, the organization chooses business unit autonomy over enterprise-wide collaboration—and without this collaboration, successful enterprise information management is impossible.

A data silo often merely represents a microcosm of an enterprise-wide problem—and this truth is neither convenient nor kind.

Data Profiling

“I see a light—I’m feeling good about my data . . . Good feeling’s gone—AHH!”

Although it’s not exactly a riddle wrapped in a mystery inside an enigma, understanding your data is essential to using it effectively and improving its quality.  To achieve these goals, there is simply no substitute for data analysis.

Data profiling can provide a reality check for the perceptions and assumptions you may have about the quality of your data.  A data profiling tool can help you by automating some of the grunt work needed to begin your analysis.

However, it is important to remember that the analysis itself cannot be automated.  You need to translate your analysis into the meaningful reports and questions that will facilitate more effective communication and help establish tangible business context.

Ultimately, I believe the goal of data profiling is not to find answers, but instead, to discover the right questions. 

Discovering the right questions requires talking with data’s best friends: its stewards, analysts, and subject matter experts.  These discussions are a critical prerequisite for determining data usage, standards, and the business-relevant metrics for measuring and improving data quality.  Always remember that well-performed data profiling is a highly interactive and very iterative process.
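To make the grunt work a little more concrete, here is a minimal Python sketch of the kind of counting a data profiling tool automates.  The function name profile_column and the sample state codes are hypothetical, invented purely for illustration; a real profiling tool does far more than this.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, missing count, distinct count, top values."""
    # Treat None and blank strings as missing; trim whitespace on everything else.
    present = [str(v).strip() for v in values
               if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical sample: a "state" column pulled from a customer file.
states = ["MA", "ma", "Mass.", "", None, "MA", "NY"]
report = profile_column(states)
print(report)
```

Notice that a summary like this does not answer anything by itself.  It only prompts questions, such as whether the different spellings refer to the same state and why some rows are empty, and those are questions for the stewards, analysts, and subject matter experts.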

Defect Prevention

“You, Data-Dude, takin’ on the defects.

You’ve got serious data quality issues, dude.

Awesome.”

Even though it is impossible to truly prevent every problem before it happens, proactive defect prevention is a highly recommended data quality best practice because the more control enforced where data originates, the better the overall quality will be for enterprise information.
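As a small sketch of what enforcing control where data originates might look like, the Python snippet below checks a record at the point of entry, before it is saved.  The record layout, the validate_customer function, and the deliberately simplistic email pattern are all assumptions made for illustration, not a recommended rule set.

```python
import re

# Deliberately loose pattern: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_customer(record):
    """Return a list of problems found in a hypothetical customer record."""
    problems = []
    if not (record.get("name") or "").strip():
        problems.append("name is required")
    if not EMAIL_PATTERN.fullmatch(record.get("email") or ""):
        problems.append("email looks malformed")
    return problems


good = validate_customer({"name": "Ada", "email": "ada@example.com"})
bad = validate_customer({"name": " ", "email": "ada@example"})
print(good)
print(bad)
```

A check like this catches only the defects you thought to look for, which is one reason why prevention can never be complete.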

Although defect prevention is most commonly associated with business and technical process improvements, after identifying the burning root cause of your data defects, you may predictably need to apply some of the principles of behavioral data quality.

In other words, understanding the complex human dynamics often underlying data defects is necessary for developing far more effective tactics and strategies for implementing successful and sustainable data quality improvements.

Data Cleansing

“Just keep cleansing.  Just keep cleansing.

Just keep cleansing, cleansing, cleansing.

What do we do?  We cleanse, cleanse.”

That’s not the Data Cleansing Theme Song—but it can sometimes feel like it.  Especially whenever poor data quality negatively impacts decision-critical information, the organization may legitimately prioritize a reactive short-term response, where the only remediation will be fixing the immediate problems.

Balancing the demands of this data triage mentality with the best practice of implementing defect prevention wherever possible will often create a very challenging situation for you to contend with on an almost daily basis.

Therefore, although comprehensive data remediation will require combining reactive and proactive approaches to data quality, you need to be willing and able to put data cleansing tools to good use whenever necessary.

Communication

“It’s like he’s trying to speak to me, I know it.

Look, you’re really cute, but I can’t understand what you’re saying.

Say that data quality thing again.”

I hear this kind of thing all the time (well, not the “you’re really cute” part).

Effective communication improves everyone’s understanding of data quality, establishes a tangible business context, and helps prioritize critical data issues. 

Keep in mind that communication is mostly about listening.  Also, be prepared to face “data denial” when data quality problems are discussed.  Most often, this is a natural self-defense mechanism for the people responsible for business processes, technology, and data, for the simple reason that nobody likes to feel blamed for causing or failing to fix data quality problems.

The key to effective communication is clarity.  You should always make sure that all data quality concepts are clearly defined and in a language that everyone can understand.  I am not just talking about translating the techno-mumbojumbo, because even business-speak can sound more like business-babbling—and not just to the technical folks.

Additionally, don’t be afraid to ask questions or admit when you don’t know the answers.  Many costly mistakes can be made when people assume that others know (or pretend to know themselves) what key concepts and other terminology actually mean.

Never underestimate the potential negative impacts that the point of view paradox can have on communication.  For example, the perspectives of the business and technical stakeholders can often appear to be diametrically opposed.

Practicing effective communication requires shutting our mouths, opening our ears, and empathically listening to each other, instead of continuing to practice ineffective communication, where we merely take turns throwing word-darts at each other.

Collaboration

“Oh and one more thing:

When facing the daunting challenge of collaboration,

Work through it together, don’t avoid it.

Come on, trust each other on this one.

Yes—trust—it’s what successful teams do.”

Most organizations suffer from a lack of collaboration, and as noted earlier, without true enterprise-wide collaboration, true success is impossible.

Beyond the data silo problem, the most common challenge for collaboration is the divide perceived to exist between the Business and IT, where the Business usually owns the data and understands its meaning and use in the day-to-day operation of the enterprise, and IT usually owns the hardware and software infrastructure of the enterprise’s technical architecture.

However, neither the Business nor IT alone has all of the necessary knowledge and resources required to truly be successful.  Data quality requires that the Business and IT forge an ongoing and iterative collaboration.

You must rally the team that will work together to improve the quality of your data.  A cross-disciplinary team will truly be necessary because data quality is neither a business issue nor a technical issue—it is both, truly making it an enterprise issue.

Executive sponsors, business and technical stakeholders, business analysts, data stewards, technology experts, and yes, even consultants and contractors—only when all of you are truly working together as a collaborative team, can the enterprise truly achieve great things, both tactically and strategically.

Successful enterprise information management is spelled E—A—C.

Of course, that stands for Enterprises—Always—Collaborate.  The EAC can be one seriously challenging place, dude.

You don’t know if you know what they know, or if they know what you know, but when you know, then they know, you know?

It’s like first you are all like “Whoa!” and they are all like “Whoaaa!” then you are like “Sweet!” and then they are like “Totally!”

This critical need for collaboration might seem rather obvious.  However, as all of the great philosophers have taught us, sometimes the hardest thing to learn is the least complicated.

Okay.  Squirt will now give you a rundown of the proper collaboration technique:

“Good afternoon. We’re gonna have a great collaboration today.

Okay, first crank a hard cutback as you hit the wall.

There’s a screaming bottom curve, so watch out.

Remember: rip it, roll it, and punch it.”

Finding Data Quality

As more and more organizations realize the critical importance of viewing data as a strategic corporate asset, data quality is becoming an increasingly prevalent topic of discussion.

However, and somewhat understandably, data quality is sometimes viewed as a small fish—albeit with a “lucky fin”—in a much larger pond.

In other words, data quality is often discussed only in its relation to enterprise information initiatives such as data integration, master data management, data warehousing, business intelligence, and data governance.

There is nothing wrong with this perspective, and as a data quality expert, I admit to my general tendency to see data quality in everything.  However, regardless of the perspective from which you begin your journey, I believe that eventually you will be Finding Data Quality wherever you look as well.