Finding Data Quality


Have you ever experienced that sinking feeling, where you sense that if you don’t find data quality, then data quality will find you?

In the spring of 2003, Pixar Animation Studios produced one of my all-time favorite Walt Disney Pictures—Finding Nemo.

This blog post is an homage not only to the film, but also to the critically important role into which data quality is cast within all of your enterprise information initiatives, including business intelligence, master data management, and data governance.

I hope that you enjoy reading this blog post, but most important, I hope you always remember: “Data are friends, not food.”

Data Silos


“Mine!  Mine!  Mine!  Mine!  Mine!”

That’s the Data Silo Mantra—and it is also the bane of successful enterprise information management.  Many organizations persist in their reliance on vertical data silos, where each and every business unit acts as the custodian of its own private data—thereby maintaining its own version of the truth.

Impressive business growth can cause an organization to become a victim of its own success.  This success can inflict significant collateral damage, most notably on the organization’s burgeoning information architecture.

Early in an organization’s history, it usually has fewer systems and easily manageable volumes of data.  This makes managing data quality and effectively delivering the critical information required to make informed business decisions every day a relatively easy task, one where technology can serve business needs well—especially when the business and its needs are small.

However, as the organization grows, it trades effectiveness for efficiency, prioritizing short-term tactics over long-term strategy.  Seeing power in the hoarding of data, not in the sharing of information, the organization chooses business unit autonomy over enterprise-wide collaboration—and without this collaboration, successful enterprise information management is impossible.

A data silo often merely represents a microcosm of an enterprise-wide problem—and this truth is neither convenient nor kind.

Data Profiling


“I see a light—I’m feeling good about my data . . . Good feeling’s gone—AHH!”

Although it’s not exactly a riddle wrapped in a mystery inside an enigma,  understanding your data is essential to using it effectively and improving its quality—to achieve these goals, there is simply no substitute for data analysis.

Data profiling can provide a reality check for the perceptions and assumptions you may have about the quality of your data.  A data profiling tool can help you by automating some of the grunt work needed to begin your analysis.
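To give a concrete sense of the grunt work that gets automated, here is a minimal sketch (in Python, with a hypothetical customers.csv file and invented column names—not the output of any particular tool) of the kind of summary statistics a data profiling tool generates: per-column row counts, missing value counts, and data value frequency distributions.

```python
import csv
from collections import Counter, defaultdict

def profile(path):
    """Summarize each column: row count, missing values, and the
    most frequent data values (a value frequency distribution)."""
    value_counts = defaultdict(Counter)
    missing = Counter()
    rows = 0
    with open(path, newline="") as f:
        for record in csv.DictReader(f):
            rows += 1
            for column, value in record.items():
                value = (value or "").strip()
                if value:
                    value_counts[column][value] += 1
                else:
                    missing[column] += 1
    for column, counts in value_counts.items():
        print(f"{column}: {rows} rows, {missing[column]} missing")
        for value, freq in counts.most_common(5):
            print(f"  {value!r}: {freq} ({freq / rows:.1%})")

profile("customers.csv")  # hypothetical source file
```

The report itself is not the point—the point is the questions it raises about whether those distributions make sense for how the business actually uses the data.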

However, it is important to remember that the analysis itself cannot be automated—you need to translate your analysis into the meaningful reports and questions that will facilitate more effective communication and help establish tangible business context.

Ultimately, I believe the goal of data profiling is not to find answers, but instead, to discover the right questions. 

Discovering the right questions requires talking with data’s best friends—its stewards, analysts, and subject matter experts.  These discussions are a critical prerequisite for determining data usage, standards, and the business-relevant metrics for measuring and improving data quality.  Always remember that well-performed data profiling is a highly interactive and very iterative process.

Defect Prevention


“You, Data-Dude, takin’ on the defects.

You’ve got serious data quality issues, dude.

Awesome.”

Even though it is impossible to truly prevent every problem before it happens, proactive defect prevention is a highly recommended data quality best practice because the more control enforced where data originates, the better the overall quality of enterprise information will be.
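Here is a purely illustrative sketch of what that control can look like where data originates—simple validation rules applied before a record is accepted.  The field names and rules are hypothetical, not taken from any particular system.

```python
import re
from datetime import date

# Hypothetical validation rules enforced at the point of data entry.
RULES = {
    "customer_id": lambda v: bool(re.fullmatch(r"C\d{6}", v)),
    "email": lambda v: "@" in v and "." in v.split("@")[-1],
    "birth_date": lambda v: date.fromisoformat(v) <= date.today(),
}

def validate(record):
    """Return the list of fields that violate a rule; an empty list
    means the record may be accepted at its point of origin."""
    violations = []
    for field, rule in RULES.items():
        value = (record.get(field) or "").strip()
        try:
            if not value or not rule(value):
                violations.append(field)
        except ValueError:  # e.g., an unparseable date
            violations.append(field)
    return violations

print(validate({"customer_id": "C123456",
                "email": "nemo@example.com",
                "birth_date": "2003-05-30"}))  # -> []
```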

Although defect prevention is most commonly associated with business and technical process improvements, after identifying the burning root cause of your data defects, you may predictably need to apply some of the principles of behavioral data quality.

In other words, understanding the complex human dynamics often underlying data defects is necessary for developing far more effective tactics and strategies for implementing successful and sustainable data quality improvements.

Data Cleansing


“Just keep cleansing.  Just keep cleansing.

Just keep cleansing, cleansing, cleansing.

What do we do?  We cleanse, cleanse.”

That’s not the Data Cleansing Theme Song—but it can sometimes feel like it.  Especially whenever poor data quality negatively impacts decision-critical information, the organization may legitimately prioritize a reactive short-term response, where the only remediation will be fixing the immediate problems.

Balancing the demands of this data triage mentality with the best practice of implementing defect prevention wherever possible will often create a very challenging situation that you must contend with on an almost daily basis.

Therefore, although comprehensive data remediation will require combining reactive and proactive approaches to data quality, you need to be willing and able to put data cleansing tools to good use whenever necessary.
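For contrast with defect prevention, here is an equally minimal sketch of reactive data cleansing—standardizing values that have already made it into the system.  The field names and the correction table are hypothetical, assembled only to illustrate the technique.

```python
# Hypothetical standardization table, typically built from data profiling results.
COUNTRY_FIXES = {"USA": "US", "U.S.A.": "US", "UNITED STATES": "US",
                 "UK": "GB", "U.K.": "GB"}

def cleanse(record):
    """Return a cleansed copy of a record: trim whitespace and map
    known nonstandard country values to their standard codes."""
    cleansed = {k: (v or "").strip() for k, v in record.items()}
    country = cleansed.get("country", "").upper()
    cleansed["country"] = COUNTRY_FIXES.get(country, country)
    return cleansed

print(cleanse({"name": "  Nemo ", "country": "u.s.a."}))
# -> {'name': 'Nemo', 'country': 'US'}
```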

Communication


“It’s like he’s trying to speak to me, I know it.

Look, you’re really cute, but I can’t understand what you’re saying.

Say that data quality thing again.”

I hear this kind of thing all the time (well, not the “you’re really cute” part).

Effective communication improves everyone’s understanding of data quality, establishes a tangible business context, and helps prioritize critical data issues. 

Keep in mind that communication is mostly about listening.  Also, be prepared to face “data denial” when data quality problems are discussed.  Most often, this is a natural self-defense mechanism for the people responsible for business processes, technology, and data—understandable because of the simple fact that nobody likes to feel blamed for causing, or failing to fix, the data quality problems.

The key to effective communication is clarity.  You should always make sure that all data quality concepts are clearly defined and in a language that everyone can understand.  I am not just talking about translating the techno-mumbojumbo, because even business-speak can sound more like business-babbling—and not just to the technical folks.

Additionally, don’t be afraid to ask questions or admit when you don’t know the answers.  Many costly mistakes can be made when people assume that others know (or pretend to know themselves) what key concepts and other terminology actually mean.

Never underestimate the potential negative impacts that the point of view paradox can have on communication.  For example, the perspectives of the business and technical stakeholders can often appear to be diametrically opposed.

Practicing effective communication requires shutting our mouths, opening our ears, and empathically listening to each other, instead of continuing to practice ineffective communication, where we merely take turns throwing word-darts at each other.

Collaboration


“Oh and one more thing:

When facing the daunting challenge of collaboration,

Work through it together, don't avoid it.

Come on, trust each other on this one.

Yes—trust—it’s what successful teams do.”

Most organizations suffer from a lack of collaboration, and as noted earlier, without true enterprise-wide collaboration, true success is impossible.

Beyond the data silo problem, the most common challenge for collaboration is the divide perceived to exist between the Business and IT, where the Business usually owns the data and understands its meaning and use in the day-to-day operation of the enterprise, and IT usually owns the hardware and software infrastructure of the enterprise’s technical architecture.

However, neither the Business nor IT alone has all of the necessary knowledge and resources required to truly be successful.  Data quality requires that the Business and IT forge an ongoing and iterative collaboration.

You must rally the team that will work together to improve the quality of your data.  A cross-disciplinary team will truly be necessary because data quality is neither a business issue nor a technical issue—it is both, truly making it an enterprise issue.

Executive sponsors, business and technical stakeholders, business analysts, data stewards, technology experts, and yes, even consultants and contractors—only when all of you are truly working together as a collaborative team, can the enterprise truly achieve great things, both tactically and strategically.

Successful enterprise information management is spelled E—A—C.

Of course, that stands for Enterprises—Always—Collaborate.  The EAC can be one seriously challenging place, dude.

You don’t know if you know what they know, or if they know what you know, but when you know, then they know, you know?

It’s like first you are all like “Whoa!” and they are all like “Whoaaa!” then you are like “Sweet!” and then they are like “Totally!”

This critical need for collaboration might seem rather obvious.  However, as all of the great philosophers have taught us, sometimes the hardest thing to learn is the least complicated.

Okay.  Squirt will now give you a rundown of the proper collaboration technique:

“Good afternoon. We’re gonna have a great collaboration today.

Okay, first crank a hard cutback as you hit the wall.

There’s a screaming bottom curve, so watch out.

Remember: rip it, roll it, and punch it.”

Finding Data Quality


As more and more organizations realize the critical importance of viewing data as a strategic corporate asset, data quality is becoming an increasingly prevalent topic of discussion.

However, and somewhat understandably, data quality is sometimes viewed as a small fish—albeit with a “lucky fin”—in a much larger pond.

In other words, data quality is often discussed only in its relation to enterprise information initiatives such as data integration, master data management, data warehousing, business intelligence, and data governance.

There is nothing wrong with this perspective, and as a data quality expert, I admit to my general tendency to see data quality in everything.  However, regardless of the perspective from which you begin your journey, I believe that eventually you will be Finding Data Quality wherever you look as well.

 

What Does Data Quality Technology Want?

During a recent Radiolab podcast, Kevin Kelly, author of the book What Technology Wants, used the analogy of how a flower leans toward sunlight because it “wants” the sunlight, to describe what the interweaving web of evolving technical innovations (what he refers to as the super-organism of technology) is leaning toward—in other words, what technology wants.

The other Radiolab guest was Steven Johnson, author of the book Where Good Ideas Come From, who somewhat dispelled the traditional notion of the eureka effect by explaining that the evolution of ideas, like all evolution, stumbles its way toward the next good idea, which inevitably leads to a significant breakthrough, such as what happens with innovations in technology.

Listening to this thought-provoking podcast made me ponder the question: What does data quality technology want?

In a previous post, I used the term OOBE-DQ to refer to the out-of-box-experience (OOBE) provided by data quality (DQ) tools, which usually becomes a debate between “ease of use” and “powerful functionality” after you ignore the Magic Beans sales pitch that guarantees you the data quality tool is both remarkably easy to use and incredibly powerful.

The data quality market continues to evolve away from esoteric technical tools, stumbling its way toward the next good idea: business-empowering suites that provide robust functionality with increasingly role-based user interfaces tailored to the specific needs of different users.  Of course, many vendors would love to claim sole responsibility for what they would call significant innovations in data quality technology, instead of what are simply by-products of an evolving market.

The deployment of data quality functionality within and across organizations also continues to evolve, as data cleansing activities are being complemented by real-time defect prevention services used to greatly minimize poor data quality at the multiple points of origin within the enterprise data ecosystem.

However, viewpoints about the role of data quality technology generally remain split between two opposing perspectives:

  1. Technology enables a data quality process, but doesn’t obviate the need for people (e.g., data stewards) to remain actively involved and be held accountable for maintaining the quality of data.
  2. Technology automates a data quality process, and a well-designed and properly implemented technical solution obviates the need for people to be actively involved after its implementation.

Do you think that continuing advancements and innovations in data quality technology will obviate the need for people to be actively involved in data quality processes?  In the future, will we have high quality data because our technology essentially wants it and therefore leans our organizations toward high quality data?  Let’s conduct another unscientific data quality poll:

 

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

 

Related Posts

DQ-Tip: “Data quality tools do not solve data quality problems...”

Which came first, the Data Quality Tool or the Business Need?

Data Quality Industry: Problem Solvers or Enablers?

Data Quality Magic

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Pirates of the Computer: The Curse of the Poor Data Quality

The Good Data

Photo via Flickr (Creative Commons License) by: Philip Fibiger

When I was growing up, my family had a cabinet filled with “the good dishes” that were reserved for use on special occasions, i.e., the plates, bowls, and cups that would only be used for holiday dinners like Thanksgiving or Christmas.  The rest of the year, we used “the everyday dishes” that were a random collection of various sets of dishes collected over the years. 

Meals using the everyday dishes would seldom have matching plates, bowls, and cups, and if these dishes had a pattern on them once, it was mostly, if not completely, worn down by repeated use and constant washing.  Whenever we actually got to use the good dishes, it made the meal seem more special, more fancy, perhaps it even made the food seem like it tasted a little bit better.

Some organizations have a database filled with “the good data” that are reserved for special occasions.  In other words, the data prepared for specific business uses such as regulatory compliance and reporting.  Meanwhile, the rest of the time, and perhaps in support of daily operations, the organization uses “the everyday data” that is often a random collection of various data sets.

Business activities using the everyday data would seldom use a single source, but instead mash-up data from several sources, perhaps even storing the results in a spreadsheet or a private database—otherwise known by the more nefarious term: data silo.

Most of the time, when organizations discuss their enterprise data management strategy, they focus on building and maintaining the good data.  However, unlike with the good dishes, the organization tries to force everyone to use the good data even for everyday business activities, essentially forcing the organization to throw away the everyday data—to eliminate all those data silos.

But there is a time and a place for both the good dishes and the everyday dishes, as well as paper plates and plastic cups.  And yes, even eating with your hands has a time and a place, too.

The same is true for data.  Yes, you should build and maintain the good data to be used to support as many business activities as possible.  And yes, you should minimize the special occasions where customized data and/or data silos are truly necessary.

But you should also accept that, since there is so much data available to the enterprise, and so many business uses for it, forcing everyone to use only the good data might be preventing your organization from maximizing the full potential of its data.

 

Related Posts

To Our Data Perfectionists

DQ-View: From Data to Decision

The Data-Decision Symphony

Is your data complete and accurate, but useless to your business?

You Can’t Always Get the Data You Want

Data Governance and the Social Enterprise

In his blog post Socializing Software, Michael Fauscette explained that in order “to create a next generation enterprise, businesses need to take two concepts from the social web and apply them across all business functions: community and content.”

“Traditional enterprise software,” according to Fauscette, “was built on the concept of managing through rigid business processes and controlled workflow.  With process at the center of the design, people-based collaboration was not possible.”

Peter Sondergaard, the global head of research at Gartner, explained at a recent conference that “the rigid business processes which dominate enterprise organizational architectures today are well suited for routine, predictable business activities.  But they are poorly suited to support people whose jobs require discovery, interpretation, negotiation and complex decision-making.”

“Social computing,” according to Sondergaard, “not Facebook, or Twitter, or LinkedIn, but the technologies and principles behind them will be implemented across and between all organizations, and it will unleash yet to be realized productivity growth.”

Since the importance of collaboration is one of my favorite topics, I like Fauscette’s emphasis on people-based collaboration and Sondergaard’s emphasis on the limitations of process-based collaboration.  The key to success for most, if not all, organizational initiatives is the willingness of people all across the enterprise to embrace collaboration.

Successful organizations view collaboration not just as a guiding principle, but as a call to action in their daily business practices.

As Sondergaard points out, the technologies and principles behind social computing are the key to enabling what many analysts have begun referring to as the social enterprise.  Collaboration is the key to business success.  This essential collaboration has to be based on people, and not on rigid business processes, since business activities and business priorities are constantly changing.

 

Data Governance and the Social Enterprise

Often the root cause of poor data quality can be traced to a lack of a shared understanding of the roles and responsibilities involved in how the organization is using its data to support its business activities.  The primary focus of data governance is the strategic alignment of people throughout the organization through the definition, implementation, and enforcement of the policies that govern the interactions between people, business processes, data, and technology.

A data quality program within a data governance framework is a cross-functional, enterprise-wide initiative requiring people to be accountable for its data, business process, and technology aspects.  However, policy enforcement and accountability are often confused with traditional notions of command and control, which is the antithesis of the social enterprise that instead requires an emphasis on communication, cooperation, and people-based collaboration.

Data governance policies for data quality illustrate the intersection of business, data, and technical knowledge, which is spread throughout the enterprise, transcending any artificial boundaries imposed by an organizational chart or rigid business processes, where different departments or different business functions appear as if they were independent of the rest of the organization.

Data governance reveals how interconnected and interdependent the organization is, and why people-driven social enterprises are more likely to survive and thrive in today’s highly competitive and rapidly evolving marketplace.

Social enterprises rely on the strength of their people asset to successfully manage their data, which is a strategic corporate asset because high quality data serves as a solid foundation for an organization’s success, empowering people, enabled by technology, to optimize business processes for superior business performance.

 

Related Posts

Podcast: Data Governance is Mission Possible

Trust is not a checklist

The Business versus IT—Tear down this wall!

The Road of Collaboration

Shared Responsibility

Enterprise Ubuntu

Data Transcendentalism

Social Karma

The Data Outhouse

Last week’s unscientific data quality poll noted that in many organizations a data warehouse is the only system where data from numerous and disparate operational sources has been integrated into a single system of record containing fully integrated and historical data.  Although the rallying cry and promise of the data warehouse has long been that it will serve as the source for most of the enterprise’s reporting and decision support needs, many data warehouses simply get ignored by the organization, which continues to rely on its data silos and spreadsheets for reporting and decision making.

Based on my personal experience, the most common reason is that these big boxes of data are often built with little focus on the quality of the data being delivered.  However, since that’s just my opinion, I launched the poll and invited your comments.

 

Commendable Comments

Stephen Putman commented that data warehousing “projects are usually so large that if you approach them in a big-bang, OLTP management fashion, the foundational requirements of the thing change between inception and delivery.”

“I’ve seen very few data warehouses live up to the dream,” Dylan Jones commented.  “I’ve always found that silos still persisted after a warehouse introduction because the turnaround on adding new dimensions and reports to the warehouse/mart meant that the business users simply had no option.  I think data quality obviously plays a part.  The business side only need to be burnt once or twice before they lose faith.  That said, a data warehouse is one of the best enablers of data quality motivation, so without them a lot of projects simply wouldn’t get off the ground.”

“I just voted Outhouse too,” commented Paul Drenth, “because I agree with Dylan that the business side keeps using other systems out of disappointment in the trustworthiness of the data warehouse.  I agree that bad data quality plays a role in that, but more often it’s also a lack of discipline in the organization which causes a downward spiral of missing information, and thus deciding to keep other information in a separate or local system.  So I think usability of data warehouse systems still needs to be improved significantly, also by adding invisible or automatic data quality assurance, the business might gain more trust.”

“Great point Paul, useful addition,” Dylan responded.  “I think discipline is a really important aspect, this ties in with change management.  A lot of business people simply don’t see the sense of urgency for moving their reports to a warehouse so lack the discipline to follow the procedures.  Or we make the procedures too inflexible.  On one site I noticed that whenever the business wanted to add a new dimension or category it would take a 2-3 week turnaround to sign off.  For a financial services company this was a killer because they had simply been used to dragging another column into their Excel spreadsheets, instantly getting the data they needed.  If we’re getting into information quality for a second, then the dimension of presentation quality and accessibility become far more important than things like accuracy and completeness.  Sure a warehouse may be able to show you data going back 15 years and cross validates results with surrogate sources to confirm accuracy, but if the business can’t get it in a format they need, then it’s all irrelevant.”

“I voted Data Warehouse,” commented Jarrett Goldfedder, “but this is marked with an asterisk.  I would say that 99% of the time, a data warehouse becomes an outhouse, crammed with data that serves no purpose.  I think terminology is important here, though.  In my previous organization, we called the Data Warehouse the graveyard and the people who did the analytics were the morticians.  And actually, that’s not too much of a stretch considering our job was to do CSI-type investigations and autopsies on records that didn’t fit with the upstream information.  This did not happen often, but when it did, we were quite grateful for having historical records maintained.  IMHO, if the records can trace back to the existing data and will save the organization money in the long-run, then the warehouse has served its purpose.”

“I’m having a difficult time deciding,” Corinna Martinez commented, “since most of the ones I have seen are high quality data, but not enough of it and therefore are considered Data Outhouses.  You may want to include some variation in your survey that covers good data but not enough; and bad data but lots to shift through in order to find something.”

“I too have voted Outhouse,” Simon Daniels commented, “and have also seen beautifully designed, PhD-worthy data warehouse implementations that are fundamentally of no practical use.  Part of the reason for this I think, particularly from a marketing point-of-view, which is my angle, is that how the data will be used is not sufficiently thought through.  In seeking to create marketing selections, segmentation and analytics, how will the insight locked-up in the warehouse be accessed within the context of campaign execution and subsequent response analysis?  Often sitting in splendid isolation, the data warehouse doesn’t offer the accessibility needed in day-to-day activities.”

Thanks to everyone who voted and special thanks to everyone who commented.  As always, your feedback is greatly appreciated.

 

Can MDM and Data Governance save the Data Warehouse?

During last week’s Informatica MDM Tweet Jam, Dan Power explained that master data management (MDM) can deliver to the business “a golden copy of the data that they can trust” and I remarked how companies expected that from their data warehouse.

“Most companies had unrealistic expectations from data warehouses,” Power responded, “which ended up being expensive, read-only, and updated infrequently.  MDM gives them the capability to modify the data, publish to a data warehouse, and manage complex hierarchies.  I think MDM offers more flexibility than the typical data warehouse.  That’s why business intelligence (BI) on top of MDM (or more likely, BI on top of a data warehouse that draws data from MDM) is so popular.”

As a follow-up question, I asked if MDM should be viewed as a complement or a replacement for the data warehouse.  “Definitely a complement,” Power responded. “MDM fills a void in the middle between transactional systems and the data warehouse, and does things that neither can do to data.”

In his recent blog post How to Keep the Enterprise Data Warehouse Relevant, Winston Chen explains that the data quality deficiencies of most data warehouses could be aided by MDM and data governance, which “can define and enforce data policies for quality across the data landscape.”  Chen believes that the data warehouse “is in a great position to be the poster child for data governance, and in doing so, it can keep its status as the center of gravity for all things data in an enterprise.”

I agree with Power that MDM can complement the data warehouse, and I agree with Chen that data governance can make the data warehouse (as well as many other things) better.  So perhaps MDM and data governance can save the data warehouse.

However, I must admit that I remain somewhat skeptical.  The same challenges that have caused most data warehouses to become data outhouses are also fundamental threats to the success of MDM and data governance.

 

Thinking outside the house

Just like real outhouses were eventually obsolesced by indoor plumbing, I wonder if data outhouses will eventually be obsolesced, perhaps ironically by emerging trends of outdoor plumbing, i.e., open source, cloud computing, and software as a service (SaaS).

Many industry analysts are also advocating the evolution of data as a service (DaaS), where data is taken out of all of its houses, meaning that the answer to my poll question might be neither data warehouse nor data outhouse.

Although none of these trends obviate the need for data quality nor alleviate the other significant challenges mentioned above, perhaps when it comes to data, we need to start thinking outside the house.

 

Related Posts

DQ-Poll: Data Warehouse or Data Outhouse?

Podcast: Data Governance is Mission Possible

Once Upon a Time in the Data

The Idea of Order in Data

Fantasy League Data Quality

Which came first, the Data Quality Tool or the Business Need?

Finding Data Quality

The Circle of Quality

Data Quality Industry: Problem Solvers or Enablers?

This morning I had the following Twitter conversation with Andy Bitterer of Gartner Research and ANALYSTerical, sparked by my previous post about Data Quality Magic, the one and only source of which I posited comes from the people involved:

 

What Say You?

Although Andy and I were just joking around, there is some truth beneath these tweets.  After all, according to Gartner research, “the market for data quality tools was worth approximately $727 million in software-related revenue as of the end of 2009, and is forecast to experience a compound annual growth rate (CAGR) of 12% during the next five years.” 

So I thought I would open this up to a good-natured debate. 

Do you think the data quality industry (software vendors, consultants, analysts, and conferences) is working harder to solve the problem of poor data quality or perpetuate the profitability of its continued existence?

All perspectives on this debate are welcome without bias.  Therefore, please post a comment below.

(Please Note: Comments advertising your products and services (or bashing your competitors) will NOT be approved.)

 

Related Posts

Which came first, the Data Quality Tool or the Business Need?

Do you believe in Magic (Quadrants)?

Can Enterprise-Class Solutions Ever Deliver ROI?

Promoting Poor Data Quality

The Once and Future Data Quality Expert

Imagining the Future of Data Quality

Data Quality is not an Act, it is a Habit

The Second Law of Data Quality states that it is not a one-time project, but a sustained program.  Or to paraphrase Aristotle:

“Data Quality is not an Act, it is a Habit.”

Habits are learned behaviors, which can become automatic after enough repetition.  Habits can also be either good or bad.

Sometimes we can become so focused on developing new good habits that we forget about our current good habits.  Other times we can become so focused on eliminating all of our bad habits that we lose ourselves in the quest for perfection.

This is why Aristotle was also an advocate of the Golden Mean, which is usually simplified into the sage advice:

“Moderation in all things.”

While helping our organization develop good habits for ensuring high quality data, we often use the term Best Practice.

Although data quality is a practice, it’s one we get better at as long as we continue practicing.  Quite often I have observed the bad habit of establishing, but never revisiting, best practices.

However, as our organization, and the business uses for our data, continues to evolve, so must our data quality practice.

Therefore, data quality is not an act, but it’s also not a best practice.  It’s a habit of continuous practice, continuous improvement, continuous learning, and continuous adaptation to continuous change—which is truly the best possible habit we can develop.

Data Quality is a Best Habit.

To Our Data Perfectionists

Had our organization but money enough, and time,
This demand for Data Perfection would be no crime.

We would sit down and think deep thoughts about all the wonderful ways,
To best model our data and processes, as slowly passes our endless days.
Freed from the Herculean Labors of Data Cleansing, we would sing the rhyme:
“The data will always be entered right, the first time, every time.”

We being exclusively Defect Prevention inclined,
Would only rubies within our perfected data find.
Executive Management would patiently wait for data that’s accurate and complete,
Since with infinite wealth and time, they would never fear the balance sheet.

Our vegetable enterprise data architecture would grow,
Vaster than empires, and more slow.

One hundred years would be spent lavishing deserved praise,
On our brilliant data model, upon which, with wonder, all would gaze.
Two hundred years to adore each and every defect prevention test,
But thirty thousand years to praise Juran, Deming, English, Kaizen, Six Sigma, and all the rest.
An age at least to praise every part of our flawless data quality methodology,
And the last age we would use to write our self-aggrandizing autobiography.

For our Corporate Data Asset deserves this Perfect State,
And we would never dare to love our data at any lower rate.

But at my back I always hear,
Time’s winged chariot hurrying near.

And if we do not address the immediate business needs,
Ignored by us while we were lost down in the data weeds.
Our beautiful enterprise data architecture shall no more be found,
After our Data Perfectionists’ long delay has run our company into the ground.

Because building a better tomorrow at the expense of ignoring today,
Has even with our very best of intentions, caused us to lose our way.
And all our quaint best practices will have turned to dust,
As burnt into ashes will be all of our business users’ trust.

Now, it is true that Zero Defects is a fine and noble goal,
For Manufacturing Quality—YES, but for Data Quality—NO.

We must aspire to a more practical approach, providing a critical business problem solving service,
Improving data quality, not for the sake of our data, but for the fitness of its business purpose.
Instead of focusing on only the bad we have done, forcing us to wear The Scarlet DQ Letter,
Let us focus on the good we are already doing, so from it we can learn how to do even better.

And especially now, while our enterprise-wide collaboration conspires,
To help us grow our Data Governance Maturity beyond just fighting fires.
Therefore, let us implement Defect Prevention wherever and whenever we can,
But also accept that Data Cleansing will always be an essential part of our plan.

Before our organization’s limited money and time are devoured,
Let us make sure that our critical business decisions are empowered.

Let us also realize that since change is the only universal constant,
Real best practices are not cast in stone, but written on parchment.
Because the business uses for our data, as well as our business itself, continues to evolve,
Our data strategy must be adaptation, allowing our dynamic business problems to be solved.

Thus, although it is true that we can never achieve Data Perfection,
We can deliver Business Insight, which always is our true direction.

___________________________________________________________________________________________________________________

This blog post was inspired by the poem To His Coy Mistress by Andrew Marvell.

 

Related Posts

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

Data and its Relationships with Quality

Data Quality and Miracle Exceptions

Data Quality and Chicken Little Syndrome

Data Quality: Quo Vadimus?

Data Myopia and Business Relativity

Plato’s Data

What going to the dentist taught me about data quality

How Data Cleansing Saves Lives

A Tale of Two Q’s

Data Quality and The Middle Way

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“There is no such thing as data accuracy — There are only assertions of data accuracy.”

This DQ-Tip came from the Data Quality Pro webinar ISO 8000 Master Data Quality featuring Peter Benson of ECCMA.

You can download (.pdf file) quotes from this webinar by clicking on this link: Data Quality Pro Webinar Quotes - Peter Benson

ISO 8000 is the international standard for data quality.  You can get more information by clicking on this link: ISO 8000

 

Data Accuracy

Thanks to substantial assistance from my readers, accuracy was defined in a previous post as the correctness of a data value within a limited context, such as verification by an authoritative reference (i.e., validity), combined with the correctness of a valid data value within an extensive context including other data as well as business processes (i.e., accuracy).

“The definition of data quality,” according to Peter and the ISO 8000 standards, “is the ability of the data to meet requirements.”

Although accuracy is only one of many dimensions of data quality, whenever we refer to data as accurate, we are referring to the ability of the data to meet specific requirements, and quite often it’s the ability to support making a critical business decision.

I agree with Peter and the ISO 8000 standards because we can’t simply take an accuracy metric on a data quality dashboard (or however else the assertion is presented to us) at face value without understanding how the metric is both defined and measured.

However, even when well defined and properly measured, data accuracy is still only an assertion.  Oftentimes, the only way to verify the assertion is by putting the data to its intended use.

If by using it you discover that the data is inaccurate, then by having established what the assertion of accuracy was based on, you have a head start on performing root cause analysis, enabling faster resolution of the issues—not only with the data, but also with the business and technical processes used to define and measure data accuracy.
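One lightweight way to act on this is to never record an accuracy metric without also recording the basis of the assertion.  The following sketch is only an illustration of that idea—the class and field names are hypothetical, not part of ISO 8000 or any tool.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AccuracyAssertion:
    """An accuracy metric together with the basis of the assertion, so the
    metric can be questioned, verified, and traced during root cause analysis."""
    metric: float      # e.g., 0.95 for 95% accurate
    population: str    # which data the metric describes
    reference: str     # the authoritative source used for verification
    rule: str          # how "accurate" was defined and measured
    measured_on: date  # when the measurement was taken

assertion = AccuracyAssertion(
    metric=0.95,
    population="active customer postal addresses",
    reference="national postal address file",
    rule="exact match on street, city, and postal code after standardization",
    measured_on=date(2011, 1, 15),
)
print(f"{assertion.metric:.0%} accurate, per {assertion.reference}")
```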

 

Related Posts

Worthy Data Quality Whitepapers (Part 1)

Why isn’t our data quality worse?

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Data Quality and the Cupertino Effect

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “There is no point in monitoring data quality...”

DQ-Tip: “Don't pass bad data on to the next person...”

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is about more than just improving your data...” 

DQ-Tip: “Start where you are...”

Why isn’t our data quality worse?

In psychology, the term negativity bias is used to explain how bad evokes a stronger reaction than good in the human mind.  Don’t believe that theory?  Compare receiving an insult with receiving a compliment—which one do you remember more often?

Now, this doesn’t mean the dark side of the Force is stronger, it simply means that we all have a natural tendency to focus more on the negative aspects, rather than on the positive aspects, of most situations, including data quality.

In the aftermath of poor data quality negatively impacting decision-critical enterprise information, the natural tendency is for a data quality initiative to begin by focusing on the now painfully obvious need for improvement, essentially asking the question:

Why isn’t our data quality better?

Although this type of question is a common reaction to failure, it is also indicative of the problem-seeking mindset caused by our negativity bias.  However, Chip and Dan Heath, authors of the great book Switch, explain that even in failure, there are flashes of success, and following these “bright spots” can illuminate a road map for action, encouraging a solution-seeking mindset.

“To pursue bright spots is to ask the question:

What’s working, and how can we do more of it?

Sounds simple, doesn’t it? 

Yet, in the real-world, this obvious question is almost never asked.

Instead, the question we ask is more problem focused:

What’s broken, and how do we fix it?”

 

Why isn’t our data quality worse?

For example, let’s pretend that a data quality assessment is performed on a data source used to make critical business decisions.  With the help of business analysts and subject matter experts, it’s verified that this critical source has an 80% data accuracy rate.

The common approach is to ask the following questions (using a problem-seeking mindset):

  • Why isn’t our data quality better?
  • What is the root cause of the 20% inaccurate data?
  • What process (business or technical, or both) is broken, and how do we fix it?
  • What people are responsible, and how do we correct their bad behavior?

But why don’t we ask the following questions (using a solution-seeking mindset):

  • Why isn’t our data quality worse?
  • What is the root cause of the 80% accurate data?
  • What process (business or technical, or both) is working, and how do we re-use it?
  • What people are responsible, and how do we encourage their good behavior?

I am not suggesting that we abandon the first set of questions, especially since there are times when a problem-seeking mindset might be a better approach (after all, it does also incorporate a solution-seeking mindset—albeit after a problem is identified).

I am simply wondering why we so often don’t even consider asking the second set of questions.

Most data quality initiatives focus on developing new solutions—and not re-using existing solutions.

Most data quality initiatives focus on creating new best practices—and not leveraging existing best practices.

Perhaps you can be the chosen one who will bring balance to the data quality initiative by asking both questions:

Why isn’t our data quality better?  Why isn’t our data quality worse?

Video: Oh, the Data You’ll Show!

In May, I wrote a Dr. Seuss style blog post called Oh, the Data You’ll Show! inspired by the great book Oh, the Places You'll Go!

In the following video, I have recorded my narration of the presentation format of my original blog post.  Enjoy!

 

Oh, the Data You’ll Show!

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: Oh, the Data You’ll Show!

And you can download the presentation (PDF file) used in the video by clicking on this link: Oh, the Data You’ll Show! (Slides)

And you can listen to and/or download the podcast (MP3 file) by clicking on this link: Oh, the Data You’ll Show! (Podcast)

“Some is not a number and soon is not a time”

In a true story that I recently read in the book Switch: How to Change Things When Change Is Hard by Chip and Dan Heath, back in 2004, Donald Berwick, a doctor and the CEO of the Institute for Healthcare Improvement, had some ideas about how to reduce the defect rate in healthcare, which, unlike the vast majority of data defects, was resulting in unnecessary patient deaths.

One common defect was deaths caused by medication mistakes, such as post-surgical patients failing to receive their antibiotics in the specified time, and another common defect was mismanaging patients on ventilators, resulting in death from pneumonia.

Although Berwick initially laid out a great plan for taking action, which proposed very specific process improvements, and was supported by essentially indisputable research, few changes were actually being implemented.  After all, his small, not-for-profit organization had only 75 employees, and had no ability whatsoever to force any changes on the healthcare industry.

So, what did Berwick do?  On December 14, 2004, in a speech that he delivered to a room full of hospital administrators at a major healthcare industry conference, he declared:

“Here is what I think we should do.  I think we should save 100,000 lives.

And I think we should do that by June 14, 2006—18 months from today.

Some is not a number and soon is not a time.

Here’s the number: 100,000.

Here’s the time: June 14, 2006—9 a.m.”

The crowd was astonished.  The goal was daunting.  Of course, all the hospital administrators agreed with the goal to save lives, but for a hospital to reduce its defect rate, it has to first acknowledge having a defect rate.  In other words, it has to admit that some patients are dying needless deaths.  And, of course, the hospital lawyers are not keen to put this admission on the record.

 

Data Denial

Whenever an organization’s data quality problems are discussed, it is very common to encounter data denial.  Most often, this is a natural self-defense mechanism for the people responsible for business processes, technology, and data—and understandable because of the simple fact that nobody likes to be blamed (or feel blamed) for causing or failing to fix the data quality problems.

But data denial can also doom a data quality improvement initiative from the very beginning.  Of course, everyone will agree that ensuring high quality data is being used to make critical daily business decisions is vitally important to corporate success, but for an organization to reduce its data defects, it has to first acknowledge having data defects.

In other words, the organization has to admit that some business decisions are mistakes that are being made based on poor quality data.

 

Half Measures

In his excellent recent blog post Half Measures, Phil Simon discussed the compromises often made during data quality initiatives, half measures such as “cleaning up some of the data, postponing parts of the data cleanup efforts, and taking a wait and see approach as more issues are unearthed.”

Although, as Phil explained, it is understandable that different individuals and factions within large organizations will have vested interests in taking action, just as others are biased towards maintaining the status quo, “don’t wait for the perfect time to cleanse your data—there isn’t any.  Find a good time and do what you can.”

 

Remarkable Data Quality

As Seth Godin explained in his remarkable book Purple Cow: Transform Your Business by Being Remarkable, the opposite of remarkable is not bad or mediocre or poorly done.  The opposite of remarkable is very good.

In other words, you must first accept that your organization has data defects, but most important, since some is not a number and soon is not a time, you must set specific data quality goals and specific times when you will meet (or exceed) your goals.

So, what happened with Berwick’s goal?  Eighteen months later, at the exact moment he’d promised to return—June 14, 2006, at 9 a.m.—Berwick took the stage again at the same major healthcare industry conference, and announced the results:

“Hospitals enrolled in the 100,000 Lives Campaign have collectively prevented an estimated 122,300 avoidable deaths and, as importantly, have begun to institutionalize new standards of care that will continue to save lives and improve health outcomes into the future.”

Although improving your organization’s data quality—unlike reducing defect rates in healthcare—isn’t a matter of life and death, remarkable data quality is becoming a matter of corporate survival in today’s highly competitive and rapidly evolving world.

Perfect data quality is impossible—but remarkable data quality is not.  Be remarkable.

The Real Data Value is Business Insight

Understanding your data usage is essential to improving its quality, and therefore, you must perform data analysis on a regular basis.

A data profiling tool can help you by automating some of the grunt work needed to begin your data analysis, such as generating levels of statistical summaries supported by drill-down details, including data value frequency distributions.

However, a common mistake is to hyper-focus on the data values.

Narrowing your focus to the values of individual fields is a mistake when it causes you to lose sight of the wider context of the data, which can cause other errors like mistaking validity for accuracy.

Understanding data usage is about analyzing its most important context—how your data is being used to make business decisions.

 

“Begin with the decision in mind”

In his excellent recent blog post It’s time to industrialize analytics, James Taylor wrote that “organizations need to be much more focused on directing analysts towards business problems.”  Although Taylor was writing about how, in advanced analytics (e.g., data mining, predictive analytics), “there is a tendency to let analysts explore the data, see what can be discovered,” I think this tendency is applicable to all data analysis, including less advanced analytics like data profiling and data quality assessments.

Please don’t misunderstand—Taylor and I are not saying that there is no value in data exploration, because, without question, it can definitely lead to meaningful discoveries.  And I continue to advocate that the goal of data profiling is not to find answers, but instead, to discover the right questions.

However, as Taylor explained, it is because “the only results that matter are business results” that data analysis should always “begin with the decision in mind.  Find the decisions that are going to make a difference to business results—to the metrics that drive the organization.  Then ask the analysts to look into those decisions and see what they might be able to predict that would help make better decisions.”

Once again, although Taylor is discussing predictive analytics, this cogent advice should guide all of your data analysis.

 

The Real Data Value is Business Insight


Returning to data quality assessments, which create and monitor metrics based on summary statistics provided by data profiling tools (like the ones shown in the mockup), elevating what are low-level technical metrics up to the level of business relevance will often establish their correlation with business performance, but will not establish metrics that drive—or should drive—the organization.

Although built from the bottom-up by using, for the most part, the data value frequency distributions, these metrics lose sight of the top-down fact that business insight is where the real data value lies.

However, data quality metrics such as completeness, validity, accuracy, and uniqueness, which are just a few common examples, should definitely be created and monitored—unfortunately, a single straightforward metric called Business Insight doesn’t exist.
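As a minimal illustration of how a couple of those metrics—completeness and uniqueness—might be computed bottom-up from the data values themselves (the sample values are invented, and real assessments involve far more nuance):

```python
from collections import Counter

def completeness(values):
    """Share of records where the field is populated."""
    return sum(1 for v in values if (v or "").strip()) / len(values)

def uniqueness(values):
    """Share of populated values that occur exactly once."""
    populated = [(v or "").strip() for v in values if (v or "").strip()]
    counts = Counter(populated)
    return sum(1 for v in populated if counts[v] == 1) / len(populated)

emails = ["nemo@sea.org", "dory@sea.org", "", "nemo@sea.org", None]
print(f"completeness: {completeness(emails):.0%}")  # 60%
print(f"uniqueness:   {uniqueness(emails):.0%}")    # 33%
```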

But let’s pretend that my other mockup metrics were real—50% of the data is inaccurate and there is an 11% duplicate rate.

Oh, no!  The organization must be teetering on the edge of oblivion, right?  Well, 50% accuracy does sound really bad, basically like your data’s accuracy is no better than flipping a coin.  However, which data is inaccurate, and far more important, is the inaccurate data actually being used to make a business decision?

As for the duplicate rate, I am often surprised by the visceral reaction it can trigger, such as: “how can we possibly claim to truly understand who our most valuable customers are if we have an 11% duplicate rate?”

So, would reducing your duplicate rate to only 1% automatically result in better customer insight?  Or would it simply mean that the data matching criteria was too conservative (e.g., requiring an exact match on all “critical” data fields), preventing you from discovering how many duplicate customers you have?  (Or maybe the 11% indicates the matching criteria was too aggressive).
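To make that sensitivity concrete, here is a small, hypothetical sketch showing how the very same records produce different duplicate rates depending on whether the matching criteria require an exact match on every field or tolerate minor variations in punctuation, case, and spacing.  The records and normalization rules are invented for this example.

```python
def exact_key(rec):
    # Conservative criteria: exact match on every "critical" field.
    return (rec["name"], rec["phone"], rec["city"])

def fuzzy_key(rec):
    # More aggressive criteria: normalize case, punctuation, and spacing.
    digits = "".join(ch for ch in rec["phone"] if ch.isdigit())
    return (rec["name"].lower().replace(".", "").strip(), digits,
            rec["city"].lower().strip())

def duplicate_rate(records, key):
    """Share of records that duplicate an earlier record under the given criteria."""
    seen, dupes = set(), 0
    for rec in records:
        k = key(rec)
        dupes += k in seen
        seen.add(k)
    return dupes / len(records)

customers = [
    {"name": "P. Sherman", "phone": "555-0142",   "city": "Sydney"},
    {"name": "p sherman",  "phone": "(555) 0142", "city": "sydney "},
    {"name": "Dory Blue",  "phone": "555-0199",   "city": "Sydney"},
]
print(f"exact criteria: {duplicate_rate(customers, exact_key):.0%}")  # 0%
print(f"fuzzy criteria: {duplicate_rate(customers, fuzzy_key):.0%}")  # 33%
```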

My point is that accuracy and duplicate rates are just numbers—what determines if they are a good number or a bad number?

The fundamental question that every data quality metric you create must answer is: How does this provide business insight?

If a data quality (or any other data) metric cannot answer this question, then it is meaningless.  Meaningful metrics always represent business insight because they were created by beginning with the business decisions in mind.  Otherwise, your metrics could provide the comforting, but false, impression that all is well, or you could raise red flags that are really red herrings.

Instead of beginning data analysis with the business decisions in mind, many organizations begin with only the data in mind, which results in creating and monitoring data quality metrics that provide little, if any, business insight and decision support.

Although analyzing your data values is important, you must always remember that the real data value is business insight.

 

Related Posts

The First Law of Data Quality

Adventures in Data Profiling

Data Quality and the Cupertino Effect

Is your data complete and accurate, but useless to your business?

The Idea of Order in Data

You Can’t Always Get the Data You Want

Red Flag or Red Herring? 

DQ-Tip: “There is no point in monitoring data quality…”

Which came first, the Data Quality Tool or the Business Need?

Selling the Business Benefits of Data Quality

Which came first, the Data Quality Tool or the Business Need?

This recent tweet by Andy Bitterer of Gartner Research (and ANALYSTerical) sparked an interesting online discussion, which was vaguely reminiscent of the classic causality dilemma that is commonly stated as “which came first, the chicken or the egg?”

 

An E-mail from the Edge

On the same day I saw Andy’s tweet, I received an e-mail from a friend and fellow data quality consultant, who had just finished a master data management (MDM) and enterprise data warehouse (EDW) project, which had over 20 customer data sources.

Although he was brought onto the project specifically for data cleansing, he was told from the day of his arrival that because of time constraints, they decided against performing any data cleansing with their recently purchased data quality tool.  Instead, they decided to use their data integration tool to simply perform the massive initial load into their new MDM hub and EDW.

But wait—the story gets even better.  The very first decision this client made was to purchase a consolidated enterprise application development platform with seamlessly integrated components for data quality, data integration, and master data management.

So long before this client had determined their business need, they decided that they needed to build a new MDM hub and EDW, made a huge investment in an entire platform of technology, then decided to use only the basic data integration functionality. 

However, this client was planning to use the real-time data quality and MDM services provided by their very powerful enterprise application development platform to prevent duplicates and any other bad data from entering the system after the initial load. 

But, of course, no one on the project team was actually working on configuring any of those services, or even, for that matter, determining the business rules those services would enforce.  Maybe the salesperson told them it was as easy as flipping a switch?

My friend (especially after looking at the data) preached that data quality was a critical business need, but he couldn’t convince them, despite taking the initiative to present the results of some quick data profiling, standardization, and data matching used to identify duplicate records within and across their primary data sources, which clearly demonstrated the level of poor data quality.

Although this client agreed that they definitely had some serious data issues, they still decided against doing any data cleansing and wanted to just get the data loaded.  Maybe they thought they were loading the data into one of those self-healing databases?

The punchline—this client is a financial services institution with a business need to better identify their most valuable customers.

As my friend lamented at the end of his e-mail, why do clients often later ask why these types of projects fail?

 

Blind Vendor Allegiance

In his recent blog post Blind Vendor Allegiance Trumps Utility, Evan Levy examined this bizarrely common phenomenon of selecting a technology vendor without gathering requirements, reviewing product features, and then determining what tool(s) could best help build solutions for specific business problems—another example of the tool coming before the business need.

Evan was recounting his experiences at a major industry conference on MDM, where people were asking his advice on what MDM vendor to choose, despite admitting “we know we need MDM, but our company hasn’t really decided what MDM is.”

Furthermore, these prospective clients had decided to default their purchasing decision to the technology vendor they already do business with; in other words, “since we’re already a [you can just randomly insert the name of a large technology vendor here] shop, we just thought we’d buy their product—so what do you think of their product?”

“I find this type of question interesting and puzzling,” wrote Evan.  “Why would anyone blindly purchase a product because of the vendor, rather than focusing on needs, priorities, and cost metrics?  Unless a decision has absolutely no risk or cost, I’m not clear how identifying a vendor before identifying the requirements could possibly have a successful outcome.”

 

SaaS-y Data Quality on a Cloudy Business Day?

Emerging industry trends like open source, cloud computing, and software as a service (SaaS) are often touted as less expensive than traditional technology, and I have heard some use this angle to justify buying the tool before identifying the business need.

In his recent blog post Cloud Application versus On Premise, Myths and Realities, Michael Fauscette examined the return on investment (ROI) versus total cost of ownership (TCO) argument quite prevalent in the SaaS versus on premise software debate.

“Buying and implementing software to generate some necessary business value is a business decision, not a technology decision,” Michael concluded.  “The type of technology needed to meet the business requirements comes after defining the business needs.  Each delivery model has advantages and disadvantages financially, technically, and in the context of your business.”

 

So which came first, the Data Quality Tool or the Business Need?

This question is, of course, absurd because, in every rational theory, the business need should always come first.  However, in predictably irrational real-world practice, it remains a classic causality dilemma for data quality related enterprise information initiatives such as data integration, master data management, data warehousing, business intelligence, and data governance.

But sometimes the data quality tool was purchased for an earlier project, and despite what some vendor salespeople may tell you, you don’t always need to buy new technology at the beginning of every new enterprise information initiative. 

Whenever you already have the technology in-house before defining your business need (or have previously decided, often due to financial constraints, that you will need to build a bespoke solution), you still need to avoid technology bias.

Knowing how the technology works can sometimes cause a framing effect where your business need is defined in terms of the technology’s specific functionality, thereby framing the objective as a technical problem instead of a business problem.

Bottom line—your business problem should always be well-defined before any potential technology solution is evaluated.

 

Related Posts

There are no Magic Beans for Data Quality

Do you believe in Magic (Quadrants)?

Is your data complete and accurate, but useless to your business?

Can Enterprise-Class Solutions Ever Deliver ROI?

Selling the Business Benefits of Data Quality

The Circle of Quality

Selling the Business Benefits of Data Quality

In his book Purple Cow: Transform Your Business by Being Remarkable, Seth Godin used many interesting case studies of effective marketing.  One of them was the United States Postal Service.

“Very few organizations have as timid an audience as the United States Postal Service,” explained Godin.  “Dominated by conservative big customers, the Postal Service has a very hard time innovating.  The big direct marketers are successful because they’ve figured out how to thrive under the current system.  Most individuals are in no hurry to change their mailing habits, either.”

“The majority of new policy initiatives at the Postal Service are either ignored or met with nothing but disdain.  But ZIP+4 was a huge success.  Within a few years, the Postal Service diffused a new idea, causing a change in billions of address records in thousands of databases.  How?”

Doesn’t this daunting challenge sound familiar?  An initiative causing a change in billions of records across multiple databases? 

Sounds an awful lot like a massive data cleansing project, doesn’t it?  If you believe selling the business benefits of data quality, especially on such an epic scale, is easy to do, then stop reading right now—and please publish a blog post about how you did it.

 

Going Postal on the Business Benefits

Getting back to Godin’s case study, how did the United States Postal Service (USPS) sell the business benefits of ZIP+4?

“First, it was a game-changing innovation,” explains Godin.  “ZIP+4 makes it far easier for marketers to target neighborhoods, and much faster and easier to deliver the mail.  ZIP+4 offered both dramatically increased speed in delivery and a significantly lower cost for bulk mailers.  These benefits made it worth the time it took mailers to pay attention.  The cost of ignoring the innovation would be felt immediately on the bottom line.”

Selling the business benefits of data quality (or anything else for that matter) requires defining its return on investment (ROI), which always comes from tangible business impacts, such as mitigated risks, reduced costs, or increased revenue.

Reducing costs was a major selling point for ZIP+4.  Additionally, it mitigated some of the risks associated with direct marketing campaigns by making it possible to target neighborhoods more accurately and by reducing delays in postal delivery times.

However, perhaps the most significant selling point was that “the cost of ignoring the innovation would be felt immediately on the bottom line.”  In other words, the USPS articulated very well that the cost of doing nothing was very tangible.

The second reason ZIP+4 was a huge success, according to Godin, was that the USPS “wisely singled out a few early adopters.  These were individuals in organizations that were technically savvy and were extremely sensitive to both pricing and speed issues.  These early adopters were also in a position to sneeze the benefits to other, less astute, mailers.”

Sneezing the benefits is a reference to another Seth Godin book, Unleashing the Ideavirus, where he explains how the most effective business ideas are the ones that spread.  Godin uses the term ideavirus to describe an idea that spreads, and the term sneezers to describe the people who spread it.

In my blog post Sneezing Data Quality, I explained that it isn’t easy being sneezy, but true sneezers are the innovators and disruptive agents within an organization.  They can be the catalysts for crucial changes in corporate culture.

However, just like with literal sneezing, it can get really annoying if it occurs too frequently. 

To sell the business benefits, you need sneezers who will do such an exhilarating job championing the cause of data quality that they will help the very idea of a sustained data quality program go viral throughout your entire organization, thereby unleashing the Data Quality Ideavirus.

 

Getting Zippy with it

One of the most common objections to data quality initiatives, and especially data cleansing projects, is that they often produce considerable costs without delivering tangible business impacts and significant ROI.

One of the most common ways to attempt selling the business benefits of data quality is the ROI of removing duplicate records.  Although that ROI can sometimes be significant (when duplicate rates are high) in the sense of reduced costs from redundant postal deliveries, it doesn’t exactly convince your business stakeholders and financial decision makers of the importance of data quality.

Therefore, it is perhaps somewhat ironic that the USPS story of why ZIP+4 was such a huge success actually provides such a compelling case study for selling the business benefits of data quality.

However, we should all be inspired by “Zippy” (aka “Mr. ZIP,” the USPS ZIP Code mascot), and start “getting zippy with it” (not an official USPS slogan) when it comes to selling the business benefits of data quality:

  1. Define Data Quality ROI using tangible business impacts, such as mitigated risks, reduced costs, or increased revenue (see the rough sketch after this list)
  2. Articulate the cost of doing nothing (i.e., not investing in data quality) by also using tangible business impacts
  3. Select a good early adopter and recruit sneezers to Champion the Data Quality Cause by communicating your successes
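
As a rough worked sketch of the first two recommendations, here is one narrow example framed entirely in assumed numbers: the reduced cost of redundant postal deliveries caused by duplicate customer records, alongside the cost of doing nothing.

```python
# Minimal sketch (all figures are assumptions): framing Data Quality ROI in
# tangible business impacts, here the reduced cost of redundant postal
# deliveries caused by duplicate customer records.

records = 1_000_000                # customer records in the mailing database
duplicate_rate = 0.05              # assumed share of records that are duplicates
mailings_per_year = 12             # campaigns mailed to every record each year
cost_per_piece = 0.75              # assumed postage, printing, and handling cost
cleansing_project_cost = 250_000   # assumed one-time cost of the cleansing project

# Cost of doing nothing: redundant deliveries to duplicate records, every year
annual_waste = records * duplicate_rate * mailings_per_year * cost_per_piece

# Simple first-year ROI = (benefit - cost) / cost
roi = (annual_waste - cleansing_project_cost) / cleansing_project_cost

print(f"Annual cost of doing nothing: ${annual_waste:,.0f}")   # $450,000
print(f"First-year ROI of cleansing:  {roi:.0%}")              # 80%
```

In practice, you would build the case from several tangible business impacts (mitigated risks and increased revenue as well), not just reduced mailing costs.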

What other ideas can you think of for getting zippy with it when it comes to selling the business benefits of data quality?

 

Related Posts

Promoting Poor Data Quality

Sneezing Data Quality

The Only Thing Necessary for Poor Data Quality

Hyperactive Data Quality (Second Edition)

Data Quality: The Reality Show?