Data Governance and Information Quality 2011

Last week, I attended the Data Governance and Information Quality 2011 Conference, which was held June 27-30 in San Diego, California at the Catamaran Resort Hotel and Spa.

In this blog post, I summarize a few of the key points from some of the sessions I attended.  I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

 

Assessing Data Quality Maturity

In his pre-conference tutorial, David Loshin, author of the book The Practitioner’s Guide to Data Quality Improvement, described five stages comprising a continuous cycle of data quality improvement:

  1. Identify and measure how poor data quality impedes business objectives
  2. Define business-related data quality rules and performance targets
  3. Design data quality improvement processes that remediate business process flaws
  4. Implement data quality improvement methods
  5. Monitor data quality against targets
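
Here is a minimal sketch of how stages 2 and 5 might look in practice.  The field names, the completeness rule, and the target below are my own illustrative assumptions, not anything from the tutorial:

    # Purely illustrative sketch: define a business-facing data quality rule with
    # a performance target (stage 2), then measure and monitor against it (stage 5).
    # Field names, sample records, and the target are invented assumptions.

    records = [
        {"customer_id": "C001", "postal_code": "92109"},
        {"customer_id": "C002", "postal_code": ""},        # missing postal code
        {"customer_id": "C003", "postal_code": "64129"},
    ]

    def postal_code_completeness(rows):
        """Share of records with a populated postal code."""
        populated = sum(1 for row in rows if row["postal_code"].strip())
        return populated / len(rows)

    TARGET = 0.98  # illustrative performance target agreed upon with the business

    score = postal_code_completeness(records)
    status = "meets target" if score >= TARGET else "below target"
    print(f"Postal code completeness: {score:.1%} ({status}; target {TARGET:.0%})")

The point of the sketch is simply that the rule and the target are expressed in business terms (stage 2), and the same measurement can be re-run on a schedule to monitor against the target (stage 5).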

 

Getting Started with Data Governance

Oliver Claude from Informatica provided some tips for making data governance a reality:

  • Data Governance requires acknowledging People, Process, and Technology are interlinked
  • You need to embed your data governance policies into your operational business processes
  • Data Governance must be Business-Centric, Technology-Enabled, and Business/IT Aligned

 

Data Profiling: An Information Quality Fundamental

Danette McGilvray, author of the book Executing Data Quality Projects, shared some of her data quality insights:

  • Although the right technology is essential, data quality is more than just technology
  • Believing tools cause good data quality is like believing X-Ray machines cause good health
  • Data Profiling is like CSI — Investigating the Poor Data Quality Crime Scene

 

Building Data Governance and Instilling Data Quality

In the opening keynote address, Dan Hartley of ConAgra Foods shared his data governance and data quality experiences:

  • It is important to realize that data governance is a journey, not a destination
  • One of the commonly overlooked costs of data governance is the cost of inaction
  • Data governance must follow a business-aligned and business-value-driven approach
  • Data governance is as much about change management as it is anything else
  • Data governance controls must be carefully balanced so they don’t disrupt business processes
  • Common Data Governance Challenge: Balancing Data Quality and Speed (i.e., Business Agility)
  • Common Data Governance Challenge: Picking up Fumbles — Balls dropped between vertical organizational silos
  • Bad business processes cause poor data quality
  • Better Data Quality = A Better Bottom Line
  • One of the most important aspects of Data Governance and Data Quality — Wave the Flag of Success

 

Practical Data Governance

Winston Chen from Kalido discussed some aspects of delivering tangible value with data governance:

  • Data governance is the business process of defining, implementing, and enforcing data policies
  • Every business process can be improved by feeding it better data
  • Data Governance is the Horse, not the Cart, i.e., Data Governance drives MDM and Data Quality
  • Data Governance needs to balance Data Silos (Local Authority) and Data Cathedrals (Central Control)

 

The Future of Data Governance and Data Quality

The closing keynote panel, moderated by Danette McGilvray, included the following insights:

  • David Plotkin: “It is not about Data, Process, or Technology — It is about People”
  • John Talburt: “For every byte of Data, we need 1,000 bytes of Metadata to go along with it”
  • C. Lwanga Yonke: “One of the most essential skills is the ability to lead change”
  • John Talburt: “We need to be focused on business-value-based data governance and data quality”
  • C. Lwanga Yonke: “We must be multilingual: Speak Data/Information, Business, and Technology”

 

Organizing for Data Quality

In his post-conference tutorial, Tom Redman, author of the book Data Driven, described ten habits of those with the best data:

  1. Focus on the most important needs of the most important customers
  2. Apply relentless attention to process
  3. Manage all critical sources of data, including external suppliers
  4. Measure data quality at the source and in business terms
  5. Employ controls at all levels to halt simple errors and establish a basis for moving forward
  6. Develop a knack for continuous improvement
  7. Set and achieve aggressive targets for improvement
  8. Formalize management accountabilities for data
  9. Lead the effort using a broad, senior group
  10. Recognize that the hard data quality issues are soft and actively manage the needed cultural changes

 

Tweeps Out at the Ball Game

As I mentioned earlier, I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

But I wasn’t the only data governance and data quality tweep at the conference.  Steve Sarsfield, April Reeve, and Joe Dos Santos were also attending and tweeting.

However, on Tuesday night, we decided to take a timeout from tweeting, and instead became Tweeps out at the Ball Game by attending the San Diego Padres and Kansas City Royals baseball game at PETCO Park.

We sang Take Me Out to the Ball Game, bought some peanuts and Cracker Jack, and root, root, rooted for the home team, which apparently worked since Padres closer Heath Bell got one, two, three strikes, you’re out on Royals third baseman Wilson Betemit, and the San Diego Padres won the game by a final score of 4-2.

So just like at the Data Governance and Information Quality 2011 Conference, a good time was had by all.  See you next year!

 

Related Posts

Stuck in the Middle with Data Governance

DQ-BE: Invitation to Duplication

TDWI World Conference Orlando 2010

Light Bulb Moments at DataFlux IDEAS 2010

Enterprise Data World 2010

Enterprise Data World 2009

TDWI World Conference Chicago 2009

DataFlux IDEAS 2009

Master Data Management in Practice

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Master Data Management in Practice: Achieving True Customer MDM is a great new book by Dalton Cervo and Mark Allen, which demystifies the theories and industry buzz surrounding Master Data Management (MDM), and provides a practical guide for successfully implementing a Customer MDM program.

The book discusses the three major types of MDM (Analytical, Operational, and Enterprise), explaining exactly how MDM is related to, and supported by, data governance, data stewardship, and data quality.  Dalton and Mark explain how MDM does much more than just bring data together—it provides a set of processes, services, and policies that bring people together in a cross-functional and collaborative approach to enterprise data management.

Dalton Cervo has over 20 years of experience in software development, project management, and data management, including architectural design and implementation of analytical MDM, and management of a data quality program for an enterprise MDM implementation.  Dalton is a senior solutions consultant at DataFlux, helping organizations in the areas of data governance, data quality, data integration, and MDM.  Read Dalton’s blog, follow Dalton on Twitter, and connect with Dalton on LinkedIn.

Mark Allen has over 20 years of data management and project management experience including extensive planning and deployment experience with customer master data initiatives, data governance programs, and leading data quality management practices.  Mark is a senior consultant and enterprise data governance lead at WellPoint, Inc.  Prior to WellPoint, Mark was a senior program manager in customer operations groups at Sun Microsystems and Oracle, where Mark served as the lead data steward for the customer data domain throughout the planning and implementation of an enterprise customer data hub.

On this episode of OCDQ Radio, I am joined by the authors to discuss how to properly prepare for a new MDM program.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

The Dichotomy Paradox, Data Quality and Zero Defects

As Joseph Mazur explains in Zeno’s Paradox, the ancient Greek philosopher Zeno constructed a series of logical paradoxes to prove that motion is impossible, which today remain on the cutting edge of our investigations into the fabric of space and time.

One of the paradoxes is known as the Dichotomy:

“A moving object will never reach any given point, because however near it may be, it must always first accomplish a halfway stage, and then the halfway stage of what is left and so on, and this series has no end.  Therefore, the object can never reach the end of any given distance.”

Of course, this paradox sounds silly.  After all, a given point like the finish line of a race is reachable in real life, since people win races all the time.  However, in theory, the mathematics is maddeningly sound, since it creates an infinite series of steps between the starting point and the finish line—and an infinite number of steps creates a journey that can never end.
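
For readers who want the halving spelled out, the Dichotomy simply decomposes any finite distance d into an endless sequence of halfway stages:

\[ d = \frac{d}{2} + \frac{d}{4} + \frac{d}{8} + \frac{d}{16} + \cdots \]

a series with no last term, which is precisely where the paradox gets its bite.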

Furthermore, this theoretical race cannot even begin, since the paradox applies recursively to the first step as well: before we could complete the first step, we would first have to complete half of it, and then half of that, and so on.  Hence, the paradoxical conclusion is that travel over any finite distance can neither be completed nor begun, and so all motion must be an illusion.  Some of the greatest minds in history (from Galileo to Einstein to Stephen Hawking) have tackled the Dichotomy Paradox—but without being able to disprove it.

Data Quality and Zero Defects

The given point that many enterprise initiatives attempt to reach with data quality is 100% on a metric such as data accuracy.  Leaving aside (in this post) the fact that any data quality metric without a tangible business context provides no business value, 100% data quality (aka Zero Defects) is an unreachable destination—no matter how close you get or how long you try to reach it.

Zero Defects is a laudable goal—but its theory and practice come from manufacturing quality.  I have always been of the opinion, unpopular among some of my peers, that manufacturing quality and data quality are very different disciplines, and although there is much to be learned from studying the theories of manufacturing quality, I believe that brute-forcing those theories onto data quality is impractical and fundamentally flawed (and I’ve even said so in verse: To Our Data Perfectionists).

The given point that enterprise initiatives should actually be attempting to reach is data-driven solutions for business problems.

Advocates of Zero Defects argue that, in theory, defect-free data should be fit to serve as the basis for every possible business use, enabling a data-driven solution for any business problem.  However, in practice, business uses for data, as well as business itself, are always evolving.  Therefore, business problems are dynamic problems that do not have—nor do they require—perfect solutions.

Although the Dichotomy Paradox proves motion is theoretically impossible, our physical motion practically proves otherwise.  Has your data quality practice become motionless by trying to prove that Zero Defects is more than just theoretically possible?

The Data Quality Wager

Gordon Hamilton emailed me with an excellent recommended topic for a data quality blog post:

“It always seems crazy to me that few executives base their ‘corporate wagers’ on the statistical research touted by data quality authors such as Tom Redman, Jack Olson and Larry English that shows that 15-45% of the operating expense of virtually all organizations is WASTED due to data quality issues.

So, if every organization is leaving 15-45% on the table each year, why don’t they do something about it?  Philip Crosby says that quality is free, so why do the executives allow the waste to go on and on and on?  It seems that if the shareholders actually think about the Data Quality Wager they might wonder why their executives are wasting their shares’ value.  A large portion of that 15-45% could all go to the bottom line without a capital investment.

I’m maybe sounding a little vitriolic because I’ve been re-reading Deming’s Out of the Crisis and he has a low regard for North American industry because they won’t move beyond their short-term goals to build a quality organization, let alone implement Deming’s 14 principles or Larry English’s paraphrasing of them in a data quality context.”

The Data Quality Wager

Gordon Hamilton explained in his email that his reference to the Data Quality Wager was an allusion to Pascal’s Wager, but what follows is my rendering of it in a data quality context (i.e., if you don’t like what follows, please yell at me, not Gordon).

Although I agree with Gordon, I also acknowledge that convincing your organization to invest in data quality initiatives can be a hard sell.  A common mistake is not framing the investment in data quality initiatives using business language such as mitigated risks, reduced costs, or increased revenue.  I also acknowledge the reality of the fiscal calendar effect and how most initiatives increase short-term costs based on the long-term potential of eventually mitigating risks, reducing costs, or increasing revenue.

Short-term increased costs of a data quality initiative can include the purchase of data quality software and its maintenance fees, as well as the professional services needed for training and consulting for installation, configuration, application development, testing, and production implementation.  And there are often additional short-term increased costs, both external and internal.

Please note that I am talking about the costs of proactively investing in a data quality initiative before any data quality issues have manifested that would prompt reactively investing in a data cleansing project.  Although, either way, the short-term increased costs are the same, I am simply acknowledging the reality that it is always easier for a reactive project to get funding than it is for a proactive program to get funding—and this is obviously not only true for data quality initiatives.

Therefore, the organization has to evaluate the possible outcomes of proactively investing in data quality initiatives while also considering the possible existence of tangible, business-impacting data quality issues:

  1. Invest in data quality initiatives + Data quality issues exist = Decreased risks and (eventually) decreased costs

  2. Invest in data quality initiatives + Data quality issues do not exist = Only increased costs — No ROI

  3. Do not invest in data quality initiatives + Data quality issues exist = Increased risks and (eventually) increased costs

  4. Do not invest in data quality initiatives + Data quality issues do not exist = No increased costs and no increased risks

Data quality professionals, vendors, and industry analysts all strongly advocate #1 — and all strongly criticize #3.  (Additionally, since we believe data quality issues exist, most “orthodox” data quality folks generally refuse to even acknowledge #2 and #4.)

Unfortunately, when advocating #1, we often don’t effectively sell the business benefits of data quality, and when criticizing #3, we often focus too much on the negative aspects of not investing in data quality.

Only #4 “guarantees” neither increased costs nor increased risks by gambling on not investing in data quality initiatives based on the belief that data quality issues do not exist—and, by default, this is how many organizations make the Data Quality Wager.
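
To make the wager a little more concrete, here is a minimal, purely illustrative sketch.  Every probability and dollar figure below is an invented assumption, not a benchmark, but it shows how an organization could compare the expected cost of investing versus not investing:

    # Illustrative only: all probabilities and dollar figures are invented
    # assumptions used to show the structure of the wager, not real benchmarks.

    p_issues = 0.8            # assumed probability that business-impacting data quality issues exist
    invest_cost = 250_000     # assumed short-term cost of a proactive data quality initiative
    issue_cost = 1_000_000    # assumed eventual cost of unaddressed data quality issues
    mitigation = 0.7          # assumed share of that cost the initiative eventually avoids

    # Expected cost of each side of the wager
    expected_if_invest = invest_cost + p_issues * issue_cost * (1 - mitigation)
    expected_if_not = p_issues * issue_cost

    print(f"Expected cost if we invest:        ${expected_if_invest:,.0f}")
    print(f"Expected cost if we do not invest: ${expected_if_not:,.0f}")

Under these made-up numbers, investing wins; change the assumptions and the wager changes, which is exactly the conversation the organization should be having explicitly rather than by default.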

How is your organization making the Data Quality Wager?

DQ-Tip: “Undisputable fact about the value and use of data…”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“Undisputable fact about the value and use of data—any business process that is based on the assumption of having access to trustworthy, accurate, and timely data will produce invalid, unexpected, and meaningless results if this assumption is false.”

This DQ-Tip is from the excellent book Master Data Management and Data Governance by Alex Berson and Larry Dubov.

As data quality professionals, our strategy for quantifying and qualifying the business value of data is an essential tenet of how we make the pitch to get executive management to invest in enterprise data quality improvement initiatives.

However, all too often, the problem when we talk about data with executive management is exactly that—we talk about data.

Let’s instead follow the sage advice of Berson and Dubov.  Before discussing data quality, let’s research the data quality assumptions underlying core business processes.  This due diligence will allow us to frame data quality discussions within a business context by focusing on how the organization uses its data to support its business processes, which in turn lets us qualify and quantify the business value of high quality data as a strategic corporate asset.

 

Related Posts

DQ-Tip: “Data quality tools do not solve data quality problems...”

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “There is no point in monitoring data quality...”

DQ-Tip: “Don't pass bad data on to the next person...”

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is about more than just improving your data...”

DQ-Tip: “Start where you are...”

Thaler’s Apples and Data Quality Oranges

In the opening chapter of his book Carrots and Sticks, Ian Ayres recounts the story of Thaler’s Apples:

“The behavioral revolution in economics began in 1981 when Richard Thaler published a seven-page letter in a somewhat obscure economics journal, which posed a pretty simple choice about apples.

Which would you prefer:

(A) One apple in one year, or

(B) Two apples in one year plus one day?

This is a strange hypothetical—why would you have to wait a year to receive an apple?  But choosing is not very difficult; most people would choose to wait an extra day to double the size of their gift.

Thaler went on, however, to pose a second apple choice.

Which would you prefer:

(C) One apple today, or

(D) Two apples tomorrow?

What’s interesting is that many people give a different, seemingly inconsistent answer to this second question.  Many of the same people who are patient when asked to consider this choice a year in advance turn around and become impatient when the choice has immediate consequences—they prefer C over D.

What was revolutionary about his apple example is that it illustrated the plausibility of what behavioral economists call ‘time-inconsistent’ preferences.  Richard was centrally interested in the people who chose both B and C.  These people, who preferred two apples in the future but one apple today, flipped their preferences as the delivery date got closer.”

What does this have to do with data quality?  Give me a moment to finish eating my second apple, and then I will explain . . .

 

Data Quality Oranges

Let’s imagine that an orange represents a unit of measurement for data quality, somewhat analogous to data accuracy, such that the more data quality oranges you have, the better the quality of data is for your needs—let’s say for making a business decision.

Which would you prefer:

(A) One data quality orange in one month, or

(B) Two data quality oranges in one month plus one day?

(Please Note: Due to the strange uncertainties of fruit-based mathematics, two data quality oranges do not necessarily equate to a doubling of data accuracy, but two data quality oranges are certainly an improvement over one data quality orange).

Now, of course, on those rare occasions when you can afford to wait a month or so before making a critical business decision, most people would choose to wait an extra day in order to improve their data quality before making their data-driven decision.

However, let’s imagine you are feeling squeezed by a more pressing business decision—now which would you prefer:

(C) One data quality orange today, or

(D) Two data quality oranges tomorrow?

In my experience with data quality and business intelligence, most people prefer B over A—and C over D.

This “time-inconsistent” data quality preference within business intelligence reflects the reality that with the speed at which things change these days, more real-time business decisions are required—perhaps making speed more important than quality.

In a recent Data Knights Tweet Jam, Mark Lorion pondered speed versus quality within business intelligence, asking: “Is it better to be perfect in 30 days or 70% today?  Good enough may often be good enough.”

To which Henrik Liliendahl Sørensen responded with the perfectly pithy wisdom: “Good, Fast, Decision—Pick any two.”

However, Steve Dine cautioned that speed versus quality is decision dependent: “70% is good when deciding how many pencils to order, but maybe not for a one billion dollar acquisition.”

Mark’s follow-up captured the speed versus quality tradeoff succinctly with “Good Now versus Great Later.”  And Henrik added the excellent cautionary note: “Good decision now, great decision too late—especially if data quality is not a mature discipline.”

 

What Say You?

How many data quality oranges do you think it takes?  Or for those who prefer a less fruitful phrasing, where do you stand on the speed versus quality debate?  How good does data quality have to be in order to make a good data-driven business decision?

 

Related Posts

To Our Data Perfectionists

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

Data Quality and the Cupertino Effect

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Data In, Decision Out

The Data-Decision Symphony

Data!

You Can’t Always Get the Data You Want

Data Confabulation in Business Intelligence

Jarrett Goldfedder recently asked the excellent question: When does Data become Too Much Information (TMI)?

We now live in a 24 hours a day, 7 days a week, 365 days a year worldwide whirlwind of constant information flow, where the very air we breathe is teeming with digital data streams—continually inundating us with new information.

The challenge is that our time is a zero-sum game: for every new information source we choose, others are excluded.

There’s no way to acquire all available information.  And even if we somehow could, due to the limitations of human memory, we often don’t remember much of the new information we do acquire.  In my blog post Mind the Gap, I wrote about the need to coordinate our acquisition of new information with its timely and practical application.

So I definitely agree with Jarrett that the need to find the right amount of information appropriate for the moment is the needed (and far from easy) solution.  Since this is indeed the age of the data deluge and TMI, I fear that data-driven decision making may simply become intuition-driven decisions validated after the fact by selectively choosing the data that supports the decision already made.  The human mind is already exceptionally good at doing this—the term for it in psychology is confabulation.

Although, according to Wikipedia, the term can be used to describe neurological or psychological dysfunction, as Jonathan Haidt explained in his book The Happiness Hypothesis, confabulation is frequently used by “normal” people as well.  For example, after buying my new smart phone, I chose to read only the positive online reviews about it, trying to make myself feel more confident I had made the right decision—and more capable of justifying my decision beyond saying I bought the phone that looked “cool.”

 

Data Confabulation in Business Intelligence

Data confabulation in business intelligence occurs when intuition-driven business decisions are claimed to be data-driven and justified after the fact using the results of selective post-decision data analysis.  This is even worse than when confirmation bias causes intuition-driven business decisions, which are justified using the results of selective pre-decision data analysis that only confirms preconceptions or favored hypotheses, resulting in potentially bad—albeit data-driven—business decisions.

My fear is that the data deluge will actually increase the use of both of these business decision-making “techniques” because they are much easier than, as Jarrett recommended, trying to make sense of the business world by gathering and sorting through as much data as possible, deriving patterns from the chaos and developing clear-cut, data-driven, data-justifiable business decisions.

But the data deluge generally broadcasts more noise than signal, and sometimes trying to get better data to make better decisions simply means getting more data, which often only delays or confuses the decision-making process, or causes analysis paralysis.

Can we somehow listen for decision-making insights among the cacophony of chaotic and constantly increasing data volumes?

I fear that the information overload of the data deluge is going to trigger an intuition override of data-driven decision making.

 

Related Posts

The Reptilian Anti-Data Brain

Data In, Decision Out

The Data-Decision Symphony

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

DQ-View: From Data to Decision

TDWI World Conference Orlando 2010

Hell is other people’s data

Mind the Gap

The Fragility of Knowledge

#FollowFriday Spotlight: @PhilSimon

FollowFriday Spotlight is an OCDQ regular segment highlighting someone you should follow—and not just Fridays on Twitter.


Phil Simon is an independent technology consultant, author, writer, and dynamic public speaker for hire, who focuses on the intersection of business and technology.  Phil is the author of three books (see below for more details) and also writes for a number of technology media outlets and sites, and hosts the podcast Technology Today.

As an independent consultant, Phil helps his clients optimize their use of technology.  Phil has cultivated over forty clients in a wide variety of industries, including health care, manufacturing, retail, education, telecommunications, and the public sector.

When not fiddling with computers, hosting podcasts, putting himself in comics, and writing, Phil enjoys English Bulldogs, tennis, golf, movies that hurt the brain, fantasy football, and progressive rock.  Phil is a particularly zealous fan of Rush, Porcupine Tree, and Dream Theater.  Anyone who reads his blog posts or books will catch many references to these bands.

 

Books by Phil Simon

My review of The New Small:

By leveraging what Phil Simon calls the Five Enablers (Cloud computing, Software-as-a-Service (SaaS), Free and open source software (FOSS), Mobility, Social technologies), small businesses no longer need to have technology as one of their core competencies, nor invest significant time and money in enabling technology, which allows them to focus on their true core competencies and truly compete against companies of all sizes.

The New Small serves as a practical guide to this brave new world of small business.

 

My review of The Next Wave of Technologies:

The constant challenge for organizations, large and small, that use technology to support the ongoing management of their decision-critical information is that the world of information technology can never afford to remain static.  It must dynamically evolve and adapt in order to protect and serve the enterprise’s continuing mission to survive and thrive in today’s highly competitive and rapidly changing marketplace.


The Next Wave of Technologies is required reading if your organization wishes to avoid common mistakes and realize the full potential of new technologies—especially before your competitors do.

 

My review of Why New Systems Fail:

Why New Systems Fail is far from a doom and gloom review of disastrous projects and failed system implementations.  Instead, this book contains numerous examples and compelling case studies, which serve as a very practical guide for how to recognize, and more importantly, overcome the common mistakes that can prevent new systems from being successful.

Phil Simon writes about these complex challenges in a clear and comprehensive style that is easily approachable and applicable to diverse audiences, both academic and professional, as well as readers with either a business or a technical orientation.

 

Blog Posts by Phil Simon

In addition to his great books, Phil is a great blogger.  For example, check out these brilliant blog posts written by Phil Simon:

 

Knights of the Data Roundtable

Phil Simon and I co-host and co-produce Knights of the Data Roundtable, a wildly popular bi-weekly data management podcast sponsored by the good folks at DataFlux, a SAS Company.

The podcast is a frank and open discussion about data quality, data integration, data governance and all things related to managing data.

 

Related Posts

#FollowFriday Spotlight: @hlsdk

#FollowFriday Spotlight: @DataQualityPro

#FollowFriday and Re-Tweet-Worthiness

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

Social Karma (Part 7) – Twitter

DQ-BE: Dear Valued Customer

Data Quality By Example (DQ-BE) is an OCDQ regular segment that provides examples of data quality key concepts.

The term “valued customer” is bandied about quite frequently and is often at the heart of enterprise data management initiatives such as Customer Data Integration (CDI), 360° Customer View, and Customer Master Data Management (MDM).

The role of data quality in these initiatives is an important, but sometimes mistakenly overlooked, consideration.

For example, the Service Contract Renewal Notice (shown above) I recently received exemplifies the impact of poor data quality on Customer Relationship Management (CRM) since one of my service providers wants me—as a valued customer—to purchase a new service contract for one of my laptop computers.

Let’s give them props for generating a 100% accurate residential postal address, since how could I even consider renewing my service contract if I don’t receive the renewal notice in the mail?  Let’s also acknowledge that my Customer ID is 100% accurate, since that is the “unique identifier” under which I have purchased all of my products and services from this company.

However, the biggest data quality mistake is the name of their “Valued Customer,” because my name is not INDEPENDENT CONSULTANT.  (And they get bonus negative points for writing it in ALL CAPS).
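
As a purely illustrative sketch (the placeholder list and the check itself are my own assumptions, not anything my service provider actually does), even a simple rule could have flagged this value before the mailing went out:

    # Hypothetical check: flag customer name values that look like job titles or
    # placeholders rather than actual names. Terms and logic are illustrative assumptions.

    PLACEHOLDER_TERMS = {"independent consultant", "valued customer", "unknown", "test"}

    def suspicious_customer_name(name):
        """Return the reasons a customer name value looks suspect (empty list means it passes)."""
        reasons = []
        cleaned = name.strip()
        if not cleaned:
            reasons.append("name is blank")
        elif cleaned.lower() in PLACEHOLDER_TERMS:
            reasons.append("name matches a known placeholder or job title")
        if cleaned and cleaned.isupper() and len(cleaned) > 3:
            reasons.append("name is in ALL CAPS")
        return reasons

    print(suspicious_customer_name("INDEPENDENT CONSULTANT"))
    # ['name matches a known placeholder or job title', 'name is in ALL CAPS']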

The moral of the story is that if you truly value your customers, then you should truly value your customer data quality.

At the very least—get your customer’s name right.

 

Related Posts

Customer Incognita

Identifying Duplicate Customers

Adventures in Data Profiling (Part 7) – Customer Name

The Quest for the Golden Copy (Part 3) – Defining “Customer”

‘Tis the Season for Data Quality

The Seven Year Glitch

DQ-IRL (Data Quality in Real Life)

Data Quality, 50023

Once Upon a Time in the Data

The Semantic Future of MDM

The Asymptote of Data Quality

In analytic geometry (according to Wikipedia), an asymptote of a curve is a line such that the distance between the curve and the line approaches zero as they tend to infinity.  The inspiration for my hand-drawn illustration was a similar one (not related to data quality) in the excellent book Linchpin: Are You Indispensable? by Seth Godin, which describes an asymptote as:

“A line that gets closer and closer and closer to perfection, but never quite touches.”

“As you get closer to perfection,” Godin explains, “it gets more and more difficult to improve, and the market values the improvements a little bit less.  Increasing your free-throw percentage from 98 to 99 percent may rank you better in the record books, but it won’t win any more games, and the last 1 percent takes almost as long to achieve as the first 98 percent did.”
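
Here is a back-of-the-envelope sketch of that diminishing-returns curve.  The cost function is an assumption I am inventing purely for illustration, not a measured model, but it shows the effort exploding as you approach 100%:

    # Illustrative only: assume the cumulative effort to reach a quality level q
    # is proportional to 1 / (1 - q). The curve is an invented assumption.

    def effort(q):
        return 1.0 / (1.0 - q)

    for start, end in [(0.90, 0.95), (0.95, 0.98), (0.98, 0.99), (0.99, 0.999)]:
        gain = effort(end) - effort(start)
        print(f"Improving from {start:.1%} to {end:.1%} costs {gain:,.1f} effort units")

Under this made-up curve, the climb from 99% toward 99.9% costs far more than everything that came before it, which is the asymptote in a nutshell.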

The pursuit of data perfection is a common debate in data quality circles, where it is usually known by the motto:

“The data will always be entered right, the first time, every time.”

However, Henrik Liliendahl Sørensen has cautioned that even when this ideal can be achieved, we must still acknowledge the inconvenient truth that things change; Evan Levy has reminded us that data quality isn’t the same as data perfection; and David Loshin has used the Pareto principle to describe the point of diminishing returns in data quality improvements.

Chasing data perfection can be a powerful motivation, but it can also undermine the best of intentions.  Not only is it important to accept that the Asymptote of Data Quality can never be reached, but we must realize that data perfection was never the goal.

The goal is data-driven solutions for business problems—and these dynamic problems rarely have (or require) a perfect solution.

Data quality practitioners must strive for continuous data quality improvement, but always within the business context of data, and without losing themselves in the pursuit of a data-myopic ideal such as data perfection.

 

Related Posts

To Our Data Perfectionists

The Data-Decision Symphony

Is your data complete and accurate, but useless to your business?

Finding Data Quality

MacGyver: Data Governance and Duct Tape

You Can’t Always Get the Data You Want

What going to the dentist taught me about data quality

A Tale of Two Q’s

Data Quality and The Middle Way

Hyperactive Data Quality (Second Edition)

Missed It By That Much

The Data Quality Goldilocks Zone

What Does Data Quality Technology Want?

During a recent Radiolab podcast, Kevin Kelly, author of the book What Technology Wants, used the analogy of how a flower leans toward sunlight because it “wants” the sunlight, to describe what the interweaving web of evolving technical innovations (what he refers to as the super-organism of technology) is leaning toward—in other words, what technology wants.

The other Radiolab guest was Steven Johnson, author of the book Where Good Ideas Come From, who somewhat dispelled the traditional notion of the eureka effect by explaining that the evolution of ideas, like all evolution, stumbles its way toward the next good idea, which inevitably leads to a significant breakthrough, such as what happens with innovations in technology.

Listening to this thought-provoking podcast made me ponder the question: What does data quality technology want?

In a previous post, I used the term OOBE-DQ to refer to the out-of-box-experience (OOBE) provided by data quality (DQ) tools, which usually becomes a debate between “ease of use” and “powerful functionality” after you ignore the Magic Beans sales pitch that guarantees you the data quality tool is both remarkably easy to use and incredibly powerful.

The data quality market continues to evolve away from esoteric technical tools and stumble its way toward the next good idea: business-empowering suites providing robust functionality with increasingly role-based user interfaces tailored to the specific needs of different users.  Of course, many vendors would love to claim sole responsibility for what they would call significant innovations in data quality technology, instead of what are simply by-products of an evolving market.

The deployment of data quality functionality within and across organizations also continues to evolve, as data cleansing activities are being complemented by real-time defect prevention services used to greatly minimize poor data quality at the multiple points of origin within the enterprise data ecosystem.
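
As a minimal sketch of what real-time defect prevention at a point of origin can look like (the field and the validation rule are my own assumptions, not a description of any particular product), a check at data entry time rejects a suspect value instead of leaving it for a downstream cleansing job:

    import re

    # Hypothetical point-of-origin check: validate an email value as it is entered,
    # so the defect is prevented rather than cleansed downstream. Rule is illustrative.

    EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def accept_email(value):
        """Return (accepted, message) for a proposed email value at data entry time."""
        if EMAIL_PATTERN.match(value.strip()):
            return True, "accepted"
        return False, "rejected: please re-enter a valid email address"

    print(accept_email("ann@example.com"))  # (True, 'accepted')
    print(accept_email("ann[at]example"))   # (False, 'rejected: please re-enter a valid email address')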

However, viewpoints about the role of data quality technology generally remain split between two opposing perspectives:

  1. Technology enables a data quality process, but doesn’t obviate the need for people (e.g., data stewards) to remain actively involved and be held accountable for maintaining the quality of data.
  2. Technology automates a data quality process, and a well-designed and properly implemented technical solution obviates the need for people to be actively involved after its implementation.

Do you think that continuing advancements and innovations in data quality technology will obviate the need for people to be actively involved in data quality processes?  In the future, will we have high quality data because our technology essentially wants it and therefore leans our organizations toward high quality data?  Let’s conduct another unscientific data quality poll:

 

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

 

Related Posts

DQ-Tip: “Data quality tools do not solve data quality problems...”

Which came first, the Data Quality Tool or the Business Need?

Data Quality Industry: Problem Solvers or Enablers?

Data Quality Magic

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Pirates of the Computer: The Curse of the Poor Data Quality

A Confederacy of Data Defects

One of my favorite novels is A Confederacy of Dunces by John Kennedy Toole.  The novel tells the tragicomic tale of Ignatius J. Reilly, described in the foreword by Walker Percy as a “slob extraordinary, a mad Oliver Hardy, a fat Don Quixote, and a perverse Thomas Aquinas rolled into one.”

The novel was written in the 1960s before the age of computer filing systems, so one of the jobs Ignatius has is working as a paper filing clerk in a clothing factory.  His employer is initially impressed with his job performance, since the disorderly mess of invoices and other paperwork slowly begins to disappear, resulting in the orderly appearance of a well organized and efficiently managed office space.

However, Ignatius is fired after he reveals the secret to his filing system—instead of filing the paperwork away into the appropriate file cabinets, he has simply been throwing all of the paperwork into the trash.

This scene reminds me of how data quality issues (aka data defects) are often perceived.  Many organizations acknowledge the importance of data quality, but don’t believe that data defects occur very often because the data made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

ETL processes that extract source data for a data warehouse load will often perform basic data quality checks.  However, a fairly standard practice for “resolving” a data defect is to substitute a NULL value (e.g., a date stored in a text field in a source system that cannot be converted into a valid date value is usually loaded into the target relational database with a NULL value).
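
Here is a minimal sketch of that practice (the sample values and formats are my own illustrative assumptions): the load completes without complaint, but every unparseable date silently becomes a NULL, and only counting the substitutions keeps the defects visible.

    from datetime import datetime

    # Hypothetical source rows with dates stored as free text (values are illustrative).
    source_rows = ["2010-11-07", "11/12/2010", "ASAP", "", "2010-13-45"]

    def parse_date_or_none(text):
        """Mimic the common ETL practice: return a date if the text parses, else None (NULL)."""
        for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
            try:
                return datetime.strptime(text.strip(), fmt).date()
            except ValueError:
                continue
        return None  # the defect is "resolved" by loading NULL

    loaded = [parse_date_or_none(text) for text in source_rows]
    nulled = sum(1 for value in loaded if value is None)
    print(f"Loaded {len(loaded)} rows; {nulled} dates silently replaced with NULL")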

When postal address validation software generates a valid mailing address, it often does so by removing what it considers to be “extraneous” information from the input address fields.  That removed information may include valid data accidentally entered into the wrong field, or valid data that simply lacked an input field of its own (e.g., an e-mail address typed into an address field is deleted from the output mailing address).

And some reporting processes intentionally filter out “bad records” or eliminate “outlier values.”  This happens most frequently when preparing highly summarized reports, especially those intended for executive management.

These are just a few examples of common practices that can create the orderly appearance of a high quality data environment, but that conceal a confederacy of data defects about which the organization may remain blissfully (and dangerously) ignorant.

Do you suspect that your organization may be concealing A Confederacy of Data Defects?

TDWI World Conference Orlando 2010

Last week I attended the TDWI World Conference held November 7-12 in Orlando, Florida at the Loews Royal Pacific Resort.

As always, TDWI conferences offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner, designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

In this blog post, I summarize a few key points from two of the courses I attended.  I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

 

A Practical Guide to Analytics

Wayne Eckerson, author of the book Performance Dashboards: Measuring, Monitoring, and Managing Your Business, described the four waves of business intelligence:

  1. Reporting – What happened?
  2. Analysis – Why did it happen?
  3. Monitoring – What’s happening?
  4. Prediction – What will happen?

“Reporting is the jumping off point for analytics,” explained Eckerson, “but many executives don’t realize this.  The most powerful aspect of analytics is testing our assumptions.”  He went on to differentiate the two strains of analytics:

  1. Exploration and Analysis – Top-down and deductive, primarily uses query tools
  2. Prediction and Optimization – Bottom-up and inductive, primarily uses data mining tools

“A huge issue for predictive analytics is getting people to trust the predictions,” remarked Eckerson.  “Technology is the easy part, the hard part is selling the business benefits and overcoming cultural resistance within the organization.”

“The key is not getting the right answers, but asking the right questions,” he explained, quoting Ken Rudin of Zynga.

“Deriving insight from its unique information will always be a competitive advantage for every organization.”  He recommended the book Competing on Analytics: The New Science of Winning as a great resource for selling the business benefits of analytics.

 

Data Governance for BI Professionals

Jill Dyché, a partner and co-founder of Baseline Consulting, explained that data governance transcends business intelligence and other enterprise information initiatives such as data warehousing, master data management, and data quality.

“Data governance is the organizing framework,” explained Dyché, “for establishing strategy, objectives, and policies for corporate data.  Data governance is the business-driven policy making and oversight of corporate information.”

“Data governance is necessary,” remarked Dyché, “whenever multiple business units are sharing common, reusable data.”

“Data governance aligns data quality with business measures and acceptance, positions enterprise data issues as cross-functional, and ensures data is managed separately from its applications, thereby evolving data as a service (DaaS).”

In her excellent 2007 article Serving the Greater Good: Why Data Hoarding Impedes Corporate Growth, Dyché explained the need for “systemizing the notion that data – corporate asset that it is – belongs to everyone.”

“Data governance provides the decision rights around the corporate data asset.”

 

Related Posts

DQ-View: From Data to Decision

Podcast: Data Governance is Mission Possible

The Business versus IT—Tear down this wall!

MacGyver: Data Governance and Duct Tape

Live-Tweeting: Data Governance

Enterprise Data World 2010

Enterprise Data World 2009

TDWI World Conference Chicago 2009

Light Bulb Moments at DataFlux IDEAS 2010

DataFlux IDEAS 2009

Quality and Governance are Beyond the Data

Last week’s episode of DM Radio on Information Management, co-hosted as always by Eric Kavanagh and Jim Ericson, was a panel discussion about how and why data governance can improve the quality of an organization’s data, and the featured guests were Dan Soceanu of DataFlux, Jim Orr of Trillium Software, Steve Sarsfield of Talend, and Brian Parish of iData.

The relationship between data quality and data governance is a common question, and perhaps mostly because data governance is still an evolving discipline.  However, another contributing factor is the prevalence of the word “data” in the names given to most industry disciplines and enterprise information initiatives.

“Data governance goes well beyond just the data,” explained Orr.  “Administration, business process, and technology are also important aspects, and therefore the term data governance can be misleading.”

“So perhaps a best practice of data governance is not calling it data governance,” remarked Ericson.

From my perspective, data governance involves policies, people, business processes, data, and technology.  However, the last four of those concepts (people, business processes, data, and technology) are critical to every enterprise initiative.

So I agree with Orr because I think that the key concept differentiating data governance is its definition and enforcement of the policies that govern the complex ways that people, business processes, data, and technology interact.

As it relates to data quality, I believe that data governance provides the framework for evolving data quality from a project to an enterprise-wide program by facilitating the collaboration of business and technical stakeholders.  Data governance aligns data usage with business processes through business relevant metrics, and enables people to be responsible for, among other things, data ownership and data quality.

“A basic form of data governance is tying the data quality metrics to their associated business processes and business impacts,” explained Sarsfield, the author of the great book The Data Governance Imperative, which explains that “the mantra of data governance is that technologists and business users must work together to define what good data is by constantly leveraging both business users, who know the value of the data, and technologists, who can apply what the business users know to the data.”

Data is used as the basis to make critical business decisions, and therefore “the key for data quality metrics is the confidence level that the organization has in the data,” explained Soceanu.  Data-driven decisions are better than intuition-driven decisions, but lacking confidence about the quality of their data can lead organizations to rely more on intuition for their business decisions.

The Data Asset: How Smart Companies Govern Their Data for Business Success, written by Tony Fisher, the CEO of DataFlux, is another great book about data governance, which explains that “data quality is about more than just improving your data.  Ultimately, the goal is improving your organization.  Better data leads to better decisions, which leads to better business.  Therefore, the very success of your organization is highly dependent on the quality of your data.”

Data is a strategic corporate asset and, by extension, data quality and data governance are both strategic corporate disciplines, because high quality data serves as a solid foundation for an organization’s success, empowering people, enabled by technology, to make better business decisions and optimize business performance.

Therefore, data quality and data governance both go well beyond just improving the quality of an organization’s data, because Quality and Governance are Beyond the Data.

 

Related Posts

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Finding Data Quality

The Diffusion of Data Governance

MacGyver: Data Governance and Duct Tape

The Prince of Data Governance

Jack Bauer and Enforcing Data Governance Policies

Data Governance and Data Quality

Trust is not a checklist

This is my seventh blog post tagged Karma.  Back on the first day of January, now almost ten months ago, I declared KARMA my theme word for 2010 and promised to discuss it, both directly and indirectly, on my blog throughout the year.

 

Trust and Collaboration

I was reminded of the topic of this post—trust—by a tweet from Jill Wanless sent from the recent Collaborative Culture Camp, a one-day conference on enabling collaboration in a government context, held on October 15 in Ottawa, Ontario.

I followed the conference Twitter stream remotely and found many of the tweets interesting, especially ones about the role that trust plays in collaboration, which is one of my favorite topics in general, and one that plays well with my karma theme word.

 

Trust is not a checklist

The title of this blog post comes from the chapter on The Emergence of Trust in the book Start with Why by Simon Sinek, where he explained that trust is an organizational performance category that is nearly impossible to measure.

“Trust does not emerge simply because a seller makes a rational case why the customer should buy a product or service, or because an executive promises change.  Trust is not a checklist.  Fulfilling all your responsibilities does not create trust.  Trust is a feeling, not a rational experience.  We trust some people and companies even when things go wrong, and we don’t trust others even though everything might have gone exactly as it should have.  A completed checklist does not guarantee trust.  Trust begins to emerge when we have a sense that another person or organization is driven by things other than their own self-gain.”

 

Trust is not transparency

This past August, Scott Berkun blogged about how “trust is always more important than authenticity and transparency.”

“The more I trust you,” Berkun explained, “the less I need to know the details of your plans or operations.  Honesty, diligence, fairness, and clarity are the hallmarks of good relationships of all kinds and lead to the magic of trust.  And it’s trust that’s hardest to earn and easiest to destroy, making it the most precious attribute of all.  Becoming more transparent is something you can do by yourself, but trust is something only someone else can give to you.  If transparency leads to trust, that’s great, but if it doesn’t you have bigger problems to solve.”

 

Organizational Karma

Trust and collaboration create strong cultural ties, both personally and professionally.

“A company is a culture,” Sinek explained.  “A group of people brought together around a common set of values and beliefs.  It’s not the products or services that bind a company together.  It’s not size and might that make a company strong, it’s the culture, the strong sense of beliefs and values that everyone, from the CEO to the receptionist, all share.”

Organizations looking for ways to survive and thrive in today’s highly competitive and rapidly evolving marketplace should embrace the fact that trust and collaboration are the organizational karma of corporate culture.

Trust me on this one—good karma is good business.

 

Related Posts

New Time Human Business

The Great Rift

Social Karma (Part 6)

The Challenging Gift of Social Media

The Importance of Envelopes

True Service

The Game of Darts – An Allegory

“I can make glass tubes”

My #ThemeWord for 2010: KARMA

The Business versus IT—Tear down this wall!

The Road of Collaboration

Video: Declaration of Data Governance