The Dichotomy Paradox, Data Quality and Zero Defects

As Joseph Mazur explains in Zeno’s Paradox, the ancient Greek philosopher Zeno constructed a series of logical paradoxes to prove that motion is impossible, paradoxes that today remain on the cutting edge of our investigations into the fabric of space and time.

One of the paradoxes is known as the Dichotomy:

“A moving object will never reach any given point, because however near it may be, it must always first accomplish a halfway stage, and then the halfway stage of what is left and so on, and this series has no end.  Therefore, the object can never reach the end of any given distance.”

Of course, this paradox sounds silly.  After all, a given point, like the finish line in a race, is reachable in real life, since people win races all the time.  However, in theory, the mathematics is maddeningly sound, since it creates an infinite series of steps between the starting point and the finish line, and an infinite number of steps creates a journey that can never end.
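
For the mathematically inclined, the halfway stages form an infinite geometric series; a standard rendering (mine, not Mazur’s) over a unit distance is:

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots \;=\; \sum_{n=1}^{\infty} \frac{1}{2^{n}} \;=\; 1$$

The series sums to the full distance, yet it never runs out of terms, and that is exactly the maddening part: there is always another halfway stage left to complete.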

Furthermore, this theoretical race cannot even begin, because the paradox applies recursively to the first step: before completing it, we would first have to complete half of it, and half of that half, and so on.  Hence the paradoxical conclusion: travel over any finite distance can be neither completed nor begun, and so all motion must be an illusion.  Some of the greatest minds in history (from Galileo to Einstein to Stephen Hawking) have tackled the Dichotomy Paradox without being able to disprove it.

Data Quality and Zero Defects

The given point that many enterprise initiatives attempt to reach with data quality is 100% on a metric such as data accuracy.  Leaving aside (in this post) the fact that any data quality metric without a tangible business context provides no business value, 100% data quality (aka Zero Defects) is an unreachable destination, no matter how close you get or how long you keep trying.

Zero Defects is a laudable goal, but its theory and practice come from manufacturing quality.  I have always been of the opinion, unpopular among some of my peers, that manufacturing quality and data quality are very different disciplines.  Although there is much to be learned from studying the theories of manufacturing quality, I believe that brute-forcing those theories onto data quality is impractical and fundamentally flawed (and I have even said so in verse: To Our Data Perfectionists).

The given point that enterprise initiatives should actually be attempting to reach is data-driven solutions for business problems.

Advocates of Zero Defects argue that, in theory, defect-free data should be fit to serve as the basis for every possible business use, enabling a data-driven solution for any business problem.  However, in practice, business uses for data, as well as business itself, are always evolving.  Therefore, business problems are dynamic problems that do not have, nor do they require, perfect solutions.

Although the Dichotomy Paradox proves motion is theoretically impossible, our physical motion practically proves otherwise.  Has your data quality practice become motionless by trying to prove that Zero Defects is more than just theoretically possible?

DQ-View: Talking about Data

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

DQ-View: The Poor Data Quality Blizzard

DQ-View: New Data Resolutions

DQ-View: From Data to Decision

DQ View: Achieving Data Quality Happiness

Data Quality is not a Magic Trick

DQ-View: The Cassandra Effect

DQ-View: Is Data Quality the Sun?

DQ-View: Designated Asker of Stupid Questions

Video: Oh, the Data You’ll Show!

The Data Governance Oratorio

Boston Symphony Orchestra

An oratorio is a large musical composition collectively performed by an orchestra of musicians and a choir of singers, all of whom accept a shared responsibility for the quality of their performance.  An oratorio also requires that individual performers accept accountability for playing their own musical instrument or singing their own lines, including the occasional instrumental or lyrical solo.

During a well-executed oratorio, individual mastery combines with group collaboration, creating a true symphony, a sounding together, which produces a more powerful performance than even the most consummate solo artist could deliver on their own.

 

The Data Governance Oratorio

Ownership, Responsibility, and Accountability are the core movements of the Data Governance ORA-torio.

Data is a corporate asset collectively owned by the entire enterprise.  Data governance is a cross-functional, enterprise-wide initiative requiring that everyone, regardless of their primary role or job function, accept a shared responsibility for preventing data quality issues, and for responding appropriately to mitigate the associated business risks when issues do occur.  However, individuals must still be held accountable for the specific data, business process, and technology aspects of data governance.

Data governance provides the framework for communication and collaboration among business, data, and technical stakeholders.  It establishes an enterprise-wide understanding of the roles and responsibilities involved, and of the accountability required to support the organization’s business activities and to materialize the value of the enterprise’s data as positive business impacts.

Collective ownership, shared responsibility, and individual accountability combine to create a true enterprise-wide symphony, a sounding together by the organization’s people, who, when empowered by high quality data and enabled by technology, can optimize business processes for superior corporate performance.

Is your organization collectively performing the Data Governance Oratorio?

 

Related Posts

Data Governance and the Buttered Cat Paradox

Beware the Data Governance Ides of March

Zig-Zag-Diagonal Data Governance

A Tale of Two G’s

The People Platform

The Collaborative Culture of Data Governance

Connect Four and Data Governance

The Business versus IT—Tear down this wall!

The Road of Collaboration

Collaboration isn’t Brain Surgery

Shared Responsibility

The Role Of Data Quality Monitoring In Data Governance

Quality and Governance are Beyond the Data

Data Transcendentalism

Podcast: Data Governance is Mission Possible

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

Jack Bauer and Enforcing Data Governance Policies

The Prince of Data Governance

MacGyver: Data Governance and Duct Tape

The Diffusion of Data Governance

How active is your data quality practice?

My recent blog post The Data Quality Wager received a provocative comment from Richard Ordowich that sparked another round of discussion and debate about proactive data quality versus reactive data quality in the LinkedIn Group for the IAIDQ.

“Data quality is a reactive practice,” explained Ordowich.  “Perhaps that is not what is professed in the musings of others or the desired outcome, but it is nevertheless the current state of the best practices.  Data profiling and data cleansing are after the fact data quality practices.  The data is already defective.  Proactive defect prevention requires a greater discipline and changes to organizational behavior that is not part of the current best practices.  This I suggest is wishful thinking at this point in time.”

“How can data quality practices,” C. Lwanga Yonke responded, “that do not include proactive defect prevention (with the required discipline and changes to organizational behavior) be considered best practices?  Seems to me a data quality program must include these proactive activities to be considered a best practice.  And from what I see, there are many such programs out there.  True, they are not the majority—but they do exist.”

After Ordowich requested real examples of proactive data quality practices, Jayson Alayay commented, “I have implemented data quality using statistical process control techniques where expected volumes and ratios are predicted using forecasting models that self-adjust using historical trends.  We receive an alert when significant deviations from forecast are detected.  One of our overarching data quality goals is to detect a significant data issue as soon as it becomes detectable in the system.”

“It is possible,” replied Ordowich, “to estimate the probability of data errors in data sets based on the currency (freshness) and usage of the data.  The problem is this process does not identify the specific instances of errors just the probability that an error may exist in the data set.  These techniques only identify trends not specific instances of errors.  These techniques do not predict the probability of a single instance data error that can wreak havoc.  For example, the ratings of mortgages was a systemic problem, which data quality did not address.  Yet the consequences were far and wide.  Also these techniques do not predict systemic quality problems related to business policies and processes.  As a result, their direct impact on the business is limited.”

“For as long as human hands key in data,” responded Alayay, “a data quality implementation to a great extent will be reactive.  Improving data quality not only pertains to detection of defects, but also enhancement of content, e.g., address standardization, geocoding, application of rules and assumptions to replace missing values, etc.  With so many factors in play, a real life example of a proactive data quality implementation that suits what you’re asking for may be hard to pinpoint.  My opinion is that the implementation of ‘comprehensive’ data quality programs can have big rewards and big risks.  One big risk is that it can slow time-to-market and kill innovation because otherwise talented people would be spending a significant amount of their time complying with rules and standards in the name of improving data quality.”

“When an organization embarks on a new project,” replied Ordowich, “at what point in the conversation is data quality discussed?  How many marketing plans, new product development plans, or even software development plans have you seen include data quality?  Data quality is not even an afterthought in most organizations, it is ignored.  Data quality is not even in the vocabulary until a problem occurs.  Data quality is not part of the culture or behaviors within most organizations.”
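
As a rough illustration of the statistical process control approach Alayay describes above, here is a minimal sketch in Python; the row counts, window size, and alert threshold are hypothetical choices of mine, not details from the discussion:

```python
from statistics import mean, stdev

def volume_alert(history, observed, window=30, k=3.0):
    """Flag a data load whose record volume deviates sharply from
    a self-adjusting forecast (a simple moving average of recent loads)."""
    recent = history[-window:]          # trailing window adapts as historical trends shift
    forecast = mean(recent)
    spread = stdev(recent) or 1.0       # guard against a zero-variance window
    deviation = abs(observed - forecast) / spread
    return deviation > k, forecast, deviation

# Hypothetical example: daily row counts for a customer feed
daily_row_counts = [10_120, 10_340, 9_980, 10_510, 10_205, 10_470, 10_090,
                    10_300, 10_150, 10_420, 10_280, 10_360, 10_240, 10_310]

alert, forecast, deviation = volume_alert(daily_row_counts, observed=6_150, window=14)
if alert:
    print(f"ALERT: observed 6,150 rows vs forecast {forecast:,.0f} "
          f"({deviation:.1f} standard deviations)")
```

A sketch like this detects that something significant has changed in the data as soon as it becomes detectable, which is exactly the reactive-versus-proactive gray area the debate above turns on: it does not prevent the defect, but it shortens the time to discovery.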

 

 

Please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

 

Related Posts

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

Retroactive Data Quality

The Data Quality Wager

Gordon Hamilton emailed me with an excellent recommended topic for a data quality blog post:

“It always seems crazy to me that few executives base their ‘corporate wagers’ on the statistical research touted by data quality authors such as Tom Redman, Jack Olson and Larry English that shows that 15-45% of the operating expense of virtually all organizations is WASTED due to data quality issues.

So, if every organization is leaving 15-45% on the table each year, why don’t they do something about it?  Philip Crosby says that quality is free, so why do the executives allow the waste to go on and on and on?  It seems that if the shareholders actually think about the Data Quality Wager they might wonder why their executives are wasting their shares’ value.  A large portion of that 15-45% could all go to the bottom line without a capital investment.

I’m maybe sounding a little vitriolic because I’ve been re-reading Deming’s Out of the Crisis and he has a low regard for North American industry because they won’t move beyond their short-term goals to build a quality organization, let alone implement Deming’s 14 principles or Larry English’s paraphrasing of them in a data quality context.”

The Data Quality Wager

Gordon Hamilton explained in his email that his reference to the Data Quality Wager was an allusion to Pascal’s Wager, but what follows is my rendering of it in a data quality context (i.e., if you don’t like what follows, please yell at me, not Gordon).

Although I agree with Gordon, I also acknowledge that convincing your organization to invest in data quality initiatives can be a hard sell.  A common mistake is not framing the investment in data quality initiatives using business language such as mitigated risks, reduced costs, or increased revenue.  I also acknowledge the reality of the fiscal calendar effect and how most initiatives increase short-term costs based on the long-term potential of eventually mitigating risks, reducing costs, or increasing revenue.

Short-term increased costs of a data quality initiative can include the purchase of data quality software and its maintenance fees, as well as the professional services needed for training and consulting for installation, configuration, application development, testing, and production implementation.  And there are often additional short-term increased costs, both external and internal.

Please note that I am talking about the costs of proactively investing in a data quality initiative before any data quality issues have manifested that would prompt reactively investing in a data cleansing project.  Although the short-term increased costs are the same either way, I am simply acknowledging the reality that it is always easier for a reactive project to get funding than it is for a proactive program, and this is obviously not only true for data quality initiatives.

Therefore, the organization has to evaluate the possible outcomes of proactively investing in data quality initiatives while also considering the possible existence of data quality issues (i.e., the existence of tangible business-impacting data quality issues):

  1. Invest in data quality initiatives + Data quality issues exist = Decreased risks and (eventually) decreased costs

  2. Invest in data quality initiatives + Data quality issues do not exist = Only increased costs — No ROI

  3. Do not invest in data quality initiatives + Data quality issues exist = Increased risks and (eventually) increased costs

  4. Do not invest in data quality initiatives + Data quality issues do not exist = No increased costs and no increased risks

Data quality professionals, vendors, and industry analysts all strongly advocate #1 — and all strongly criticize #3.  (Additionally, since we believe data quality issues exist, most “orthodox” data quality folks generally refuse to even acknowledge #2 and #4.)

Unfortunately, when advocating #1, we often don’t effectively sell the business benefits of data quality, and when criticizing #3, we often focus too much on the negative aspects of not investing in data quality.

Only #4 “guarantees” neither increased costs nor increased risks by gambling on not investing in data quality initiatives based on the belief that data quality issues do not exist—and, by default, this is how many organizations make the Data Quality Wager.
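
One way to make the wager concrete is a back-of-the-envelope expected-cost comparison; the probabilities and dollar figures below are purely illustrative assumptions of mine, not figures from this post:

```python
def expected_cost(invest, p_issues, cost_of_initiative,
                  cost_of_issues_unmitigated, residual_issue_cost):
    """Expected cost of the Data Quality Wager, given a belief (p_issues) about
    whether tangible business-impacting data quality issues exist."""
    if invest:
        # Outcomes 1 and 2: pay for the initiative; existing issues are largely mitigated.
        return cost_of_initiative + p_issues * residual_issue_cost
    # Outcomes 3 and 4: pay nothing up front; absorb the full impact if issues exist.
    return p_issues * cost_of_issues_unmitigated

# Illustrative numbers only
for p in (0.2, 0.5, 0.9):
    invest_cost = expected_cost(True, p, cost_of_initiative=500_000,
                                cost_of_issues_unmitigated=3_000_000,
                                residual_issue_cost=300_000)
    wait_cost = expected_cost(False, p, cost_of_initiative=500_000,
                              cost_of_issues_unmitigated=3_000_000,
                              residual_issue_cost=300_000)
    print(f"P(issues exist) = {p:.0%}: invest ~ ${invest_cost:,.0f}, "
          f"do nothing ~ ${wait_cost:,.0f}")
```

With these made-up numbers, gambling on #4 only looks attractive when you are quite confident that data quality issues do not exist, which is precisely the wager described above.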

How is your organization making the Data Quality Wager?

Zig-Zag-Diagonal Data Governance

Last month’s unscientific poll asked about the best way to approach data governance, which requires executive sponsorship and a data governance board for the top-down-driven activities of funding, policy making and enforcement, decision rights, and arbitration of conflicting business priorities as well as organizational politics.  It also requires data stewards and other grass roots advocates for the bottom-up-driven activities of policy implementation, data remediation, and process optimization, all led by the example of peer-level change agents adopting the organization’s new best practices for data quality management, business process management, and technology management.

Hybrid Approach (starting Top-Down) won by a slim margin, but overall the need for a hybrid approach to data governance was the prevailing consensus opinion, with the only real debate being whether to start data governance top-down or bottom-up.

 

Commendable Comments

Rob Drysdale commented: “Too many companies get paralyzed thinking about how to do this and implement it. (Along with the overwhelmed feeling that it is too much time/effort/money to fix it.)  But I think your poll needs another option to vote on, specifically: ‘Whatever works for the company/culture/organization’ since not all solutions will work for every organization.  In some where it is highly structured, rigid and controlled, there wouldn’t be the freedom at the grass-roots level to start something like this and it might be frowned upon by upper-level management.  In other organizations that foster grass-roots things then it could work.  However, no matter which way you can get it started and working, you need to have buy-in and commitment at all levels to keep it going and make it effective.”

Paul Fulton commented: “I definitely agree that it needs to be a combination of both.  Data Governance at a senior level making key decisions to provide air cover and Data Management at the grass-roots level actually making things happen.”

Jill Wanless commented: “Our organization has taken the Hybrid Approach (starting Bottom-Up) and it works well for two reasons: (1) the worker bee rock stars are all aligned and ready to hit the ground running, and (2) the ‘Top’ can sit back and let the ‘aligned’ worker bees get on with it.  Of course, this approach is sometimes (painfully) slow, but with the ground-level rock stars already aligned, there is less resistance implementing the policies, and the Top’s heavy hand is needed much less frequently, but I voted for Hybrid Approach (starting Top-Down) because I have less than stellar patience for the long and scenic route.”

 

Zig-Zag-Diagonal Data Governance

I definitely agree with Rob’s well-articulated points that corporate culture is the most significant variable with data governance since it determines whether starting top-down or bottom-up is the best approach for a particular organization—and no matter which way you get started, you eventually need buy-in and commitment at all levels to keep it going and make it effective.

I voted for Hybrid Approach (starting Bottom-Up) since I have seen more data governance programs get successfully started because of the key factor of grass-roots alignment minimizing resistance to policy implementation, as Jill’s comment described.

And, of course, I agree with Paul’s remark that eventually data governance will require a combination of both top-down and bottom-up aspects.  At certain times during the evolution of a data governance program, top-down aspects will be emphasized, and at other times, bottom-up aspects will be emphasized.  However, it is unlikely that any long-term success can be sustained by relying exclusively on either a top-down-only or a bottom-up-only approach to data governance.

Let’s stop debating top-down versus bottom-up data governance—and start embracing Zig-Zag-Diagonal Data Governance.

 

Data Governance “Next Practices”

Phil Simon and I co-host and co-produce the wildly popular podcast Knights of the Data Roundtable, a bi-weekly data management podcast sponsored by the good folks at DataFlux, a SAS Company.

On Episode 5, our special guest, best-practice expert, and all-around industry thought leader Jill Dyché discussed her excellent framework for data governance “next practices” called The 5 + 2 Model.

 

Related Posts

Beware the Data Governance Ides of March

Data Governance and the Buttered Cat Paradox

Twitter, Data Governance, and a #ButteredCat #FollowFriday

A Tale of Two G’s

The Collaborative Culture of Data Governance

Connect Four and Data Governance

Quality and Governance are Beyond the Data

Podcast: Data Governance is Mission Possible

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

Jack Bauer and Enforcing Data Governance Policies

The Prince of Data Governance

MacGyver: Data Governance and Duct Tape

DQ-Tip: “Undisputable fact about the value and use of data…”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“Undisputable fact about the value and use of data—any business process that is based on the assumption of having access to trustworthy, accurate, and timely data will produce invalid, unexpected, and meaningless results if this assumption is false.”

This DQ-Tip is from the excellent book Master Data Management and Data Governance by Alex Berson and Larry Dubov.

As data quality professionals, our strategy for quantifying and qualifying the business value of data is an essential tenet of how we make the pitch to get executive management to invest in enterprise data quality improvement initiatives.

However, all too often, the problem when we talk about data with executive management is exactly that—we talk about data.

Let’s instead follow the sage advice of Berson and Dubov.  Before discussing data quality, let’s research the data quality assumptions underlying core business processes.  This due diligence will allow us to frame data quality discussions within a business context by focusing on how the organization is using its data to support its business processes, which will allow us to qualify and quantify the business value of having high quality data as a strategic corporate asset.

 

Related Posts

DQ-Tip: “Data quality tools do not solve data quality problems...”

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “There is no point in monitoring data quality...”

DQ-Tip: “Don't pass bad data on to the next person...”

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is about more than just improving your data...”

DQ-Tip: “Start where you are...”

Follow the Data

In his recent blog post Multiple Data Touch Points, David Loshin wrote about how common it is for many organizations to not document how processes acquire, read, or modify data.  As a result, when an error occurs, it manifests itself in a downstream application, and it takes a long time to figure out where the error occurred and how it was related to the negative impacts.

Data is often seen as just a by-product of business and technical processes, but a common root cause of poor data quality is this lack of awareness of the end-to-end process of how the organization is using its data to support its business activities.

For example, imagine we have discovered an error in a report.  Do we know the business and technical processes the data passed through before appearing in the report?  Do we know the chain of custody for the report data?  In other words, do we know the business analyst who prepared it, the data steward who verified its data quality, the technical architect who designed its database, and the data entry clerk who created the data?  And if we can’t answer these questions, do we even know where to start looking?

When an organization doesn’t understand its multiple data touch points, it’s blindsided by events caused by the negative business impacts of poor data quality, e.g., a customer service nightmare, a regulatory compliance failure, or a financial reporting scandal.

“Follow the money” is an expression often used during the investigation of criminal activities or political corruption.  I remember the phrase from the 1976 Academy Award winning movie All the President’s Men, which was based on the non-fiction book of the same name written by Carl Bernstein and Bob Woodward, two of the journalists who investigated the Watergate scandal.

“Follow the data” is an expression sometimes used during the investigation of incidents of poor data quality.  However, it’s often limited to reactive data cleansing projects where the only remediation will be finding and fixing the critical data problems, but without taking corrective action to resolve the root cause—and in some cases, without even identifying the root cause.

A more proactive approach is establishing a formal process to follow the data from its inception and document every step of its journey throughout the organization, including the processes and people that the data encountered.  This makes it much easier to retrace data’s steps, recover more quickly when something goes awry, and prevent similar problems from recurring in the future.
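
As a sketch of what “following the data” might look like in practice, here is a minimal chain-of-custody record in Python; the dataset name, touch points, and roles are hypothetical examples, not drawn from Loshin’s post:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TouchPoint:
    """One step in the data's journey: which process touched it, who owns that step,
    and what the process did (acquired, read, modified, or reported the data)."""
    process: str
    owner: str
    action: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class DataLineage:
    """Chain of custody for a dataset, recorded at every touch point."""
    dataset: str
    touch_points: list[TouchPoint] = field(default_factory=list)

    def record(self, process: str, owner: str, action: str) -> None:
        self.touch_points.append(TouchPoint(process, owner, action))

    def trace(self) -> None:
        # Retrace the data's steps, most recent touch point first.
        for tp in reversed(self.touch_points):
            print(f"{tp.timestamp:%Y-%m-%d %H:%M} | {tp.process} | {tp.owner} | {tp.action}")

# Hypothetical journey of the data behind a monthly revenue report
lineage = DataLineage("monthly_revenue_report")
lineage.record("order_entry", "data entry clerk", "created order records")
lineage.record("crm_sync", "technical architect", "modified customer attributes")
lineage.record("dq_validation", "data steward", "verified data quality rules")
lineage.record("reporting_etl", "business analyst", "aggregated into report")
lineage.trace()
```

Even a lightweight record like this answers the chain-of-custody questions posed above: when an error surfaces in a downstream report, you can retrace the data’s steps instead of starting the investigation from scratch.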

Deep Throat told Woodward and Bernstein: “Follow the money.”

Deep Thought told me 42 times to tell you: “Follow the data.”


Retroactive Data Quality

As I, and many others, have blogged about many times before, the proactive approach to data quality, i.e., defect prevention, is highly recommended over the reactive approach to data quality, i.e., data cleansing.

However, reactive data quality still remains the most common approach because “let’s wait and see if something bad happens” is typically much easier to sell strategically than “let’s try to predict the future by preventing something bad before it happens.”

Of course, when something bad does happen (and it always does), it is often too late to do anything about it.  So imagine if we could somehow travel back in time and prevent specific business-impacting occurrences of poor data quality from happening.

This would appear to be the best of both worlds since we could reactively wait and see if something bad happens, and if (when) it does, then we could travel back in time and proactively prevent just that particular bad thing from happening to our data quality.

This approach is known as Retroactive Data Quality—and it has been (somewhat successfully) implemented at least three times.

 

Flux Capacitated Data Quality

In 1985, Dr. Emmett “Doc” Brown turned a modified DeLorean DMC-12 into a retroactive data quality machine that, when accelerated to 88 miles per hour, created a time displacement window using its flux capacitor (according to Doc, it’s what makes time travel possible) powered by 1.21 gigawatts of electricity, which could be provided by either a nuclear reaction or a lightning strike.

On October 25, 1985, Doc sent data quality expert Marty McFly back in time to November 5, 1955 to prevent a few data defects in the original design of the flux capacitor, which inadvertently triggered some severe data defects in 2015, requiring Doc and Marty to travel back to 1955, and then to 1885, before traveling Back to the Future of a defect-free 1985, when the flux capacitor was destroyed.

 

Quantum Data Quality

In 1989, theorizing a data steward could time travel within his own database, Dr. Sam Beckett launched a retroactive data quality project called Quantum Data Quality, stepped into its Quantum Leap data accelerator—and vanished.

He awoke to find himself trapped in the past, stewarding data that was not his own, and driven by an unknown force to change data quality for the better.  His only guide on this journey was Al, a subject matter expert from his own time, who appeared in the form of a hologram only Sam could see and hear.  And so, Dr. Beckett found himself leaping from database to database, putting data right that once went wrong, and hoping each time that his next leap would be the leap home to his own database—but Sam never returned home.

 

Data Quality Slingshot Effect

The slingshot effect is caused by traveling in a starship at an extremely high warp factor toward a sun.  After allowing the gravitational pull to accelerate it to even faster speeds, the starship will then break away from the sun, which creates the so-called slingshot effect that transports the starship through time.

In 2267, Captain Gene Roddenberry will begin a Star Trek, commanding a starship using the slingshot effect to travel back in time to September 8, 1966 to launch a retroactive data quality initiative that has the following charter:

“Data: the final frontier.  These are the voyages of the starship Quality.  Its continuing mission: To explore strange, new databases; To seek out new data and new corporations; To boldly go where no data quality has gone before.”

 

Retroactive Data Quality Log, Supplemental

It is understandable if many of you doubt the viability of time travel as an approach to improving your data quality.  After all, whenever Doc and Marty, or Sam and Al, or Captain Roddenberry and the crew of the starship Quality, travel back in time and prevent specific business-impacting occurrences of poor data quality from happening, how do we prove they were successful?  Within the resulting altered timeline, there would be no traces of the data quality issues after they were retroactively resolved.

“Great Scott!”  It will always be more difficult to sell the business benefits of defect prevention than it is to sell data cleansing after a CxO responds “Oh, boy!” the next time poor data quality negatively impacts business performance.

Nonetheless, you must continue your mission to engage your organization in a proactive approach to data quality.  “Make It So!”

 

Related Posts

Groundhog Data Quality Day

What Data Quality Technology Wants

To Our Data Perfectionists

Finding Data Quality

MacGyver: Data Governance and Duct Tape

What going to the dentist taught me about data quality

Microwavable Data Quality

A Tale of Two Q’s

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

730 Days and 264 Blog Posts Later . . .

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: OCDQ on Vimeo

 

Thank You

Thank you for reading my many musings on data quality and its related disciplines, and for tolerating my various references, from Adventures in Data Profiling to Social Karma, Shakespeare to Dr. Seuss, The Pirates of Penzance to The Rolling Stones, from The Three Musketeers to The Three Tweets, Dante Alighieri to Dumb and Dumber, Jack Bauer to Captain Jack Sparrow, Finding Data Quality to Discovering What Data Quality Technology Wants, and from Schrödinger’s Cat to the Buttered Cat.

Thank you for reading Obsessive-Compulsive Data Quality for the last two years.  Your readership is deeply appreciated.

 

Related Posts

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Do you have obsessive-compulsive data quality? – The First OCDQ Blog Post

Twitter, Data Governance, and a #ButteredCat #FollowFriday

I have previously blogged in defense of Twitter, the pithy platform for social networking that I use perhaps a bit too frequently, and which many people argue is incompatible with meaningful communication (Twitter, that is, and hopefully not me).

Whether it is a regularly scheduled meeting of the minds, like the Data Knights Tweet Jam, or simply a spontaneous supply of trenchant thoughts, Twitter quite often facilitates discussions that deliver practical knowledge or thought-provoking theories.

However, occasionally the discussions center around more curious concepts, such as a paradox involving a buttered cat, which thankfully Steve Sarsfield, Mark Horseman, and Daragh O Brien can help me attempt to explain (remember I said attempt):

So, basically . . . successful data governance is all about Buttered Cats, Breaded CxOs, and Beer-Battered Data Quality Managers working together to deliver Bettered Data to the organization . . . yeah, that all sounded perfectly understandable to me.

But just in case you don’t have your secret decoder ring, let’s decipher the message (remember: “Be sure to drink your Ovaltine”):

  • Buttered Cats – metaphor for combining the top-down and bottom-up approaches to data governance
  • Breaded CxOs – metaphor for executive sponsors, especially ones providing bread (i.e., funding, not lunch—maybe both)
  • Beer-Battered Data Quality Managers – metaphor (and possibly also a recipe) for data stewardship
  • Bettered Data – metaphor for the corporate asset thingy that data governance helps you manage

(For more slightly less cryptic information, check out my previous post/poll: Data Governance and the Buttered Cat Paradox)

 

#FollowFriday Recommendations

Today is #FollowFriday, the day when Twitter users recommend other users you should follow, so here are some great tweeps for mostly non-buttered-cat tweets about Data Quality, Data Governance, Master Data Management, and Business Intelligence:

(Please Note: This is by no means a comprehensive list, is listed in no particular order whatsoever, and no offense is intended to any of my tweeps not listed below.  I hope that everyone has a great #FollowFriday and an even greater weekend.)

 

Related Posts

#FollowFriday Spotlight: @PhilSimon

#FollowFriday Spotlight: @hlsdk

#FollowFriday Spotlight: @DataQualityPro

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

The Wisdom of the Social Media Crowd

Social Karma (Part 7) – Twitter

Data Governance and the Buttered Cat Paradox

One of the most common questions about data governance is:

What is the best way to approach it—top-down or bottom-up?

The top-down approach emphasizes executive sponsorship and the role of the data governance board.

The bottom-up approach emphasizes data stewardship and the role of peer-level data governance change agents.

This debate reminds me of the buttered cat paradox (illustrated by Greg Williams), a thought experiment combining two common adages: “cats always land on their feet” and “buttered toast always lands buttered side down.”

In other words, if you strapped buttered toast (butter side up) on the back of a cat and then dropped it from a high height (Please Note: this is only a thought experiment, so no cats or toast are harmed), presumably the very laws of physics would be suspended, leaving our fearless feline of the buttered-toast-paratrooper brigade hovering forever in midair, spinning in perpetual motion, as both the buttered side of the toast and the cat’s feet attempt to land on the ground.

It appears that the question of either a top-down or a bottom-up approach with data governance poses a similar paradox.

Data governance will require executive sponsorship and a data governance board for the top-down-driven activities of funding, policy making and enforcement, decision rights, and arbitration of conflicting business priorities as well as organizational politics.

However, data governance will also require data stewards and other grass roots advocates for the bottom-up-driven activities of policy implementation, data remediation, and process optimization, all led by the example of peer-level change agents adopting the organization’s new best practices for data quality management, business process management, and technology management.

Therefore, recognizing the eventual need for aspects of both a top-down and a bottom-up approach with data governance can leave an organization at a loss to understand where to begin, hovering forever in mid-decision, spinning in perpetual thought, unable to land a first footfall on their data governance journey—and afraid of falling flat on the buttered side of their toast.

Although data governance is not a thought experiment, planning and designing your data governance program does require thought, and perhaps some experimentation, in order to discover what will work best for your organization’s corporate culture.

What do you think is the best way to approach data governance? Please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

Thaler’s Apples and Data Quality Oranges

In the opening chapter of his book Carrots and Sticks, Ian Ayres recounts the story of Thaler’s Apples:

“The behavioral revolution in economics began in 1981 when Richard Thaler published a seven-page letter in a somewhat obscure economics journal, which posed a pretty simple choice about apples.

Which would you prefer:

(A) One apple in one year, or

(B) Two apples in one year plus one day?

This is a strange hypothetical—why would you have to wait a year to receive an apple?  But choosing is not very difficult; most people would choose to wait an extra day to double the size of their gift.

Thaler went on, however, to pose a second apple choice.

Which would you prefer:

(C) One apple today, or

(D) Two apples tomorrow?

What’s interesting is that many people give a different, seemingly inconsistent answer to this second question.  Many of the same people who are patient when asked to consider this choice a year in advance turn around and become impatient when the choice has immediate consequences—they prefer C over D.

What was revolutionary about his apple example is that it illustrated the plausibility of what behavioral economists call ‘time-inconsistent’ preferences.  Richard was centrally interested in the people who chose both B and C.  These people, who preferred two apples in the future but one apple today, flipped their preferences as the delivery date got closer.”

What does this have to do with data quality?  Give me a moment to finish eating my second apple, and then I will explain . . .

 

Data Quality Oranges

Let’s imagine that an orange represents a unit of measurement for data quality, somewhat analogous to data accuracy, such that the more data quality oranges you have, the better the quality of data is for your needs—let’s say for making a business decision.

Which would you prefer:

(A) One data quality orange in one month, or

(B) Two data quality oranges in one month plus one day?

(Please Note: Due to the strange uncertainties of fruit-based mathematics, two data quality oranges do not necessarily equate to a doubling of data accuracy, but two data quality oranges are certainly an improvement over one data quality orange).

Now, of course, on those rare occasions when you can afford to wait a month or so before making a critical business decision, most people would choose to wait an extra day in order to improve their data quality before making their data-driven decision.

However, let’s imagine you are feeling squeezed by a more pressing business decision—now which would you prefer:

(C) One data quality orange today, or

(D) Two data quality oranges tomorrow?

In my experience with data quality and business intelligence, most people prefer B over A—and C over D.

This “time-inconsistent” data quality preference within business intelligence reflects the reality that with the speed at which things change these days, more real-time business decisions are required—perhaps making speed more important than quality.

In a recent Data Knights Tweet Jam, Mark Lorion pondered speed versus quality within business intelligence, asking: “Is it better to be perfect in 30 days or 70% today?  Good enough may often be good enough.”

To which Henrik Liliendahl Sørensen responded with the perfectly pithy wisdom: “Good, Fast, Decision—Pick any two.”

However, Steve Dine cautioned that speed versus quality is decision dependent: “70% is good when deciding how many pencils to order, but maybe not for a one billion dollar acquisition.”

Mark’s follow-up captured the speed versus quality tradeoff succinctly with “Good Now versus Great Later.”  And Henrik added the excellent cautionary note: “Good decision now, great decision too late—especially if data quality is not a mature discipline.”

 

What Say You?

How many data quality oranges do you think it takes?  Or for those who prefer a less fruitful phrasing, where do you stand on the speed versus quality debate?  How good does data quality have to be in order to make a good data-driven business decision?

 

Related Posts

To Our Data Perfectionists

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

Data Quality and the Cupertino Effect

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Data In, Decision Out

The Data-Decision Symphony

Data!

You Can’t Always Get the Data You Want

Data Qualia

In philosophy (according to Wikipedia), the term qualia is used to describe the subjective quality of conscious experience.

Examples of qualia are the pain of a headache, the taste of wine, or the redness of an evening sky.  As Daniel Dennett explains:

“Qualia is an unfamiliar term for something that could not be more familiar to each of us:

The ways things seem to us.”

Like truth, beauty, and singing ability, data quality is in the eyes of the beholder.  Or, since data quality is most commonly defined as fitness for the purpose of use, we could say that data quality is in the eyes of the user.

However, most data has both multiple uses and multiple users.  Data of sufficient quality for one use or one user may not be of sufficient quality for other uses and other users.  Quite often these diverse data needs and divergent data quality perspectives make it a daunting challenge to provide meaningful data quality metrics to the organization.

Recently on the Data Roundtable, Dylan Jones of Data Quality Pro discussed the need to create data quality reports that matter, explaining that if you are relying on canned data profiling reports (i.e., column statistics and data quality metrics at an attribute, table, and system level), then you are measuring data quality in isolation from how the business is performing.

Instead, data quality metrics must measure data qualia—the subjective quality of the user’s business experience with data:

“Data Qualia is an unfamiliar term for something that must become more familiar to the organization:

The ways data quality impacts business performance.”
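
To illustrate the difference, here is a minimal sketch contrasting a canned column-level profile with a business-context measurement; the records, the shipping rule, and the “orders that cannot ship” metric are hypothetical examples of mine, not from Dylan’s post:

```python
# Hypothetical customer order records
orders = [
    {"order_id": 1, "postal_code": "02116", "email": "a@example.com"},
    {"order_id": 2, "postal_code": None,    "email": "b@example.com"},
    {"order_id": 3, "postal_code": "1234",  "email": None},
    {"order_id": 4, "postal_code": "90210", "email": "d@example.com"},
]

# Canned profiling report: column statistics in isolation from the business
for column in ("postal_code", "email"):
    values = [o[column] for o in orders]
    completeness = sum(v is not None for v in values) / len(values)
    print(f"{column}: {completeness:.0%} complete, {len(set(values))} distinct values")

# Data qualia: the same data measured against a business experience --
# which orders cannot actually be shipped and invoiced?
def shippable(order):
    return order["postal_code"] is not None and len(order["postal_code"]) == 5

blocked = [o["order_id"] for o in orders if not shippable(o)]
print(f"{len(blocked)} of {len(orders)} orders cannot ship due to address defects: {blocked}")
```

The column statistics are accurate but say nothing about business performance; the second measurement reports the way the data quality problem is actually experienced by the business.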

Related Posts

The Point of View Paradox

DQ-BE: Single Version of the Time

Single Version of the Truth

Beyond a “Single Version of the Truth”

The Idea of Order in Data

Hell is other people’s data

DQ-BE: Data Quality Airlines

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality and the Cupertino Effect

DQ-Tip: “Data quality is primarily about context not accuracy...”