April 18, 2011

How active is your data quality practice?

April 18, 2011/ Jim Harris

My recent blog post The Data Quality Wager received a provocative comment from Richard Ordowich that sparked another round of discussion and debate about proactive data quality versus reactive data quality in the LinkedIn Group for the IAIDQ.

“Data quality is a reactive practice,” explained Ordowich. “Perhaps that is not what is professed in the musings of others or the desired outcome, but it is nevertheless the current state of the best practices. Data profiling and data cleansing are after the fact data quality practices. The data is already defective. Proactive defect prevention requires a greater discipline and changes to organizational behavior that is not part of the current best practices. This I suggest is wishful thinking at this point in time.”

“How can data quality practices,” C. Lwanga Yonke responded, “that do not include proactive defect prevention (with the required discipline and changes to organizational behavior) be considered best practices? Seems to me a data quality program must include these proactive activities to be considered a best practice. And from what I see, there are many such programs out there. True, they are not the majority—but they do exist.”

After Ordowich requested real examples of proactive data quality practices, Jayson Alayay commented “I have implemented data quality using statistical process control techniques where expected volumes and ratios are predicted using forecasting models that self-adjust using historical trends. We receive an alert when significant deviations from forecast are detected. One of our overarching data quality goals is to detect a significant data issue as soon as it becomes detectable in the system.”

“It is possible,” replied Ordowich, “to estimate the probability of data errors in data sets based on the currency (freshness) and usage of the data. The problem is this process does not identify the specific instances of errors just the probability that an error may exist in the data set. These techniques only identify trends not specific instances of errors. These techniques do not predict the probability of a single instance data error that can wreak havoc. For example, the ratings of mortgages was a systemic problem, which data quality did not address. Yet the consequences were far and wide. Also these techniques do not predict systemic quality problems related to business policies and processes. As a result, their direct impact on the business is limited.”

“For as long as human hands key in data,” responded Alayay, “a data quality implementation to a great extent will be reactive. Improving data quality not only pertains to detection of defects, but also enhancement of content, e.g., address standardization, geocoding, application of rules and assumptions to replace missing values, etc. With so many factors in play, a real life example of a proactive data quality implementation that suits what you’re asking for may be hard to pinpoint. My opinion is that the implementation of ‘comprehensive’ data quality programs can have big rewards and big risks. One big risk is that it can slow time-to-market and kill innovation because otherwise talented people would be spending a significant amount of their time complying with rules and standards in the name of improving data quality.”

“When an organization embarks on a new project,” replied Ordowich, “at what point in the conversation is data quality discussed? How many marketing plans, new product development plans, or even software development plans have you seen include data quality? Data quality is not even an afterthought in most organizations, it is ignored. Data quality is not even in the vocabulary until a problem occurs. Data quality is not part of the culture or behaviors within most organizations.”

Please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

Retroactive Data Quality

March 08, 2011

Data Governance and the Buttered Cat Paradox

March 08, 2011/ Jim Harris

One of the most common questions about data governance is:

What is the best way to approach it—top-down or bottom-up?

The top-down approach is where executive sponsorship and the role of the data governance board is emphasized.

The bottom-up approach is where data stewardship and the role of peer-level data governance change agents is emphasized.

This debate reminds me of the buttered cat paradox (shown to the left as illustrated by Greg Williams), which is a thought experiment combining the two common adages: “cats always land on their feet” and “buttered toast always lands buttered side down.”

In other words, if you strapped buttered toast (butter side up) on the back of a cat and then dropped it from a high height (Please Note: this is only a thought experiment, so no cats or toast are harmed), presumably the very laws of physics would be suspended, leaving our fearless feline of the buttered-toast-paratrooper brigade hovering forever in midair, spinning in perpetual motion, as both the buttered side of the toast and the cat’s feet attempt to land on the ground.

It appears that the question of either a top-down or a bottom-up approach with data governance poses a similar paradox.

Data governance will require executive sponsorship and a data governance board for the top-down-driven activities of funding, policy making and enforcement, decision rights, and arbitration of conflicting business priorities as well as organizational politics.

However, data governance will also require data stewards and other grass roots advocates for the bottom-up-driven activities of policy implementation, data remediation, and process optimization, all led by the example of peer-level change agents adopting the organization’s new best practices for data quality management, business process management, and technology management.

Therefore, recognizing the eventual need for aspects of both a top-down and a bottom-up approach with data governance can leave an organization at a loss to understand where to begin, hovering forever in mid-decision, spinning in perpetual thought, unable to land a first footfall on their data governance journey—and afraid of falling flat on the buttered side of their toast.

Although data governance is not a thought experiment, planning and designing your data governance program does require thought, and perhaps some experimentation, in order to discover what will work best for your organization’s corporate culture.

What do you think is the best way to approach data governance? Please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

December 21, 2010

What Does Data Quality Technology Want?

December 21, 2010/ Jim Harris

During a recent Radiolab podcast, Kevin Kelly, author of the book What Technology Wants, used the analogy of how a flower leans toward sunlight because it “wants” the sunlight, to describe what the interweaving web of evolving technical innovations (what he refers to as the super-organism of technology) is leaning toward—in other words, what technology wants.

The other Radiolab guest was Steven Johnson, author of the book Where Good Ideas Come From, who somewhat dispelled the traditional notion of the eureka effect by explaining that the evolution of ideas, like all evolution, stumbles its way toward the next good idea, which inevitably leads to a significant breakthrough, such as what happens with innovations in technology.

Listening to this thought-provoking podcast made me ponder the question: What does data quality technology want?

In a previous post, I used the term OOBE-DQ to refer to the out-of-box-experience (OOBE) provided by data quality (DQ) tools, which usually becomes a debate between “ease of use” and “powerful functionality” after you ignore the Magic Beans sales pitch that guarantees you the data quality tool is both remarkably easy to use and incredibly powerful.

The data quality market continues to evolve away from esoteric technical tools and stumble its way toward the next good idea, which is business-empowering suites providing robust functionality with increasingly role-based user interfaces, which are tailored to the specific needs of different users. Of course, many vendors would love to claim sole responsibility for what they would call significant innovations in data quality technology, instead of what are simply by-products of an evolving market.

The deployment of data quality functionality within and across organizations also continues to evolve, as data cleansing activities are being complemented by real-time defect prevention services used to greatly minimize poor data quality at the multiple points of origin within the enterprise data ecosystem.

However, viewpoints about the role of data quality technology generally remain split between two opposing perspectives:

Technology enables a data quality process, but doesn’t obviate the need for people (e.g., data stewards) to remain actively involved and be held accountable for maintaining the quality of data.
Technology automates a data quality process, and a well-designed and properly implemented technical solution obviates the need for people to be actively involved after its implementation.

Do you think that continuing advancements and innovations in data quality technology will obviate the need for people to be actively involved in data quality processes? In the future, will we have high quality data because our technology essentially wants it and therefore leans our organizations toward high quality data? Let’s conduct another unscientific data quality poll:

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

DQ-Tip: “Data quality tools do not solve data quality problems...”

Which came first, the Data Quality Tool or the Business Need?

Data Quality Industry: Problem Solvers or Enablers?

Data Quality Magic

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Pirates of the Computer: The Curse of the Poor Data Quality

November 11, 2010

DQ-Poll: Data Warehouse or Data Outhouse?

November 11, 2010/ Jim Harris

In many organizations, a data warehouse is the only system where data from numerous and disparate operational sources has been integrated into a single repository of enterprise data.

The rapid delivery of a single system of record containing fully integrated and historical data to be used as the source for most of the enterprise’s reporting and decision support needs has long been the rallying cry and promise of the data warehouse.

However, I have witnessed beautifully architected, elegantly implemented, and diligently maintained data warehouses simply get ignored by the organization, which continues to rely on its data silos and spreadsheets for reporting and decision making.

The most common reason is that these big boxes of data are often built with little focus on the quality of the data being delivered.

But that’s just my opinion based on my personal experience. So let’s conduct an unscientific poll.

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

OCDQ Blog

OCDQ Blog

OCDQ Blog

OCDQ Blog

How active is your data quality practice?

Related Posts

Data Governance and the Buttered Cat Paradox

What Does Data Quality Technology Want?

Related Posts

DQ-Poll: Data Warehouse or Data Outhouse?

OCDQ Blog