Jim Harris

My name is Jim Harris. I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.

Tuesday, July 27, 2010

Is your data complete and accurate, but useless to your business?

Ensuring that complete and accurate data is being used to make critical daily business decisions is perhaps the primary reason why data quality is so vitally important to the success of your organization. 

However, this effort can sometimes take on a life of its own, where achieving complete and accurate data is allowed to become the raison d'être of your data management strategy—in other words, you start managing data for the sake of managing data.

When this phantom menace clouds your judgment, your data might be complete and accurate—but useless to your business.

 

Completeness and Accuracy

How much data is necessary to make an effective business decision?  Having complete (i.e., all available) data seems obviously preferable to incomplete data.  However, with data volumes always burgeoning, the unavoidable fact is that sometimes having more data only adds confusion instead of clarity, thereby becoming a distraction instead of helping you make a better decision.

Returning to my original question, how much data is really necessary to make an effective business decision? 

Accuracy, thanks to substantial assistance from my readers, was defined in a previous post as the combination of the correctness of a data value within a limited context, such as verification against an authoritative reference (i.e., validity), and the correctness of a valid data value within an extensive context that includes other data as well as business processes (i.e., accuracy).
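To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical set of country codes as the authoritative reference, and a hypothetical rule that a record's country must match its billing country as the wider context. A value can pass the first check (validity) and still fail the second (accuracy).

```python
# Hypothetical authoritative reference: a small set of valid country codes.
VALID_COUNTRY_CODES = {"US", "GB", "DE", "FR"}


def is_valid(record: dict) -> bool:
    """Validity: the value passes verification against the authoritative reference."""
    return record.get("country") in VALID_COUNTRY_CODES


def is_accurate(record: dict) -> bool:
    """Accuracy: a valid value that is also correct in a wider context.

    The context rule here is an assumption for illustration: the record's
    country must agree with the country on the customer's billing address.
    """
    return is_valid(record) and record.get("country") == record.get("billing_country")


record = {"country": "GB", "billing_country": "US"}
print(is_valid(record), is_accurate(record))  # True False
```

Under these assumed rules, "GB" is a perfectly valid code, but it is not accurate for a customer who is billed in the US.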

Although accurate data is obviously preferable to inaccurate data, less than perfect data quality cannot be used as an excuse to delay making a critical business decision.  When it comes to the quality of the data being used to make these business decisions, you can’t always get the data you want, but if you try sometimes, you just might find, you get the business insight you need.

 

Data-driven Solutions for Business Problems

Obviously, there are even more dimensions of data quality beyond completeness and accuracy. 

However, although data quality is about more than just improving your data, it can be misperceived as an activity performed just for the sake of the data, when, in fact, data quality is an enterprise-wide initiative performed for the sake of implementing data-driven solutions for business problems, enabling better business decisions, and delivering optimal business performance.

In order to accomplish these objectives, data has to be not only complete and accurate (along with whatever other dimensions you wish to add to your definition of data quality) but, most important, useful to the business.

Perhaps the most common definition for data quality is “fitness for the purpose of use.” 

The missing word, which makes this definition both incomplete and inaccurate, puns intended, is “business.”  In other words, data quality is “fitness for the purpose of business use.”  How complete and how accurate (and however else) the data needs to be is determined by its business use—or uses since, in the vast majority of cases, data has multiple business uses.
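As a rough illustration of fitness for the purpose of business use, the sketch below measures the completeness of a single field once and then judges it against a separate threshold for each business use. The use names and thresholds are hypothetical placeholders; in practice they would have to come from the business.

```python
from typing import Optional


def completeness(values: list[Optional[str]]) -> float:
    """Fraction of values that are populated (neither None nor blank)."""
    if not values:
        return 0.0
    populated = sum(1 for v in values if v is not None and v.strip() != "")
    return populated / len(values)


# Hypothetical business uses, each with its own minimum completeness requirement.
REQUIRED_COMPLETENESS = {
    "regulatory_reporting": 0.99,
    "marketing_campaign": 0.80,
    "trend_analysis": 0.50,
}


def fit_for_use(values: list[Optional[str]], use: str) -> bool:
    """The same data is judged against the threshold of the business use at hand."""
    return completeness(values) >= REQUIRED_COMPLETENESS[use]


emails = ["a@example.com", None, "b@example.com", "", "c@example.com"]
for use in REQUIRED_COMPLETENESS:
    print(use, fit_for_use(emails, use))
```

With three of five email values populated, completeness is 0.6, which satisfies the assumed `trend_analysis` threshold but not the `marketing_campaign` or `regulatory_reporting` ones: the same data is fit for one business use and unfit for another.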

 

Data, data everywhere

With silos replicating data and new data being created daily, managing all of the data is becoming impractical, and because we are so busy trying to manage all of it, no one is stopping to evaluate its usage or business relevance.

The fifth idea in Five New Ideas From 2010 MIT Information Quality Industry Symposium, a recent blog post by Mark Goloboy, was that “60-90% of operational data is valueless.”

“I won’t say worthless,” Goloboy clarified, “since there is some operational necessity to the transactional systems that created it, but valueless from an analytic perspective.  Data only has value, and is only worth passing through to the Data Warehouse if it can be directly used for analysis and reporting.  No news on that front, but it’s been more of the focus since the proliferation of data has started an increasing trend in storage spend.”

In his recent blog post Are You Afraid to Say Goodbye to Your Data?, Dylan Jones discussed the critical importance of designing an archive strategy for data, as opposed to the default position many organizations take, where burgeoning data volumes are allowed to proliferate because, in large part, no one wants to delete (or, at the very least, archive) any of the existing data. 

This often results in the data that the organization truly needs for continued success getting stuck in the long line of data waiting to be managed, and in many cases, behind data for which the organization no longer has any business use (and perhaps never even had the chance to use when the data was actually needed to make critical business decisions).

“When identifying data in scope for a migration,” Dylan advised, “I typically start from the premise that ALL data is out of scope unless someone can justify its existence.  This forces the emphasis back on the business to justify their use of the data.”
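Dylan's premise can be sketched as an opt-in filter: every data element starts out of scope and moves into scope only when a business justification is recorded for it. The `DataElement` class and its fields are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """A data element considered for migration (hypothetical structure)."""
    name: str
    # Business uses that someone has offered as justification; empty means none.
    justifications: list[str] = field(default_factory=list)


def migration_scope(elements: list[DataElement]) -> tuple[list[str], list[str]]:
    """Everything starts out of scope; only justified elements move into scope."""
    in_scope = [e.name for e in elements if e.justifications]
    out_of_scope = [e.name for e in elements if not e.justifications]
    return in_scope, out_of_scope


elements = [
    DataElement("customer_id", ["billing"]),
    DataElement("fax_number"),
    DataElement("order_total", ["revenue reporting"]),
]
print(migration_scope(elements))  # (['customer_id', 'order_total'], ['fax_number'])
```

Here `fax_number` stays out of scope simply because no one has justified it, which puts the burden of proof on the business rather than on the migration team.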

 

Data Memorioso

Funes el memorioso is a short story by Jorge Luis Borges, which describes a young man named Ireneo Funes who, as a result of a horseback riding accident, has lost his ability to forget.  Although Funes has a tremendous memory, he is so lost in the details of everything he knows that he is unable to convert the information into knowledge and unable, as a result, to grow in wisdom.

In Spanish, the word memorioso means “having a vast memory.”  When Data Memorioso is your data management strategy, your organization becomes so lost in all of the data it manages that it is unable to convert data into business insight and unable, as a result, to survive and thrive in today’s highly competitive and rapidly evolving marketplace.

In their great book Made to Stick: Why Some Ideas Survive and Others Die, Chip Heath and Dan Heath explained that “an accurate but useless idea is still useless.  If a message can’t be used to make predictions or decisions, it is without value, no matter how accurate or comprehensive it is.”  I believe that this is also true for your data and your organization’s business uses for it.

Is your data complete and accurate, but useless to your business?

 

Related Posts

Data Quality and the Cupertino Effect

Data Rock Stars: The Rolling Forecasts

Data!

Data, data everywhere, but where is data quality?

DQ-Tip: “There is no point in monitoring data quality…”

DQ-Tip: “Data quality is about more than just improving your data...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

The First Law of Data Quality


Reader Comments (16)

Great post!

This is a very important aspect of knowledge management.

Data has an expiration date, and just adding data "for the sake of it" increases information overload and makes it harder to find the really important data.

My article on KM, the value of data/information/knowledge etc: KM 3.0: This time it's personal

July 27, 2010 | Unregistered CommenterAtle Iversen

Good post, Jim.

Funny enough, I just added Made to Stick to my list of books to read.

I'll also cop to learning a new Spanish word from your post.

July 27, 2010 | Unregistered CommenterPhil Simon

Nice post Jim and thanks for the link.

This sparks an episode I had a few years ago with an engineering services company in the UK.

I ran a management workshop showing a lot of the issues we had uncovered. As we were walking through a dashboard of all the findings, one of the directors shouted out that the 20% completeness statistic for a piece of engineering installation data was wrong; she had received no reports of missing data.

I drilled into the raw data and sure enough we found that 80% of the data was incomplete.

She was furious and demanded that site visits be carried out and engineers should be incentivised (i.e., punished!) in order to maintain this information.

What was interesting is that the data went back many years so I posed the question - has your decision-making ability been impeded by this lack of information?

What followed was a lengthy debate but the outcome was NO, it had little effect on operations or strategic decision making.

The company could have invested considerable amounts of time and money in maintaining this information but the benefits would have been marginal.

One of the most important dimensions to add to any data quality assessment is USEFULNESS; I use it as a weight to reduce the impact of other dimensions. To extend your debate further, data may be hopelessly inaccurate and incomplete, but if it's of no use, then let's take it out of the equation.

Great post, like the other links you've dropped in here, really extends the debate.

July 27, 2010 | Unregistered CommenterDylan Jones
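Dylan's idea of using usefulness as a weight could be sketched as follows. This is one possible interpretation, not his actual method: assuming every dimension score and the usefulness rating are on a 0 to 1 scale, remediation priority is the quality shortfall multiplied by usefulness, so a data set with no usefulness drops out of the equation however poor its quality. The function name and scales are hypothetical.

```python
from statistics import mean


def remediation_priority(dimension_scores: dict[str, float], usefulness: float) -> float:
    """Quality shortfall weighted by usefulness; all inputs on a 0..1 scale.

    A data set with zero usefulness gets zero priority, however poor its
    completeness or accuracy scores are.
    """
    if not 0.0 <= usefulness <= 1.0:
        raise ValueError("usefulness must be between 0 and 1")
    shortfall = 1.0 - mean(list(dimension_scores.values()))
    return usefulness * shortfall


installation = remediation_priority({"completeness": 0.2, "accuracy": 0.5}, usefulness=0.05)
billing = remediation_priority({"completeness": 0.9, "accuracy": 0.8}, usefulness=1.0)
print(round(installation, 4), round(billing, 4))  # 0.0325 0.15
```

Under these assumptions, the poorly populated but rarely useful installation data ranks below the better-quality but heavily used billing data, echoing the outcome of the workshop debate.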

Here at Experian QAS we help businesses avoid falling into that exact trap.

An effective data quality strategy should always start with the end goal in mind - what will the data ultimately be used for?

We've produced a huge range of resources to help anyone wrestling with the question of 'what should data quality look like for my particular business'?

Helen Roy
Marketing Manager
Experian QAS

July 27, 2010 | Unregistered CommenterHelen Roy

@Atle — Similar to radioactive elements, all data has a limited shelf life. All data decays, but not necessarily at the same rate. There are many different dates associated with data. Knowing accurate creation, update, effective, expiration, and other available dates can help estimate the timeframe that data will be applicable for its intended usage. Knowing its shelf life can be used to indicate when data should be archived or possibly even deleted.

@Phil — I highly recommend both of the Heath brothers’ books, Made to Stick and Switch. In my opinion, these are two of the most important recently published business books.

@Dylan — Thanks for sharing a great real-world example. “Has your decision-making ability been impeded by this lack of information?” should be a standard question for all business stakeholders. And I definitely agree that USEFULNESS is one of the most important dimensions to add to any data quality assessment.

@Helen — “What will the data ultimately be used for?” is definitely the question that drives an effective data quality strategy. How data is being used is more important than the business processes that create it and the technical processes that manage it.

July 27, 2010 | Registered CommenterJim Harris
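The shelf-life idea in the reply to Atle can be sketched as a simple classification. This is a minimal sketch assuming a per-record last-update date, an assumed shelf life in days, an optional known expiration date, and a hypothetical one-year grace period between archiving and deletion; none of these values come from the post.

```python
from datetime import date, timedelta
from typing import Optional


def lifecycle_action(last_updated: date, shelf_life_days: int, today: date,
                     expiration: Optional[date] = None,
                     archive_grace_days: int = 365) -> str:
    """Classify a record as active, archive, or delete-candidate.

    A known expiration date wins; otherwise the record is assumed to expire
    shelf_life_days after its last update. The grace period between archiving
    and deletion is a hypothetical default, not a recommendation.
    """
    if expiration is not None:
        expires = expiration
    else:
        expires = last_updated + timedelta(days=shelf_life_days)
    if today <= expires:
        return "active"
    if today <= expires + timedelta(days=archive_grace_days):
        return "archive"
    return "delete-candidate"


today = date(2010, 7, 27)
print(lifecycle_action(date(2010, 1, 1), 365, today))  # active
print(lifecycle_action(date(2009, 1, 1), 365, today))  # archive
print(lifecycle_action(date(2007, 1, 1), 365, today))  # delete-candidate
```

Because different data decays at different rates, the shelf life would realistically be set per data set rather than globally.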

From the LinkedIn Group for TDWI, Rohit Raghav commented:

“Good thread, Jim.

Besides being useless to the business, it also brings extensive costs in the form of archiving, securing, and maintaining, as well as business risks arising from security breaches. Considering the widely varied regulatory regimes, it is extremely challenging to put a value on potential losses.

Making Data Destruction one of the processes in Information Management should be on the priority list of CIOs, though the challenges in executing such plans would be stiff and difficult to surmount given the increasingly complex ways in which organizations are structured within and connected to the world outside.”

And I responded:

Thanks for your insightful comment, Rohit.

I definitely agree that Data Destruction should be an Information Management priority for CIOs. I also agree with your points on why doing so is indeed a complex challenge.

Best Regards,

Jim

July 27, 2010 | Registered CommenterJim Harris

In early July, Evan Levy of Baseline Consulting wrote an excellent blog post, The Flaw of the Data Inventory, which I had included in the original draft of my post as an example of an approach for identifying the data that the business truly uses.

The blog post included a slight allusion to the John Keats poem Ode on a Grecian Urn, which (of course) made me leave the following comment (somehow not rejected as spam) on Evan's blog post:

Oh, still unidentified inventory of data assets!
You are the foster-child of silence and slow time.
Historians, who cannot thus express current business issues,
Offer instead, a flowery tale flowing more sweetly than a rhyme.
What leafy pages of dead trees will haunt the inventory,
Of structured or unstructured data sources, or of both,
Locked in metadata repositories or drawers of old file cabinets?

What inconsistent and unknowing knowledge are these?
What mad pursuit of wasted wisdom we struggle to escape,
What countless discrete data elements? What wild SQL query?

All data plays a melody that is sweet, but those truly in use
Are sweeter; therefore, data inventory, catalog data assets,
Not for the sake of it, but for something more endeared,
Catalog what the business truly uses and needs, but no more.

When it is complete, with repeatable process and no waste,
Your data inventory shall remain, even in midst of other woes,
A true friend to one and all, and to whom you shall say:

"Business insight is truth, truth business insight," – that is all
You know of your data assets, and all you need to know.

July 27, 2010 | Registered CommenterJim Harris

From the LinkedIn Data Cleansing User Group, Gordon Hamilton commented:

“That's an excellent point Jim.

Data Quality dimensions that track an information data set's significance to the business such as Relevance or Impact could help keep the care and feeding efforts for each data set in ratio to their importance to the Business.

I think you are suggesting that the Business's strategic/tactical objectives should be used to self-assess and even prune data quality management efforts, in order to keep them aligned with the Business rather than letting them have an independent life of their own.

I wonder if all business activities could use a self-assessment metric built in to their processing so that they can realign to reality. In the low levels of biology this is sometimes referred to as a suicide gene that lets a cell decide when it is no longer needed. Suicide is such a strong term though, maybe it could be called an: annual review to realign efforts to organizational goals gene.”

And I responded:

Thanks for your excellent comment, Gordon.

The strategic/tactical objectives of the Business should definitely be used to self-assess data management efforts, in order to keep them aligned with the Business rather than letting them have an independent life of their own.

Great genetic analogy!

Best Regards,

Jim

July 28, 2010 | Registered CommenterJim Harris

From the LinkedIn Group for Data Governance & Data Quality, Harsha Srivatsa commented:

“That is an interesting observation and along the lines of what I have been talking about lately with my clients.

Do we need to look beyond Data Quality programs? Do we need to look beyond Data Quality metrics? Does metadata (technical and business metadata) actually help with the data usability?

Does quality data actually become useful only when business processes, applications, and business users recognize it and start to use and touch the data in such a way that its quality and adherence to business rules are still maintained?

Does usability become apparent when data becomes "information"?

The point I am trying to make is that maybe it is time to look at a factor called "Data Usability," whatever that connotes.

Thoughts? Feedback?”

And David Schiller responded:

“This is exactly where data governance comes into play. Data quality improvement programs cannot exist in a vacuum...that is unless you want to avoid this exact issue.

Any DQ initiative must be tied to business value, otherwise it is an almost useless exercise. That is the whole reason for improving the quality of data...so it can be used to make business decisions with confidence and trust....one could call this "usability". And yes, of course metadata helps with the usability aspect, but from a DQ perspective it doesn't matter what the metadata says it is (or should be)...This is why you need stewardship involved in the DQ process.

Therefore, unless your DQ program is aligned to business metrics, business performance improvement, business value, etc. it seems like a waste of effort.”

And Harsha Srivatsa responded:

“David, I totally agree with your viewpoint. I have to since that is how I earn my paycheck.

However, in dealing with the realities of establishing Data Governance and Data Quality, I would like to share some on the ground experiences.

I believe that drivers for Data Management come about due to complaints from end application users. End users start to complain that they can't find the data (even though it exists somewhere) or that they can't use the data in the form in which it is presented to them.

Data to begin with (however flawed it may be) is usable since somebody decided to create it or do something with it but along the way it just becomes bad, transformed in unexpected ways, deprecated, stale etc. I am not sure yet if this usability is more from an end user perspective or from an applications perspective.

I can understand it when existing data goes unused for legitimate reasons (lack of quality, governance, and definitions).

However, it may not be acceptable to stakeholders if data cannot be used in spite of propounding and implementing Data Quality, Data Governance, Data Dictionaries etc. Maybe it is worthwhile to examine how and when data becomes information, when information becomes insight and insight becomes knowledge.”


And I responded:

Thanks Harsha and David for sharing your excellent insights!

Best Regards,

Jim

July 28, 2010 | Registered CommenterJim Harris

Great points Jim. We have traditionally, as I am sure most organizations have, struggled with how much is enough and how accurate it has to be for decision making. Throw in data being used for multiple purposes (regulatory, analytics, or planning), and each use has different needs as well. Hard to find the correct balance, but therein lies the challenge!

July 28, 2010 | Unregistered Commenterwjdataguy

@Chris (aka wjdataguy) — Yes, most organizations definitely struggle with determining how much data is enough and how accurate it has to be to truly support their decision making. As you said, the challenge is that it's hard to find the correct balance, which leads too many organizations to adopt a default strategy of managing all of the data and trying to achieve near-perfect data quality. And one way or another, that approach will eventually fail.


From the SmartData Collective, James Taylor commented:

“Great post Jim.

This is what I call "Beginning with the decision in mind". If companies try and integrate or clean their data without understanding the decisions they are trying to improve then they will likely waste at least some of their effort.

Better to identify the decisions that matter to the business (through the metrics that the business tracks), understand what a better decision would be, figure out what kind of analytics would help deliver a better decision and only then go find, integrate and clean the data.”

And I responded:

Thanks for your insightful comment, James.

"Beginning with the decision in mind" is a great quote!

Too many organizations seem to take a "beginning with the data we have" approach and thereby start down a path that will usually waste a lot of time and money, but provide very little business insight or decision support.

Best Regards,

Jim

July 29, 2010 | Registered CommenterJim Harris

Excellent post, Jim, as usual.

A particularly nasty problem in data management is that data created for one purpose gets used for another. Often, the people who use the data don't have a choice. It's the only data available!

And when the same piece of data is used for multiple purposes, it gets even tougher. As you said, completeness and accuracy have a context: the same piece of data could be good for one purpose and useless for another.

A major goal of data governance is to define and enforce policies that align how data is created with how data is used. And if conflicts arise -- they surely will -- there's a mechanism for resolving them.

Best Regards,

Winston

July 29, 2010 | Unregistered CommenterWinston Chen

@Winston — Yes, it would be a much, much simpler data management world if data only had one purpose. The harsh reality of multiple business uses for the same data is indeed a particularly nasty problem for data management, and I agree data governance can definitely help with aligning how data is created with how data is used.

In his guest post on your Kalido blog, David Loshin offered some great advice about Avoiding the Data Governance Gap, which he defined as the delta between intention and action when defining data governance policies.

July 29, 2010 | Registered CommenterJim Harris

Great post Jim and some really excellent comments!

I would like to add to Gord, James and Winston's comments and suggest that in order to ensure the data meets the needs of the business, the data should be linked directly to business objectives.

This helps to eliminate the redundant data, makes the business requirements more solid, can identify areas of conflict where multiple business units use it differently for different purposes (assuming you have a method for escalation) and I have found that it can also identify data, processes, and activities that provide no clear business benefit.

Thank you!

August 5, 2010 | Unregistered CommenterJill Wanless

From the LinkedIn Group for DAMA International, Deborah Arline commented:

“Amen! I am interviewing information architects at large Fortune 500 companies that proclaim to have a data governance program, but most of us architects are frustrated with how data governance is treated as a ‘project’ that proceeds in fits and starts. The business makes about 40% of critical decisions using dirty data, and they seem complacent about how skewed it may be. ‘Complete’ and ‘accurate’ are only 2 out of about 16 measures for data quality.

The following is excerpted from Data Warehouse: Practical Advice from the Experts by Joyce Bischoff and Ted Alexander. The top 2 indicators of quality data are:

1. The user is satisfied with the quality of the data and the information derived from that data - While this is a subjective measure, it is, arguably, the most important indicator of all. If the data is of high quality, but the user is still dissatisfied, you or your boss will be out of a job.

2. The data is relevant and satisfies the needs of the business - The data has value to the enterprise. High quality data is useless if it's not the data needed to run the business.

What company today even thinks their data is complete and accurate? If they think so, does the business really know their data? Having complete and accurate data is an extremely rare exception, if it exists at all. As information managers, our top 2 priorities are to ensure that the data meets business requirements and that the business is happy. It is also our job to help the business determine what they need in a timely fashion, when they don’t have a clue how bad the data quality really is. We must help the business to manage all aspects of information and data quality. If the data and information is useless to the business, how useful are we to them? In that event, we may be in the wrong job and just haven’t received our termination notice yet.”

And I responded:

Thank you very much for your detailed comment, Deborah!

I definitely agree that few, if any, organizations have data that is complete and accurate, and that there are many additional measures for data quality.

Although it is perhaps more common to ignore data quality than attempt to perfect it, I have seen too many organizations that, with the very best of intentions, hyper-focus on the data, letting managing data become an end in itself, and ignore the two excellent, and business-driven, indicators of high quality data that you cited.

Best Regards,

Jim

August 14, 2010 | Registered CommenterJim Harris
