Data, data everywhere, but where is data quality?

“Two young fish are swimming along when they happen to meet an older fish swimming the other way, who nods at them and says: ‘Morning, boys.  How’s the water?’  And the two young fish swim on for a bit, then eventually one of them looks over at the other and goes: ‘What the hell is water?’”

The acclaimed novelist David Foster Wallace told that story during a speech he delivered at Kenyon College in 2005.  Although he certainly wasn’t speaking on the topic of data management, I believe that his story can easily be adapted as a data metaphor:

“Two young kids are walking along, tweeting and uploading new pictures to Facebook on their iPhones, when they happen to meet an older man walking the other way checking his work e-mail on his BlackBerry, who nods at them and says: ‘Morning, boys.  How’s the data?’  And the two young kids walk on for a bit, then eventually one of them looks over at the other and goes: ‘What the hell is data?’”

My point is that what was once a seemingly esoteric word (“data”), used mostly by computer geeks such as myself, has now so thoroughly pervaded mainstream culture that we hardly seem to notice we are swimming in data on a daily basis.


Why Data Matters

In a recent blog post, Rich Murnane described being hit with the realization that data isn’t for data geeks anymore.  The post included an excellent IBM video (and commercial) about “Why Data Matters” stating that every day we create fifteen petabytes of data, which is eight times as much data as there is in all of the libraries in the United States combined.

Data matters because everything—and not just the rows in our relational databases and spreadsheets, but also our status updates from Facebook and Twitter, our blog posts, and even most of our daily conversations—is data. 

The growing challenge is whether we can extract meaningful insights from these vast oceans of unrelenting data, and use those insights to make better decisions in near real-time in order to positively impact the various aspects of our lives.


Paradoxical Business Situation

Even in the business world, where data management used to be viewed solely as a concern for those computer geeks down in IT, more and more people all throughout more and more organizations are coming to view data as a strategic corporate asset.

In his recent Network World article Data Everywhere, But Not Enough Smart Management, Thomas Wailgum described the “data, data everywhere” phenomenon as “an awe-inspiring and unprecedented push and pull of data and information needs.”  Wailgum described the push as a growing surge of terabytes of data flooding enterprise systems and applications, and the pull as the growing demand from users for sweeping, individualized access to analytics and business information.

However, just because data is flowing everywhere doesn’t automatically mean that data quality is sure to follow.

Wailgum cites research from a recent Forbes survey where executives reported that the “bad data problem” is currently estimated to be costing their organizations between five and twenty million dollars annually, which leads him to ask the question:

“If everyone agrees on the strategic importance of data and information management, and everyone knows what the negative consequences are, then why are there still so many problems?” 

Wailgum calls this the “paradoxical business situation” and cites survey results indicating that “fragmented data ownership” is the single biggest roadblock to successful enterprise information management.  Nearly 80% of IT managers said data quality was their responsibility, whereas nearly 75% of business (finance, sales, and marketing) managers said it was their responsibility.

“While IT managers largely concede that information is the users’ not theirs, they take the position that data and information management systems are under IT’s purview,” concludes the survey.  “This differing perspective puts IT and business executives in conflicting camps, particularly when it comes to data quality.”

This debate over data ownership reminded me of the great discussion sparked by a recent Henrik Liliendahl Sørensen blog post questioning whether “data owner” was a bad word.  Many commenters agreed that “data stewardship” was more relevant and that, although data quality is a shared responsibility for the entire enterprise, corporate culture is far more challenging than what can amount to a largely semantic argument over the proper use of terminology such as “data ownership” or “data stewardship.”


Why Data Quality Matters

As I posited in The Circle of Quality, an organization’s success is measured by the quality of its results, which are dependent on the quality of its business decisions, which rely on the quality of its information, which is based on the quality of its data. 

Therefore, data quality matters because high quality data serves as a solid foundation for business success.

Organizations are not only facing the challenging realities that data is everywhere and its burgeoning volumes continue to rise, but also that data is no longer limited to the traditional structured forms stored in relational databases.  Unstructured data from social media, the Internet, and mobile devices is contributing an abundant new source to the enterprise’s information ocean.

In The Rime of the Ancient Mariner, Samuel Taylor Coleridge wrote:

“Day after day, day after day,
We stuck, nor breath nor motion;
As idle as a painted ship
Upon a painted ocean.

Water, water, everywhere,
And all the boards did shrink;
Water, water, everywhere,
Nor any drop to drink.”

When data is abundant, but data quality remains scarce, then the thirst to acquire knowledge and insight remains unquenched, and data hangs like a heavy albatross around the enterprise’s neck.


Related Posts

The Circle of Quality

Beyond a “Single Version of the Truth”

Poor Data Quality is a Virus

DQ-Tip: “Don't pass bad data on to the next person...”

The Only Thing Necessary for Poor Data Quality

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

Data Governance and Data Quality

The Data-Information Continuum

