Wednesday Word: August 11, 2010

Wednesday Word is an OCDQ regular segment intended to provide an occasional alternative to my Wordless Wednesday posts.  Wednesday Word provides a word (or words) of the day, including both my definition and an example of recommended usage.

 

Quality-ish

Truthiness by Stephen Colbert

Definition – Similar to truthiness, which my mentor Sir Dr. Stephen T. Colbert, D.F.A. defines as “truth that a person claims to know intuitively from the gut without regard to evidence, logic, intellectual examination, or facts,” quality-ish is defined as the quality of the data that an organization is using as the basis to make its critical business decisions without regard to performing data analysis, measuring completeness and accuracy, or even establishing if the data has any relevance at all to the critical business decisions being based upon it.

Example – “At today’s press conference, the CIO of Acme Marketplace Analytics heralded data-driven decision-making as the company’s key competitive differentiator.  In related news, the stock price of Acme Marketplace Analytics fell to a record low after their new quality-ish report declared the obsolescence of iTunes based on the latest Betamax videocassette sales projections.”

 

Is your organization basing its critical business decisions upon high quality data or highly quality-ish data?

 

Related Posts

The Circle of Quality

Is your data complete and accurate, but useless to your business?

Finding Data Quality

The Dumb and Dumber Guide to Data Quality

Wednesday Word: June 23, 2010 – Referential Narcissisity

Wednesday Word: June 9, 2010 – C.O.E.R.C.E.

Wednesday Word: April 28, 2010 – Antidisillusionmentarianism

Wednesday Word: April 21, 2010 – Enterpricification

Wednesday Word: April 7, 2010 – Vendor Asskisstic

Which came first, the Data Quality Tool or the Business Need?

This recent tweet by Andy Bitterer of Gartner Research (and ANALYSTerical) sparked an interesting online discussion, which was vaguely reminiscent of the classic causality dilemma that is commonly stated as “which came first, the chicken or the egg?”

 

An E-mail from the Edge

On the same day I saw Andy’s tweet, I received an e-mail from a friend and fellow data quality consultant, who had just finished a master data management (MDM) and enterprise data warehouse (EDW) project, which had over 20 customer data sources.

Although he was brought onto the project specifically for data cleansing, he was told from the day of his arrival that because of time constraints, they decided against performing any data cleansing with their recently purchased data quality tool.  Instead, they decided to use their data integration tool to simply perform the massive initial load into their new MDM hub and EDW.

But wait—the story gets even better.  The very first decision this client made was to purchase a consolidated enterprise application development platform with seamlessly integrated components for data quality, data integration, and master data management.

So, long before this client had determined their business need, they had decided that they needed to build a new MDM hub and EDW, made a huge investment in an entire platform of technology, and then decided to use only the basic data integration functionality.

However, this client was planning to use the real-time data quality and MDM services provided by their very powerful enterprise application development platform to prevent duplicates and any other bad data from entering the system after the initial load. 

But, of course, no one on the project team was actually working on configuring any of those services, or even, for that matter, determining the business rules those services would enforce.  Maybe the salesperson told them it was as easy as flipping a switch?

My friend, especially after looking at the data, preached that data quality was a critical business need, but he couldn’t convince them, despite taking the initiative to present the results of some quick data profiling, standardization, and data matching used to identify duplicate records within and across their primary data sources, results that clearly demonstrated the level of poor data quality.
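For readers who have never run that kind of quick check, here is a minimal sketch of the idea in Python, using only the standard library; the sample records, matching fields, and similarity threshold are illustrative assumptions, not the actual approach or data from my friend’s project.

```python
from difflib import SequenceMatcher

# Hypothetical customer records drawn from two of the client's many source systems
records = [
    {"id": "CRM-001", "name": "Jonathan Q. Smith", "postal": "02108"},
    {"id": "ERP-417", "name": "Smith, Jon Q",      "postal": "02108"},
    {"id": "CRM-002", "name": "Maria de la Cruz",  "postal": "33101"},
]

def standardize(record):
    """Crude standardization: lowercase, strip punctuation, sort the name tokens."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum() or ch.isspace())
    return " ".join(sorted(name.split())), record["postal"]

def similarity(a, b):
    """Name similarity for records sharing a postal code; zero otherwise."""
    (name_a, postal_a), (name_b, postal_b) = standardize(a), standardize(b)
    return SequenceMatcher(None, name_a, name_b).ratio() if postal_a == postal_b else 0.0

# Compare every pair and flag potential duplicates above an illustrative threshold
THRESHOLD = 0.7
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        score = similarity(records[i], records[j])
        if score >= THRESHOLD:
            print(f"Potential duplicate: {records[i]['id']} ~ {records[j]['id']} ({score:.2f})")
```

Even a crude pairwise comparison like this, run within and across the primary data sources, is usually enough to make the scale of the duplicate problem visible to project stakeholders.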

Although this client agreed that they definitely had some serious data issues, they still decided against doing any data cleansing and wanted to just get the data loaded.  Maybe they thought they were loading the data into one of those self-healing databases?

The punchline—this client is a financial services institution with a business need to better identify their most valuable customers.

As my friend lamented at the end of his e-mail, why do clients often later ask why these types of projects fail?

 

Blind Vendor Allegiance

In his recent blog post Blind Vendor Allegiance Trumps Utility, Evan Levy examined this bizarrely common phenomenon of selecting a technology vendor without gathering requirements, reviewing product features, and then determining what tool(s) could best help build solutions for specific business problems—another example of the tool coming before the business need.

Evan was recounting his experiences at a major industry conference on MDM, where people were asking his advice on what MDM vendor to choose, despite admitting “we know we need MDM, but our company hasn’t really decided what MDM is.”

Furthermore, these prospective clients had decided to default their purchasing decision to the technology vendor they already do business with; in other words, “since we’re already a [you can just randomly insert the name of a large technology vendor here] shop, we just thought we’d buy their product—so what do you think of their product?”

“I find this type of question interesting and puzzling,” wrote Evan.  “Why would anyone blindly purchase a product because of the vendor, rather than focusing on needs, priorities, and cost metrics?  Unless a decision has absolutely no risk or cost, I’m not clear how identifying a vendor before identifying the requirements could possibly have a successful outcome.”

 

SaaS-y Data Quality on a Cloudy Business Day?

Emerging industry trends like open source, cloud computing, and software as a service (SaaS) are often touted as less expensive than traditional technology, and I have heard some use this angle to justify buying the tool before identifying the business need.

In his recent blog post Cloud Application versus On Premise, Myths and Realities, Michael Fauscette examined the return on investment (ROI) versus total cost of ownership (TCO) argument quite prevalent in the SaaS versus on premise software debate.

“Buying and implementing software to generate some necessary business value is a business decision, not a technology decision,” Michael concluded.  “The type of technology needed to meet the business requirements comes after defining the business needs.  Each delivery model has advantages and disadvantages financially, technically, and in the context of your business.”

 

So which came first, the Data Quality Tool or the Business Need?

This question is, of course, absurd because, in every rational theory, the business need should always come first.  However, in predictably irrational real-world practice, it remains a classic causality dilemma for data quality related enterprise information initiatives such as data integration, master data management, data warehousing, business intelligence, and data governance.

But sometimes the data quality tool was purchased for an earlier project, and despite what some vendor salespeople may tell you, you don’t always need to buy new technology at the beginning of every new enterprise information initiative. 

Whenever you already have the technology in-house before defining your business need (or you have previously decided, often due to financial constraints, that you will need to build a bespoke solution), you still need to avoid technology bias.

Knowing how the technology works can sometimes cause a framing effect where your business need is defined in terms of the technology’s specific functionality, thereby framing the objective as a technical problem instead of a business problem.

Bottom line—your business problem should always be well-defined before any potential technology solution is evaluated.

 

Related Posts

There are no Magic Beans for Data Quality

Do you believe in Magic (Quadrants)?

Is your data complete and accurate, but useless to your business?

Can Enterprise-Class Solutions Ever Deliver ROI?

Selling the Business Benefits of Data Quality

The Circle of Quality

The Idea of Order in Data

As I explained in my previous post, which used the existentialist philosophy of Jean-Paul Sartre to explain the existence of the data silos that each and every one of an organization’s business units relies on for maintaining its own version of the truth, I am almost as obsessive-compulsive about literature and philosophy as I am about data and data quality.

Therefore, since my previous post was inspired by philosophy, I decided that this blog post should be inspired by literature.

 

Wallace Stevens

Although he consistently received critical praise for his poetry, Wallace Stevens spent most of his life working as a lawyer in the insurance industry.  After winning the Pulitzer Prize for Poetry in 1955, he was offered a faculty position at his alma mater, Harvard University, but declined since it would have required his resignation from his then executive management position. 

Therefore, Wallace Stevens was unusual in the sense that he was successful both as an artist and as a business professional, which is one of the many reasons why he remains one of my favorite American poets.

Stevens believed that reality is the by-product of our imagination as we use it to shape the constantly changing world around us.  Since change is the only constant in the universe, reality must be acknowledged as an activity, whereby we are constantly trying to make sense of the world through our re-imagining of it—our endless quest to discover order and meaning amongst the chaos.

 

The Idea of Order in Data

The Idea of Order at Key West by Wallace Stevens

This is an excerpt from The Idea of Order at Key West, one of my favorite Wallace Stevens poems, which provides an example of how our re-imagining of reality shapes the world around us, and allows us to discover order and meaning amongst the chaos.

“People cling to their personal data sets,” explained James Standen of Datamartist in his comment on my previous post.

Even though their business unit’s data silos are “insulated from all those wrong ideas” created and maintained by the data silos of other business units, as Standen wisely points out, all data silos are often considered “not personal enough for the individual.”

“Microsoft Excel lets people create micro-data silos,” Standen continued.  These micro-data silos (i.e., their personal spreadsheets) are “complete (for them), accurate (for them, or at least, they can pretend they are) and constant (in that no matter how much the data in the source system or other people’s spreadsheets change, their spreadsheet will be comfortingly static).  It doesn’t matter what the truth is, as long as they believe their version, and insulate themselves from dissenting views/data sets.”

This insidious pursuit truly becomes a Single Version of the Truth because it represents an individual’s version of the truth. 

The individual is the single artificer of the only world for them—the one that their own private data describes—thereby allowing them to discover their own personal order and meaning amongst the chaos of other, and often conflicting, versions of the truth. 

However, any single version of the truth will only discover a comfortingly static, and therefore false order, as well as an artificial, and therefore misleading meaning, amongst the chaos.

Data is a by-product of our re-imagining of reality.  Data is our abstract description of real-world entities (i.e., “master data”) and the real-world interactions (i.e., “transaction data”) among entities.  Our creation and maintenance of these abstract descriptions of reality shapes our perception of the constantly changing and rapidly evolving business world around us. 

Since change is the only constant, we must acknowledge that The Idea of Order in Data requires a constant activity, whereby we are constantly trying to make sense of the business world through our analysis of the data that describes it, which requires our endless quest to discover the business insight amongst the data chaos.

This quest is bigger than a single individual—or a single business unit.  This quest truly requires an enterprise-wide collaboration, a shared purpose that dissolves the barriers—data silos, politics, and any others—which separate business units and individuals.

The Idea of Order in Data is a quest for a Shared Version of the Truth.

 

Related Posts

Hell is other people’s data

My Own Private Data

Beyond a “Single Version of the Truth”

Finding Data Quality

The Circle of Quality

Is your data complete and accurate, but useless to your business?

Declaration of Data Governance

The Prince of Data Governance

Hell is other people’s data

I just read the excellent blog post Data Migration – and existentialist angst by John Morris, which asks the provocative question what can the philosophy of Jean-Paul Sartre tell us about data migration?

As a blogger almost as obsessive-compulsive about literature and philosophy as I am about data, this post resonated with me.  But perhaps Neil Raden is right when he remarked on Twitter that “anyone who works in Jean-Paul Sartre with data migration should get to spend 90 days with Lindsay Lohan.  Curse of liberal arts education.” (Please Note: Lindsay’s in jail for 90 days).

Part of my liberal arts education (and for a while I was a literature major with a minor in philosophy) included reading Sartre, not only his existentialist philosophy, but also his literature, including the play No Exit, which is the source of perhaps his most famous quote, “l’enfer, c’est les autres” (“Hell is other people”), which I have paraphrased into the title of this blog post.

 

Being and Nothingness

John Morris used Jean-Paul Sartre’s classic existentialist essay Being and Nothingness, and more specifically, two of its concepts, namely that objects are “en-soi” (“things in themselves”) and people are “pour-soi” (“things for themselves”), to examine the complex relationship that is formed during data analysis between the data (an object) and its analyst (a person).

During data analysis, the analyst is attempting to discover the meaning of data, which is determined by discovering its essential business use.  However, in the vast majority of cases, data has multiple business uses.

This is why, as Morris explains, first of all, we should beware “the naive simplicity of assuming that understanding meaning is easy, that there is one right definition.  The relationship between objects and their essential meanings is far more problematic.”

Therefore, you need not worry, for as Morris points out, “it’s not because you are no good at your job and should seek another trade that you can’t resolve the contradictions.  It’s a problem that has confused some of the greatest minds in history.”

“Secondly,” as Morris continues, we have to acknowledge that “we have the technology we have.  By and large, it limits itself to a single meaning, a single Canonical Model.  What we have to do is get from the messy first problem to the simpler compromise of the second view.  There’s no point hiding away from this as an essential part of our activity.”

 

The complexity of the external world

“Machines are en-soi objects that create en-soi objects,” Morris explains, whereas “people are pour-soi consciousnesses that create meanings and instantiate them in the records they leave behind in the legacy data stores we then have to re-interpret.”

“We then waste time using the wrong tools (e.g., trying to impose an enterprise view onto our business domain experts which is inconsistent with their divergent understandings) only to be surprised and frustrated when our definitions are rejected.”

As I have written about in previous posts, whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality.

These abstract descriptions can never be perfected since there is always what I call a digital distance between data and reality.

The inconvenient truth is that reality is not the same thing as the beautifully maintained digital data worlds that exist within our enterprise systems (and, of course, creating and maintaining these abstract descriptions of reality is no easy task).

As Morris thoughtfully concludes, we must acknowledge that “this central problem of the complexity of the external world is against the necessary simplicity of our computer world.”

 

Hell is other people’s data

The inconvenient truth of the complexity of the external world plays a significant role within the existentialist philosophy of an organization’s data silos, which are also the bane of successful enterprise information management. 

Each and every business unit acts as a pour-soi (a thing for themselves), persisting in their reliance on their own data silos, thereby maintaining their own version of the truth—because they truly believe that hell is other people’s data.

DQ-View: The Cassandra Effect

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

When you present the business case for your data quality initiative to executive management and other corporate stakeholders, you need to demonstrate that poor data quality is not a myth—it is a real business problem that negatively impacts the quality of decision-critical enterprise information.

But a common mistake when selling the business benefits of data quality is focusing too much on the negative aspects of not investing in data quality.  Although you would be telling the truth, nobody may want to believe things are as bad as you claim.

Therefore, in this new DQ-View segment, I want to discuss avoiding what is sometimes referred to as “the Cassandra Effect.”

 

DQ-View: The Cassandra Effect

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

Selling the Business Benefits of Data Quality

The Only Thing Necessary for Poor Data Quality

Sneezing Data Quality

Why is data quality important?

Data Quality in Five Verbs

The Five Worst Elevator Pitches for Data Quality

Resistance is NOT Futile

Common Change

Selling the Business Benefits of Data Quality

Mr. ZIP

In his book Purple Cow: Transform Your Business by Being Remarkable, Seth Godin used many interesting case studies of effective marketing.  One of them was the United States Postal Service.

“Very few organizations have as timid an audience as the United States Postal Service,” explained Godin.  “Dominated by conservative big customers, the Postal Service has a very hard time innovating.  The big direct marketers are successful because they’ve figured out how to thrive under the current system.  Most individuals are in no hurry to change their mailing habits, either.”

“The majority of new policy initiatives at the Postal Service are either ignored or met with nothing but disdain.  But ZIP+4 was a huge success.  Within a few years, the Postal Service diffused a new idea, causing a change in billions of address records in thousands of databases.  How?”

Doesn’t this daunting challenge sound familiar?  An initiative causing a change in billions of records across multiple databases? 

Sounds an awful lot like a massive data cleansing project, doesn’t it?  If you believe selling the business benefits of data quality, especially on such an epic scale, is easy to do, then stop reading right now—and please publish a blog post about how you did it.

 

Going Postal on the Business Benefits

Getting back to Godin’s case study, how did the United States Postal Service (USPS) sell the business benefits of ZIP+4?

“First, it was a game-changing innovation,” explains Godin.  “ZIP+4 makes it far easier for marketers to target neighborhoods, and much faster and easier to deliver the mail.  ZIP+4 offered both dramatically increased speed in delivery and a significantly lower cost for bulk mailers.  These benefits made it worth the time it took mailers to pay attention.  The cost of ignoring the innovation would be felt immediately on the bottom line.”

Selling the business benefits of data quality (or anything else for that matter) requires defining its return on investment (ROI), which always comes from tangible business impacts, such as mitigated risks, reduced costs, or increased revenue.

Reducing costs was a major selling point for ZIP+4.  Additionally, it mitigated some of the risks associated with direct marketing campaigns by making it possible to target neighborhoods more accurately and by reducing delays in postal delivery times.

However, perhaps the most significant selling point was that “the cost of ignoring the innovation would be felt immediately on the bottom line.”  In other words, the USPS articulated very well that the cost of doing nothing was very tangible.

The second reason ZIP+4 was a huge success, according to Godin, was that the USPS “wisely singled out a few early adopters.  These were individuals in organizations that were technically savvy and were extremely sensitive to both pricing and speed issues.  These early adopters were also in a position to sneeze the benefits to other, less astute, mailers.”

Sneezing the benefits is a reference to another Seth Godin book, Unleashing the Ideavirus, where he explains how the most effective business ideas are the ones that spread.  Godin uses the term ideavirus to describe an idea that spreads, and the term sneezers to describe the people who spread it.

In my blog post Sneezing Data Quality, I explained that it isn’t easy being sneezy, but true sneezers are the innovators and disruptive agents within an organization.  They can be the catalysts for crucial changes in corporate culture.

However, just like with literal sneezing, it can get really annoying if it occurs too frequently. 

To sell the business benefits, you need sneezers who will do such an exhilarating job championing the cause of data quality that they help the very idea of a sustained data quality program go viral throughout your entire organization, thereby unleashing the Data Quality Ideavirus.

 

Getting Zippy with it

One of the most common objections to data quality initiatives, and especially data cleansing projects, is that they often produce considerable costs without delivering tangible business impacts and significant ROI.

One of the most common ways to attempt selling the business benefits of data quality is the ROI of removing duplicate records.  Although the savings can be significant when duplicate rates are high, in the sense of reduced costs on redundant postal deliveries, this alone doesn’t exactly convince your business stakeholders and financial decision makers of the importance of data quality.
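As a back-of-the-envelope illustration of why the duplicate-removal argument alone rarely wins the room, here is a minimal sketch; every figure in it (record counts, duplicate rate, mailing costs, project cost) is an invented assumption.

```python
# Hypothetical figures for a direct-mail scenario (all values are assumptions)
customer_records  = 1_000_000   # records in the marketing database
duplicate_rate    = 0.08        # 8% of the records are duplicates
mailings_per_year = 4           # campaigns mailed per year
cost_per_piece    = 0.50        # printing plus postage per mail piece

annual_savings = customer_records * duplicate_rate * mailings_per_year * cost_per_piece

cleansing_project_cost = 150_000  # one-time cost of the deduplication effort
first_year_roi = (annual_savings - cleansing_project_cost) / cleansing_project_cost

print(f"Annual savings from removing duplicates: ${annual_savings:,.0f}")  # $160,000
print(f"First-year ROI: {first_year_roi:.0%}")                             # 7%
```

Real money, certainly, but a single-digit first-year return is not the kind of number that changes executive minds, which is exactly why the cost of doing nothing and the broader business impacts belong in the calculation too.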

Therefore, it is perhaps somewhat ironic that the USPS story of why ZIP+4 was such a huge success actually provides such a compelling case study for selling the business benefits of data quality.

However, we should all be inspired by “Zippy” (aka “Mr. Zip” – the USPS Zip Code mascot shown at the beginning of this post), and start “getting zippy with it” (not an official USPS slogan) when it comes to selling the business benefits of data quality:

  1. Define Data Quality ROI using tangible business impacts, such as mitigated risks, reduced costs, or increased revenue
  2. Articulate the cost of doing nothing (i.e., not investing in data quality) by also using tangible business impacts
  3. Select a good early adopter and recruit sneezers to Champion the Data Quality Cause by communicating your successes

What other ideas can you think of for getting zippy with it when it comes to selling the business benefits of data quality?

 

Related Posts

Promoting Poor Data Quality

Sneezing Data Quality

The Only Thing Necessary for Poor Data Quality

Hyperactive Data Quality (Second Edition)

Data Quality: The Reality Show?

El Festival del IDQ Bloggers (June and July 2010)

IAIDQ Blog Carnival 2010

Welcome to the June and July 2010 issue of El Festival del IDQ Bloggers, which is a blog carnival by the IAIDQ that offers a great opportunity for both information quality and data quality bloggers to get their writing noticed and to connect with other bloggers around the world.

 

Definition Drift

Graham Rhind submitted his July blog post Definition drift, which examines the persistent problems facing attempts to define a consistent terminology within the data quality industry. 

It is essential to the success of a data quality initiative that its key concepts are clearly defined and in a language that everyone can understand.  Therefore, I also recommend that you check out the free online data quality glossary built and maintained by Graham Rhind by following this link: Data Quality Glossary.

 

Lemonade Stand Data Quality

Steve Sarsfield submitted his July blog post Lemonade Stand Data Quality, which explains that data quality projects are a form of capitalism, meaning that you need to sell your customers a refreshing glass and keep them coming back for more.

 

What’s In a Given Name?

Henrik Liliendahl Sørensen submitted his June blog post What’s In a Given Name?, which examines a common challenge facing data quality, master data management, and data matching—namely (pun intended), how to automate the interpretation of the “given name” (aka “first name”) component of a person’s name separately from their “family name” (aka “last name”).
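As a small illustration of why that automation is harder than it looks, here is a hedged sketch (the sample names, the particle list, and the heuristic itself are my own assumptions, not Henrik’s method): a naive split guesses wrong as soon as it meets a multi-word family name.

```python
# Naive parser: assume the last token is the family name and everything else is the given name
def naive_split(full_name):
    tokens = full_name.split()
    return " ".join(tokens[:-1]), tokens[-1]

# Slightly better heuristic: a (necessarily incomplete) list of tokens that usually
# belong to the family name pulls preceding tokens into the family-name component
FAMILY_NAME_PARTICLES = {"liliendahl", "van", "von", "de", "della", "di"}

def heuristic_split(full_name):
    tokens = full_name.split()
    family_start = len(tokens) - 1
    while family_start > 1 and tokens[family_start - 1].lower() in FAMILY_NAME_PARTICLES:
        family_start -= 1
    return " ".join(tokens[:family_start]), " ".join(tokens[family_start:])

for name in ["Henrik Liliendahl Sørensen", "Ludwig van Beethoven"]:
    print(name, "->", naive_split(name), "vs", heuristic_split(name))
```

Neither approach survives contact with real-world name data for long, which only underscores the challenge Henrik describes.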

 

Solvency II Standards for Data Quality

Ken O’Connor submitted his July blog post Solvency II Standards for Data Quality, which explains that the Solvency II standards are common sense data quality standards that can enable all organizations, regardless of their industry or region, to achieve complete, appropriate, and accurate data.

 

How Accuracy Has Changed

Scott Schumacher submitted his July blog post How Accuracy Has Changed, which explains that accuracy means being able to make the best use of all the information you have, putting data together where necessary, and keeping it apart where necessary.

 

Uniqueness is in the Eye of the Beholder

Marty Moseley submitted his June blog post Uniqueness is in the Eye of the Beholder, which beholds the challenge of uniqueness and identity matching, where determining if data records should be matched is often a matter of differing perspectives among groups within an organization, where what one group considers unique, another group considers non-unique or a duplicate.

 

Uniqueness in the Eye of the NSTIC

Jeffrey Huth submitted his July blog post Uniqueness in the Eye of the NSTIC, which examines a recently drafted document in the United States regarding a National Strategy for Trusted Identities in Cyberspace (NSTIC).

 

Profound Profiling

Daragh O Brien submitted his July blog post Profound Profiling, which recounts how he has found data profiling cropping up in conversations and presentations he’s been making recently, even where the topic of the day wasn’t “Information Quality,” and shares his thoughts on the profound benefits of data profiling for organizations seeking to manage risk and ensure compliance.

 

Wanted: a Data Quality Standard for Open Government Data

Sarah Burnett submitted her July blog post Wanted: a Data Quality Standard for Open Government Data, which calls for the establishment of data quality standards for open government data (i.e., public data sets) since more of it is becoming available.

 

Data Quality Disasters in the Social Media Age

Dylan Jones submitted his July blog post The reality of data quality disasters in a social media age, which examines how bad news sparked by poor data quality travels faster and further than ever before, by using the recent story about the Enbridge Gas billing blunders as a practical lesson for all companies sitting on the data quality fence.

 

Finding Data Quality

Jim Harris (that’s me referring to myself in the third person) submitted my July blog post Finding Data Quality, which explains (with the help of the movie Finding Nemo) that although data quality is often discussed only in its relation to initiatives such as master data management, business intelligence, and data governance, eventually you’ll be finding data quality everywhere.

 

Editor’s Selections

In addition to the official submissions above, I selected the following great data quality blog posts published in June or July 2010:

 

Check out the past issues of El Festival del IDQ Bloggers

El Festival del IDQ Bloggers (May 2010) – edited by Castlebridge Associates

El Festival del IDQ Bloggers (April 2010) – edited by Graham Rhind

El Festival del IDQ Bloggers (March 2010) – edited by Phil Wright

El Festival del IDQ Bloggers (February 2010) – edited by William Sharp

El Festival del IDQ Bloggers (January 2010) – edited by Henrik Liliendahl Sørensen

El Festival del IDQ Bloggers (November 2009) – edited by Daragh O Brien

El Festival del IDQ Bloggers (October 2009) – edited by Vincent McBurney

El Festival del IDQ Bloggers (September 2009) – edited by Daniel Gent

El Festival del IDQ Bloggers (August 2009) – edited by William Sharp

El Festival del IDQ Bloggers (July 2009) – edited by Andrew Brooks

El Festival del IDQ Bloggers (June 2009) – edited by Steve Sarsfield

El Festival del IDQ Bloggers (May 2009) – edited by Daragh O Brien

El Festival del IDQ Bloggers (April 2009) – edited by Jim Harris

A Record Named Duplicate

Although The Rolling Forecasts recently got the band back together for the Data Rock Star World Tour, the tour scheduling (as well as its funding and corporate sponsorship) has encountered some unexpected delays. 

For now, please enjoy the following lyrics from another one of our greatest hits—this one reflects our country music influences.

 

A Record Named Duplicate *

My data quality consultant left our project after month number three,
And he didn’t leave much to my project team and me,
Except this old laptop computer and a bunch of empty bottles of beer.
Now, I don’t blame him ‘cause he run and hid,
But the meanest thing that he ever did,
Was before he left, he went and created a record named “Duplicate.”

Well, he must of thought that it was quite a joke,
But it didn’t get a lot of laughs from any executive management folk,
And it seems I had to fight that duplicate record my whole career through.
Some Business gal would giggle and I’d get red,
And some IT guy would laugh and I’d bust his head,
I tell ya, life ain’t easy with a record named “Duplicate.”

Well, I became a data quality expert pretty damn quick,
My defect prevention skills became pretty damn slick,
And I worked hard everyday to keep my organization’s data nice and clean.
I came to be known for my mean Data Cleansing skills and my keen Data Gazing eye,
And realizing that business insight was where the real data value lies,
As I roamed our data, source to source, I became the Champion of our Data Quality Cause.

But as I collected my fair share of accolades and battle scars, I made a vow to the moon and stars,
That I’d search all the industry conferences, the honky tonks, and the airport bars,
Until I found that data quality consultant who created a record named “Duplicate.”

Well, it was the MIT Information Quality Industry Symposium in mid-July,
And I just hit town and my throat was dry,
So I thought I’d stop by Cheers and have myself a brew.
At that old saloon on Beacon Street,
There at a table, escaping from the Boston summer heat,
Sat the dirty, mangy dog that created a record named “Duplicate.”

Well, I knew that snake was my old data quality consultant,
From the worn-out picture next to his latest Twitter tweet,
And I knew those battle scars on his cheek and his Data Gazing eye.
He was sitting smugly in his chair, looking mighty big and bold,
And as I looked at him sitting there, I could feel my blood running cold.

And I walked right up to him and then I said: “Hi, do you remember me?
On this USB drive in my hand, is some of the dirtiest data you’re ever gonna see,
You think the dirty, mangy likes of you could challenge me at Data Quality?”

Well, he smiled and he took the drive,
And we set up our laptops on the table, side by side.
We data profiled, re-checked the business requirements, and then we data analyzed,
We data cleansed, we standardized, we data matched, and then we re-analyzed.

I tell ya, I’ve fought tougher data cleansing men,
But I really can’t say that I remember when.
I heard him laugh and then I heard him cuss,
And I saw him conquer data defects, then reveal business insight, all without a fuss.

He went to signal that he was done, but then he noticed that I had already won,
And he just sat there looking at me, and then I saw him smile.

Then he said: “This world of Data Quality sure is rough,
And if you’re gonna make it, you gotta be tough,
And I knew I wouldn’t be there to help you along.
So I created that duplicate record and I said goodbye,
I knew you’d have to get tough or watch your data die,
But it’s that duplicate record that helped to make you strong.”

He said: “Now you just fought one hell of a fight,
And I know you hate me, and you got the right,
To tell me off, and I wouldn’t blame you if you do.
But you ought to thank me before you say goodbye,
For your mean Data Cleansing skills and your keen Data Gazing eye,
‘Cause I’m the son-of-a-bitch that helped you realize you have a passion for Data Quality.”

I got all choked up and I realized I should really thank him for what he'd done,
And then he said he could use a beer and I said I’d buy him one,
So we walked over to the Bull & Finch and we had ourselves a brew.
And I walked away from the bar that day with a totally different point of view.

I still think about him, every now and then,
I wonder what data he’s cleansing, and wonder what data he’s already cleansed.
But if I ever create a record of my own, I think I’m gonna name it . . .
“Golden” or “Best” or “Survivor”—anything but “Duplicate”—I still hate that damn record!

___________________________________________________________________________________________________________________

* In 1969, Johnny Cash released a very similar song called A Boy Named Sue.

 

Related Posts

Data Rock Stars: The Rolling Forecasts

Data Quality is such a Rush

Data Quality is Sexy

Imagining the Future of Data Quality

The Very Model of a Modern DQ General

DQ-View: Is Data Quality the Sun?

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

DataQualityPro

This recent tweet by Dylan Jones of Data Quality Pro succinctly expresses a vitally important truth about the data quality profession.

Although few would debate the necessary requirement of skill, some might doubt the need for passion.  Therefore, in this new DQ-View segment, I want to discuss why data quality initiatives require passionate data professionals.

 

DQ-View: Is Data Quality the Sun?

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

Data Gazers

Finding Data Quality

Oh, the Data You’ll Show!

Data Rock Stars: The Rolling Forecasts

The Second Law of Data Quality

The General Theory of Data Quality

DQ-Tip: “Start where you are...”

Sneezing Data Quality

Is your data complete and accurate, but useless to your business?

Ensuring that complete and accurate data is being used to make critical daily business decisions is perhaps the primary reason why data quality is so vitally important to the success of your organization. 

However, this effort can sometimes take on a life of its own, where achieving complete and accurate data is allowed to become the raison d'être of your data management strategy—in other words, you start managing data for the sake of managing data.

When this phantom menace clouds your judgment, your data might be complete and accurate—but useless to your business.

Completeness and Accuracy

How much data is necessary to make an effective business decision?  Having complete (i.e., all available) data seems obviously preferable to incomplete data.  However, with data volumes always burgeoning, the unavoidable fact is that sometimes having more data only adds confusion instead of clarity, thereby becoming a distraction instead of helping you make a better decision.

Returning to my original question, how much data is really necessary to make an effective business decision? 

Accuracy, thanks to substantial assistance from my readers, was defined in a previous post as the correctness of a data value within a limited context, such as verification by an authoritative reference (i.e., validity), combined with the correctness of a valid data value within an extensive context that includes other data as well as business processes (i.e., accuracy).

Although accurate data is obviously preferable to inaccurate data, less than perfect data quality cannot be used as an excuse to delay making a critical business decision.  When it comes to the quality of the data being used to make these business decisions, you can’t always get the data you want, but if you try sometimes, you just might find, you get the business insight you need.

Data-driven Solutions for Business Problems

Obviously, there are even more dimensions of data quality beyond completeness and accuracy. 

However, although it’s about more than just improving your data, data quality can be misperceived as an activity performed just for the sake of the data, when, in fact, data quality is an enterprise-wide initiative performed for the sake of implementing data-driven solutions for business problems, enabling better business decisions, and delivering optimal business performance.

In order to accomplish these objectives, data has to be not only complete and accurate, as well as whatever other dimensions you wish to add to your complete and accurate definition of data quality, but most important, data has to be useful to the business.

Perhaps the most common definition for data quality is “fitness for the purpose of use.” 

The missing word, which makes this definition both incomplete and inaccurate, puns intended, is “business.”  In other words, data quality is “fitness for the purpose of business use.”  How complete and how accurate (and however else) the data needs to be is determined by its business use—or uses since, in the vast majority of cases, data has multiple business uses.

Data, data everywhere

With data silos replicating data and new data being created daily, managing all of the data is becoming impractical, and because we are too busy with the activity of trying to manage all of it, no one is stopping to evaluate usage or business relevance.

The fifth of the Five New Ideas From 2010 MIT Information Quality Industry Symposium, which is a recent blog post written by Mark Goloboy, was that “60-90% of operational data is valueless.”

“I won’t say worthless,” Goloboy clarified, “since there is some operational necessity to the transactional systems that created it, but valueless from an analytic perspective.  Data only has value, and is only worth passing through to the Data Warehouse if it can be directly used for analysis and reporting.  No news on that front, but it’s been more of the focus since the proliferation of data has started an increasing trend in storage spend.”

In his recent blog post Are You Afraid to Say Goodbye to Your Data?, Dylan Jones discussed the critical importance of designing an archive strategy for data, as opposed to the default position many organizations take, where burgeoning data volumes are allowed to proliferate because, in large part, no one wants to delete (or, at the very least, archive) any of the existing data. 

This often results in the data that the organization truly needs for continued success getting stuck in the long line of data waiting to be managed, and in many cases, behind data for which the organization no longer has any business use (and perhaps never even had the chance to use when the data was actually needed to make critical business decisions).

“When identifying data in scope for a migration,” Dylan advised, “I typically start from the premise that ALL data is out of scope unless someone can justify its existence.  This forces the emphasis back on the business to justify their use of the data.”

Data Memorioso

Funes el memorioso is a short story by Jorge Luis Borges, which describes a young man named Ireneo Funes who, as a result of a horseback riding accident, has lost his ability to forget.  Although Funes has a tremendous memory, he is so lost in the details of everything he knows that he is unable to convert the information into knowledge and unable, as a result, to grow in wisdom.

In Spanish, the word memorioso means “having a vast memory.”  When Data Memorioso is your data management strategy, your organization becomes so lost in all of the data it manages that it is unable to convert data into business insight and unable, as a result, to survive and thrive in today’s highly competitive and rapidly evolving marketplace.

In their great book Made to Stick: Why Some Ideas Survive and Others Die, Chip Heath and Dan Heath explained that “an accurate but useless idea is still useless.  If a message can’t be used to make predictions or decisions, it is without value, no matter how accurate or comprehensive it is.”  I believe that this is also true for your data and your organization’s business uses for it.

Is your data complete and accurate, but useless to your business?

DQ-View: Designated Asker of Stupid Questions

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

Effective communication improves everyone’s understanding of data quality, establishes a tangible business context, and helps prioritize critical data issues.  Therefore, as the first video in my new DQ-View segment, I want to discuss a critical role that far too often is missing from data quality initiatives—Designated Asker of Stupid Questions.

 

DQ-View: Designated Asker of Stupid Questions

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

The Importance of Envelopes

The Point of View Paradox

The Balancing Act of Awareness

Shut Your Mouth

Podcast: Open Your Ears

Hailing Frequencies Open

The Game of Darts – An Allegory

Podcast: Business Technology and Human-Speak

Not So Strange Case of Dr. Technology and Mr. Business

The Acronymicon

Podcast: Stand-Up Data Quality (Second Edition)

Last December, while experimenting with using podcasts and videos to add more variety and more personality to my blogging, I recorded a podcast called Stand-Up Data Quality, in which I discussed using humor to enliven a niche topic such as data quality, and revisited some of the stand-up comedy aspects of some of my favorite written-down blog posts from 2009.

In this brief (approximately 10 minutes) OCDQ Podcast, I share some more of my data quality humor:

You can also download this podcast (MP3 file) by clicking on this link: Stand-Up Data Quality (Second Edition)

 

Related Posts

Wednesday Word: June 23, 2010 – Referential Narcissisity

The Five Worst Elevator Pitches for Data Quality

Data Quality Mad Libs (Part 1)

Data Quality Mad Libs (Part 2)

Podcast: Stand-Up Data Quality (First Edition)

Data Quality: The Reality Show?

Common Change

I recently finished reading the great book Switch: How to Change Things When Change Is Hard by Chip Heath and Dan Heath, which examines why it can be so difficult for us to make lasting changes—both professional changes and personal changes.

“For anything to change,” the Heaths explain, “someone has to start acting differently.  Ultimately, all change efforts boil down to the same mission: Can you get people to start behaving in a new way?”

Their metaphor for change of all kinds is making a Switch, which they explain requires the following three things:

  1. Directing the Rider, which is a metaphor for the rational aspect of our decisions and behavior.
  2. Motivating the Elephant, which is a metaphor for the emotional aspect of our decisions and behavior.
  3. Shaping the Path, which is a metaphor for the situational aspect of our decisions and behavior.

Despite being the most common phenomenon in the universe, change is almost universally resisted, making most of us act as if change is anything but common.  Therefore, in this blog post, I will discuss the Heaths’ three key concepts using some common terminology: Common Sense, Common Feeling, and Common Place—which, when working together, lead to Common Change.

 

Common Sense

“What looks like resistance is often a lack of clarity,” the Heaths explain.  “Ambiguity is the enemy.  Change begins at the level of individual decisions and behaviors.  To spark movement in a new direction, you need to provide crystal-clear guidance.”

Unfortunately, changes are usually communicated in ways that cause confusion instead of providing clarity.  Many change efforts fail at the outset because of either ambiguous goals or a lack of specific instructions explaining exactly how to get started.

One personal change example would be: Eat Healthier.

Although the goal makes sense, what exactly should I do?  Should I eat smaller amounts of the same food, or eat different food?  Should I start eating two large meals a day while eliminating snacks, or start eating several smaller meals throughout the day?

One professional example would be: Streamline Inefficient Processes.

This goal is even more ambiguous.  Does it mean all of the existing processes are inefficient?  What does streamline really mean?  What exactly should I do?  Should I be spending less time on certain tasks, or eliminating some tasks from my daily schedule?

Ambiguity is the enemy.  For any chance of success to be possible, both the change itself and the plan for making it happen must sound like Common Sense

More specifically, the following two things must be clearly defined and effectively communicated:

  1. Long-term Goal – What exactly is the change that we are going to make—what is our destination?
  2. Short-term Critical Moves – What are the first few things we need to do—how do we begin our journey?

“What is essential,” as the Heaths explain, “is to marry your long-term goal with short-term critical moves.”

“What you don’t need to do is anticipate every turn in the road between today and the destination.  It’s not that plotting the whole journey is undesirable; it’s that it’s impossible.  When you’re at the beginning, don’t obsess about the middle, because the middle is going to look different once you get there.  Just look for a strong beginning and a strong ending and get moving.”

 

Common Feeling

I just emphasized the critical importance of envisioning both the beginning and the end of our journey toward change.

However, what happens in the middle is the change.  So, if common sense can help us understand where we are going and how to get started, what can help keep us going during the really challenging aspects of the middle?

There’s really only one thing that can carry us through the middle—we need to get hooked on a Common Feeling.

Some people—and especially within a professional setting—will balk at discussing the role that feeling (i.e., emotion) plays in our decision making and behavior because it is commonly believed that rational analysis must protect us from irrational emotions.

However, relatively recent advancements in the fields of psychology and neuroscience have proven that good decision making requires the flexibility to know when to rely on rational analysis and when to rely on emotions—and to always consider not only how we’re thinking, but also how we’re feeling.

In their book The Heart of Change: Real-Life Stories of How People Change Their Organizations, John Kotter and Dan Cohen explained that “the core of the matter is always about changing the behavior of people, and behavior change happens mostly by speaking to people’s feelings.  In highly successful change efforts, people find ways to help others see the problems or solutions in ways that influence emotions, not just thought.”

Kotter and Cohen wrote that most people think change happens in this order: ANALYZE—THINK—CHANGE. 

However, from interviewing over 400 people across more than 130 large organizations in the United States, Europe, Australia, and South Africa, they observed that in almost all successful change efforts, the sequence of change is: SEE—FEEL—CHANGE.

“We know there’s a difference between knowing how to act and being motivated to act,” the Heaths explain.  “But when it comes time to change the behavior of other people, our first instinct is to teach them something.”

Making only a rational argument for change without an emotional appeal results in understanding without motivation, and making only an emotional appeal for change without a rational plan results in passion without direction

Therefore, making the case for lasting change requires that you effectively combine common sense with common feeling.

 

Common Place

“That is NOT how we do things around here” is the most common objection to change.  This is the Oath of Change Resistance, which maintains the status quo—the current situation that is so commonplace that it seems like “these people will never change.”

But as the Heaths explain, “what looks like a people problem is often a situation problem.”

Stanford psychologist Lee Ross coined the term fundamental attribution error to describe our tendency to ignore the situational forces that shape other people’s behavior.  The error lies in our inclination to attribute people’s behavior to the way they are rather than to the situation they are in.

When we lament that “these people will never change,” we have convinced ourselves that change-resistant behavior equates to a change-resistant personal character, and we discount the possibility that it simply could be a reflection of the current situation.

The great analogy used by the Heaths is water.  When boiling in a pot on the stove, it’s a scalding-hot liquid, but when cooling in a tray in the freezer, it’s an icy-cold solid.  However, declaring either scalding-hot or icy-cold as a fundamental attribute of water and not a situational attribute of water would obviously be absurd—but we do this with people and their behavior all the time.

This doesn’t mean that people’s behavior is always a result of their situation—nor does it excuse inappropriate behavior. 

The fundamental point is that the situation that people are currently in (i.e., their environment) can always be changed, and most important, it can be tweaked in ways that influence their behavior and encourage them to change for the better.

“Tweaking the environment,” the Heaths explain, “is about making the right behaviors a little bit easier and the wrong behaviors a little bit harder.  It’s that simple.”  The status quo is sometimes described as the path of least resistance.  So consider how you could tweak the environment in order to transform the path of least resistance into the path of change.

Therefore, in order to facilitate lasting change, you must create a new Common Place where the change becomes accepted as: “That IS how we do things around here—from now on.”  This is the Oath of Change, which redefines the status quo.

 

Common Change

“When change happens,” the Heaths explain, “it tends to follow a pattern.”  Although it is far easier to recognize than to embrace, in order for any of the changes we need to make to be successful, “we’ve got to stop ignoring that pattern and start embracing it.”

Change begins when our behavior changes.  In order for this to happen, we have to think that the change makes common sense, we have to feel that the change evokes a common feeling, and we have to accept that the change creates a new common place

When all three of these rational, emotional, and situational forces are in complete alignment, then instead of resisting change, we will experience it as Common Change.

 

Related Posts

The Winning Curve

The Balancing Act of Awareness

The Importance of Envelopes

The Point of View Paradox

Persistence

Data Quality and the Cupertino Effect

The Cupertino Effect can occur when you accept the suggestion of a spellchecker program, which was attempting to assist you with a misspelled word (or what it “thinks” is a misspelling because it cannot find an exact match for the word in its dictionary). 

Although the suggestion (or in most cases, a list of possible words is suggested) is indeed spelled correctly, it might not be the word you were trying to spell, and in some cases, by accepting the suggestion, you create a contextually inappropriate result.

It’s called the “Cupertino” effect because with older programs the word “cooperation” was only listed in the spellchecking dictionary in hyphenated form (i.e., “co-operation”), making the spellchecker suggest “Cupertino” (i.e., the California city and home of the worldwide headquarters of Apple, Inc., which essentially guarantees it a place in all spellchecking dictionaries).

By accepting the suggestion of a spellchecker program (and if there’s only one suggested word listed, don’t we always accept it?), a sentence where we intended to write something like:

“Cooperation is vital to our mutual success.”

Becomes instead:

“Cupertino is vital to our mutual success.”

And then confusion ensues (or hilarity—or both).
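Here is a minimal sketch of how a suggestion engine can produce that result, using Python’s difflib as a stand-in and a deliberately tiny dictionary; the rule that hyphenated entries are never offered as suggestions is one plausible reproduction of the older programs’ quirk, not a claim about how any specific product actually worked.

```python
import difflib

# The dictionary of a hypothetical older spellchecker: "cooperation" appears only in
# its hyphenated form, while "Cupertino" is present because it is a proper noun
dictionary = ["co-operation", "Cupertino", "data", "quality"]

def suggest(word, cutoff=0.5):
    """Flag any word not found in the dictionary and, mimicking the assumed quirk,
    never offer hyphenated entries, so the closest remaining entry wins."""
    if word in dictionary:
        return []  # nothing to correct
    candidates = [entry for entry in dictionary if "-" not in entry]
    return difflib.get_close_matches(word, candidates, n=3, cutoff=cutoff)

print(suggest("cooperation"))  # ['Cupertino'] -- correctly spelled, but not the word intended
```

The suggested word is perfectly valid on its own; it simply is not the word the writer intended.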

Beyond being a data quality issue for unstructured data (e.g., documents, e-mail messages, blog posts, etc.), the Cupertino Effect reminded me of the accuracy versus context debate.

 

“Data quality is primarily about context not accuracy...”

This Data Quality (DQ) Tip from last September sparked a nice little debate in the comments section.  The complete DQ-Tip was:

“Data quality is primarily about context not accuracy. 

Accuracy is part of the equation, but only a very small portion.”

Therefore, the key point wasn’t that accuracy isn’t important, but simply to emphasize that context is more important. 

In her fantastic book Executing Data Quality Projects, Danette McGilvray defines accuracy as “a measure of the correctness of the content of the data (which requires an authoritative source of reference to be identified and accessible).”

Returning to the Cupertino Effect for a moment, the spellchecking dictionary provides an identified, accessible, and somewhat authoritative source of reference—and “Cupertino” is correct data content for representing the name of a city in California. 

However, absent a context within which to evaluate accuracy, how can we determine the correctness of the content of the data?

 

The Free-Form Effect

Let’s use a different example.  A common root cause of poor quality for structured data is: free-form text fields.

Regardless of how good the metadata description is written or how well the user interface is designed, if a free-form text field is provided, then you will essentially be allowed to enter whatever you want for the content of the data (i.e., the data value).

For example, a free-form text field is provided for entering the Country associated with your postal address.

Therefore, you could enter data values such as:

Brazil
United States of America
Portugal
United States
República Federativa do Brasil
USA
Canada
Federative Republic of Brazil
Mexico
República Portuguesa
U.S.A.
Portuguese Republic

However, you could also enter data values such as:

Gondor
Gnarnia
Rohan
Citizen of the World
The Land of Oz
The Island of Sodor
Berzerkistan
Lilliput
Brobdingnag
Teletubbyland
Poketopia
Florin

The first list contains real countries, but a lack of standard values introduces needless variations. The second list contains fictional countries, which people like me enter into free-form fields to either prove a point or simply to amuse myself (well okay—both).

The most common solution is to provide a drop-down box of standard values, such as those provided by an identified, accessible, and authoritative source of reference—the ISO 3166 standard country codes.

Problem solved—right?  Maybe—but maybe not. 

Yes, I could now choose BR, US, PT, CA, MX (the ISO 3166 alpha-2 codes for Brazil, United States, Portugal, Canada, Mexico), which are the valid and standardized country code values for the countries from my first list above—and I would not be able to find any of my fictional countries listed in the new drop-down box.

However, I could also choose DO, RE, ME, FI, SO, LA, TT, DE (Dominican Republic, Réunion, Montenegro, Finland, Somalia, Lao People’s Democratic Republic, Trinidad and Tobago, Germany), all of which are valid and standardized country code values, however all of them are also contextually invalid for my postal address.
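Here is a hedged sketch of that distinction in Python, using a deliberately abbreviated subset of the ISO 3166-1 alpha-2 codes and a couple of assumed postal-code patterns: the first check can confirm a value is a real country code, but only a check against the rest of the record can say whether it is contextually appropriate.

```python
import re

# Deliberately abbreviated subset of the ISO 3166-1 alpha-2 country codes
ISO_3166_ALPHA2 = {"BR", "US", "PT", "CA", "MX", "DO", "RE", "ME", "FI", "SO", "LA", "TT", "DE"}

# Illustrative (not exhaustive) postal-code patterns for a few countries
POSTAL_PATTERNS = {
    "US": r"^\d{5}(-\d{4})?$",           # 12345 or 12345-6789
    "CA": r"^[A-Z]\d[A-Z] ?\d[A-Z]\d$",  # A1A 1A1
    "PT": r"^\d{4}-\d{3}$",              # 1000-001
}

def is_valid(country_code):
    """Validity: the value appears in the authoritative reference (ISO 3166)."""
    return country_code in ISO_3166_ALPHA2

def is_contextually_accurate(country_code, postal_code):
    """Accuracy in context: the valid code must also agree with the rest of the address."""
    pattern = POSTAL_PATTERNS.get(country_code)
    return is_valid(country_code) and pattern is not None and bool(re.match(pattern, postal_code))

address = {"country": "CA", "postal": "90210"}  # a valid ISO code paired with a US-style ZIP code
print(is_valid(address["country"]))                                     # True
print(is_contextually_accurate(address["country"], address["postal"]))  # False
```

The drop-down box (or the reference list behind it) solves the validity problem; only a rule that looks at the whole record, and therefore at context, can address the accuracy problem the next section asks about.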

 

Accuracy: With or Without Context?

Accuracy is only one of the many dimensions of data quality—and you may have a completely different definition for it. 

Paraphrasing Danette McGilvray, accuracy is a measure of the validity of data values, as verified by an authoritative reference. 

My question is what about context?  Or more specifically, should accuracy be defined as a measure of the validity of data values, as verified by an authoritative reference, and within a specific context?

Please note that I am only trying to define the accuracy dimension of data quality, and not data quality itself.

Therefore, please resist the urge to respond with “fitness for the purpose of use” since even if you want to argue that “context” is just another word meaning “use” then next we will have to argue over the meaning of the word “fitness” and before you know it, we will be arguing over the meaning of the word “meaning.”

Please accurately share your thoughts (with or without context) about accuracy and context—by posting a comment below.

The 2010 Data Quality Blogging All-Stars

The 2010 Major League Baseball (MLB) All-Star Game is being held tonight (July 13) at Angel Stadium in Anaheim, California.

For those readers who are not baseball fans, the All-Star Game is an annual exhibition held in mid-July that showcases the players with (for the most part) the best statistical performances during the first half of the MLB season.

Last summer, I began my own annual exhibition of showcasing the bloggers whose posts I have personally most enjoyed reading during the first half of the data quality blogging season. 

Therefore, this post provides links to stellar data quality blog posts that were published between January 1 and June 30 of 2010.  My definition of a “data quality blog post” also includes Data Governance, Master Data Management, and Business Intelligence. 

Please Note: There is no implied ranking in the order that bloggers or blogs are listed, other than that Individual Blog All-Stars are listed first, followed by Vendor Blog All-Stars, and the blog posts are listed in reverse chronological order by publication date.

 

Henrik Liliendahl Sørensen

From Liliendahl on Data Quality:

 

Dylan Jones

From Data Quality Pro:

 

Julian Schwarzenbach

From Data and Process Advantage Blog:

 

Rich Murnane

From Rich Murnane's Blog:

 

Phil Wright

From Data Factotum:

 

Initiate – an IBM Company

From Mastering Data Management:

 

Baseline Consulting

From their three blogs: Inside the Biz with Jill Dyché, Inside IT with Evan Levy, and In the Field with our Experts:

 

DataFlux – a SAS Company

From Community of Experts:

 

Related Posts

Recently Read: May 15, 2010

Recently Read: March 22, 2010

Recently Read: March 6, 2010

Recently Read: January 23, 2010

The 2009 Data Quality Blogging All-Stars

 

Additional Resources

From the IAIDQ, read the 2010 issues of the Blog Carnival for Information/Data Quality: