What Data Quality Technology Wants

This is a screen capture of the results of last month’s unscientific data quality poll where it was noted that viewpoints about the role of data quality technology (i.e., what data quality technology wants) are generally split between two opposing perspectives:

  1. Technology enables a data quality process, but doesn’t obviate the need for people (e.g., data stewards) to remain actively involved and be held accountable for maintaining the quality of data.
  2. Technology automates a data quality process, and a well-designed and properly implemented technical solution obviates the need for people to be actively involved after its implementation.

 

Commendable Comments

Henrik Liliendahl Sørensen voted for enable, but commented that he likes to say it enables by automating the time-consuming parts, an excellent point which he further elaborated on in two of his recent blog posts: Automation and Technology and Maturity.

Garnie Bolling commented that he believes people will always be part of the process, especially since data quality has so many dimensions and trends, and although automated systems can deal with what he called fundamental data characteristics, an automated system cannot change with trends or the ongoing evolution of data.

Frank Harland commented that automation can and has to take over the tedious bits of work (e.g., he wouldn’t want to type in all those queries that can be automated by data profiling tools), but to get data right, we have to get processes right, get data architecture right, get culture and KPIs right, and get a lot of the “right” people to do all the hard work that has to be done.
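
To give a small taste of the tedious queries Frank is referring to, here is a minimal sketch in Python (purely illustrative, and not based on any particular profiling tool) of the kind of basic column profile a data profiling tool generates automatically:

```python
from collections import Counter

def profile_column(values):
    """A tiny, hypothetical column profile: the row count, missing values,
    distinct values, and top frequencies that a profiling tool reports
    without anyone having to hand-write the queries."""
    non_missing = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(non_missing),
        "distinct": len(set(non_missing)),
        "top_values": Counter(non_missing).most_common(3),
    }

# Example: an inconsistent state code ("ia") surfaces immediately.
states = ["IA", "IA", "PA", "", "ia", None, "IA"]
print(profile_column(states))
# {'rows': 7, 'missing': 2, 'distinct': 3, 'top_values': [('IA', 3), ('PA', 1), ('ia', 1)]}
```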

Chris Jackson commented that what an organization really needs is quality data processes not data quality processes, and once the focus is on treating the data properly rather than catching and remediating poor data, you can have a meaningful debate about the relative importance of well-trained and motivated staff vs. systems that encourage good data behavior vs. replacing fallible people with standard automated process steps.

Alexa Wackernagel commented that when it comes to discussions about data migration and data quality with clients, she often gets the requirement—or better to call it the dream—for automated processes, but the reality is that data handling needs easily accessible technology to enable data quality.

Thanks to everyone who voted and special thanks to everyone who commented.  As always, your feedback is greatly appreciated.

 

What Data Quality Technology Wants: Enable and Automate


“Data Quality Powers—Activate!”


“I’m sorry, Defect.  I’m afraid I can’t allow that.”

I have to admit that my poll question was flawed (as my friend HAL would say, “It can only be attributable to human error”).

Posing the question in an either/or context made it difficult for the important role of automation within data quality processes to garner many votes.  I agree with the comments above that the role of data quality technology is to both enable and automate.

As the Wonder Twins demonstrate, data quality technology enables Zan (i.e., technical people), Jayna (i.e., business people), and Gleek (i.e., data space monkeys, er, I mean, data people) to activate one of their most important powers—collaboration.

In addition to the examples described in the comments above, data quality technology automates proactive defect prevention by providing real-time services that greatly minimize poor data quality at the multiple points of origin within the data ecosystem.  Although it is impossible to prevent every problem before it happens, the more control enforced where data originates, the better the overall enterprise data quality will be—or as my friend HAL would say:

“Putting data quality technology to its fullest possible use is all any corporate entity can ever hope to do.”
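
To make the idea of proactive defect prevention a little more concrete, here is a minimal sketch (in Python, with hypothetical validation rules of my own invention, not drawn from any particular data quality tool) of a real-time check invoked at a point of data origin, such as a data entry screen:

```python
import re

def validate_at_point_of_entry(record):
    """Hypothetical real-time defect prevention check: reject or flag
    bad values where the data originates, instead of cleansing them
    downstream after they have already spread through the ecosystem."""
    issues = []
    if not record.get("customer_name", "").strip():
        issues.append("customer_name is required")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record.get("email", "")):
        issues.append("email is not a valid address")
    return issues  # an empty list means the record may enter the system

print(validate_at_point_of_entry({"customer_name": "", "email": "not-an-email"}))
# ['customer_name is required', 'email is not a valid address']
```

No single check like this can prevent every problem, but the more of them enforced where data originates, the less downstream cleansing the rest of the enterprise has to do.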

Related Posts

What Does Data Quality Technology Want?

DQ-Tip: “Data quality tools do not solve data quality problems...”

Which came first, the Data Quality Tool or the Business Need?

Data Quality Industry: Problem Solvers or Enablers?

Data Quality Magic

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Pirates of the Computer: The Curse of the Poor Data Quality

DQ-BE: Single Version of the Time

Data Quality By Example (DQ-BE) is an OCDQ regular segment that provides examples of data quality key concepts.

Photo via Flickr by: Leo Reynolds

Like truth, beauty, and singing ability, data quality is in the eyes of the beholder.

Data’s quality is determined by evaluating its fitness for the purpose of use.  However, in the vast majority of cases, data has multiple uses, and data of sufficient quality for one use may not be of sufficient quality for other uses.

Therefore, to be more accurate, data quality is in the eyes of the user.

The perspective of the user provides a relative context for data quality.  Many argue an absolute context for data quality exists, one which is independent of the often conflicting perspectives of different users.

This absolute context is often referred to as a “Single Version of the Truth.”

As one example of the challenges inherent in this data quality key concept, let’s consider if there is a “Single Version of the Time.”

 

Single Version of the Time

I am writing this blog post at 10:00 AM.  I am using time in a relative context, meaning that from my perspective it is 10 o’clock in the morning.  I live in the Central Standard Time (CST) zone of the United States.

My friend in Europe would say that I am writing this blog post at 5:00 PM.  He is also using time in a relative context, meaning that from his perspective it is 5 o’clock in the afternoon.  My friend lives in the Central European Time (CET) zone.

We could argue that an absolute time exists, as defined by Coordinated Universal Time (UTC).  Local times around the world can be expressed as a relative time using positive or negative offsets from UTC.  For example, my relative time is UTC-6 and my friend’s relative time is UTC+1.  Alternatively, we could use absolute time and say that I am writing this blog post at 16:00 UTC.

Although using an absolute time is an absolute necessity if, for example, my friend and I wanted to schedule a time to have a telephone (or Skype) discussion, it would be confusing to use UTC when referring to events relative to our local time zone.
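
For the curious, here is a minimal sketch in Python of translating between the two relative times and the one shared absolute time (using the standard zoneinfo module, and assuming a January date so that both zones are on standard time, matching the UTC-6 and UTC+1 offsets above):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# My relative version of the time: 10:00 AM Central Standard Time (UTC-6).
my_time = datetime(2011, 1, 5, 10, 0, tzinfo=ZoneInfo("America/Chicago"))

# The shared (absolute) version of the time: 16:00 UTC.
utc_time = my_time.astimezone(ZoneInfo("UTC"))

# My friend's relative version of the same instant: 17:00 CET (UTC+1).
friends_time = my_time.astimezone(ZoneInfo("Europe/Paris"))

print(utc_time.strftime("%H:%M %Z"))      # 16:00 UTC
print(friends_time.strftime("%H:%M %Z"))  # 17:00 CET
```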

In other words, the relative context of the user’s perspective is valid, and an absolute context independent of the perspectives of different users is also valid—especially whenever a shared perspective is necessary to facilitate dialogue and discussion.

Therefore, instead of calling UTC a Single Version of the Time, we could call it a Shared Version of the Time.  And when it comes to the data quality concept of a Single Version of the Truth, perhaps it’s time we started calling it a Shared Version of the Truth.

 

Related Posts

Single Version of the Truth

The Quest for the Golden Copy

Beyond a “Single Version of the Truth”

The Idea of Order in Data

DQ-BE: Data Quality Airlines

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality and the Cupertino Effect

DQ-Tip: “Data quality is primarily about context not accuracy...”

#FollowFriday Spotlight: @DataQualityPro

FollowFriday Spotlight is an OCDQ regular segment highlighting someone you should follow—and not just Fridays on Twitter.


Data Quality Pro, founded and maintained by Dylan Jones, is a free and independent community resource dedicated to helping data quality professionals take their career or business to the next level.  Data Quality Pro is your free expert resource providing data quality articles, webinars, forums and tutorials from the world’s leading experts, every day.

With the mission to create the most beneficial data quality resource that is freely available to members around the world, the goal of Data Quality Pro is “winning-by-sharing,” and they believe that when members contribute a small amount of their experience, skill, or time to support other members, truly great things can be achieved.

Membership is 100% free and provides a broad range of additional content for professionals of all backgrounds and skill levels.

Check out the Best of Data Quality Pro, which includes many great blog posts written by Dylan Jones in 2010.

 

Related Posts

#FollowFriday and Re-Tweet-Worthiness

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

Social Karma (Part 7) – Twitter

DQ-View: New Data Resolutions

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

The graphics shown in the video were created under a Creative Commons Attribution License using: Wordle

 

New Data Resolutions

If one of your New Year’s Resolutions was not to listen to my rambling, here is the video’s (spoiler alert!) thrilling conclusion:

Now, of course, in order for this to truly count as one of your New Data Resolutions for 2011, you will have to provide your own WHY and WHAT that are specific to your organization’s enterprise data initiative.

After all, it’s not like I can eat healthier or exercise more often for you in 2011.  Happy New Year!

 

Related Posts

“Some is not a number and soon is not a time”

Common Change

Video: Declaration of Data Governance

DQ View: Achieving Data Quality Happiness

Don’t Do Less Bad; Do Better Good

Data Quality is not a Magic Trick

DQ-View: Designated Asker of Stupid Questions

DQ-View: The Cassandra Effect

DQ-View: From Data to Decision

Video: Oh, the Data You’ll Show!

The Best Data Quality Blog Posts of 2010

This year-end review provides summaries of and links to The Best Data Quality Blog Posts of 2010.  Please note the following:

  • For simplicity, “Data Quality” also includes Data Governance, Master Data Management, and Business Intelligence
  • Intentionally excluded from consideration were my best blog posts of the year — not counting that shameless plug :-)
  • The Data Roundtable was also excluded since I already published a series about its best 2010 blog posts (see links below)
  • Selection was based on a pseudo-scientific, quasi-statistical, and proprietary algorithm (i.e., I just picked the ones I liked)
  • Ordering is based on a pseudo-scientific, quasi-statistical, and proprietary algorithm (i.e., no particular order whatsoever)

 

The Best Data Quality Blog Posts of 2010

  • Data Quality is a DATA issue by Graham Rhind – Expounds on the common discussion about whether data quality is a business issue or a technical issue by explaining that although it can sometimes be either or both, it’s always a data issue.
  • Bad word?: Data Owner by Henrik Liliendahl Sørensen – Examines how the common data quality terms “data owner” and “data ownership” are used and whether they are truly useful, and it generated an excellent comment discussion about ownership.
  • Predictably Poor MetaData Quality by Beth Breidenbach – Examines whether data quality and metadata quality issues stem from the same root source: human behavior, which is also the solution to these issues, since technology doesn’t cause or solve these challenges but rather is a tool that exacerbates or aids human behavior in either direction.
  • WANTED: Data Quality Change Agents by Dylan Jones – Explains the key traits required of all data quality change agents, including a positive attitude, a willingness to ask questions, advocacy for innovation, and persuasive evangelism.
  • Profound Profiling by Daragh O Brien – Discusses the profound business benefits of data profiling for organizations seeking to manage risk and ensure compliance, including the sage data and information quality advice: “Profile early, profile often.”
  • The Importance of Scope in Data Quality Efforts by Jill Dyché – Illustrates five levels of delivery that can help you quickly establish the boundaries of your initial data quality project, which will enable you to implement an incremental approach for your sustained data quality program that will build momentum to larger success over time.
  • The Myth about a Myth by Henrik Liliendahl Sørensen – Debunks the myth that data quality (and a lot of other things) is all about technology — and it’s certainly no myth that this blog post generated a lengthy discussion in the comments section.
  • Definition drift by Graham Rhind – Examines the persistent problems facing attempts to define a consistent terminology within the data quality industry for concepts such as validity versus accuracy, and currency versus timeliness.
  • Data Quality: A Philosophical Approach to Truth by Beth Breidenbach – Examines how the background, history, and perceptions we bring to any situation will impact what we perceive as “truth” in that moment; we don’t have to agree with another’s point of view, but we should at least attempt to understand the logic behind it.
  • What Are Master Data? by Marty Moseley of IBM Initiate – Defines the differences between reference data and master data, providing examples of each, and, not surprisingly, this blog post also sparked an excellent discussion within its comments.
  • Data Governance Remains Immature by Rob Karel – Examines the results of several data governance surveys and explains how there is a growing recognition that data governance is not — and should never have been — about the data.
  • The Future – Agile Data-Driven Enterprises by John Schmidt on Informatica Perspectives – Concludes a seven-part series about data as an asset, which examines how successful organizations manage their data as a strategic asset, ensuring that relevant, trusted data can be delivered quickly when, where, and how it is needed to support the changing needs of the business.
  • Data as an Asset by David Pratt – The one where a new guy in the data blogosphere (his blog launched in November 2010) explains that treating data as an asset is all about actively doing things to improve both the quality and usefulness of the data.

 

PLEASE NOTE: No offense is intended to any of the great 2010 data quality blog posts not listed above.  However, if you feel that I have made a glaring omission, then please feel free to post a comment below and add it to the list.  Thanks!

I hope that everyone had a great 2010 and I look forward to seeing all of you around the data quality blogosphere in 2011.

 

Related Posts

The 2010 Data Quality Blogging All-Stars

Recently Read: May 15, 2010

Recently Read: March 22, 2010

Recently Read: March 6, 2010

Recently Read: January 23, 2010

 

Additional Resources

From the IAIDQ, read the 2010 issues of the Blog Carnival for Information/Data Quality.

From the Data Roundtable, read the 2010 quarterly review blog series.

I’m Gonna Data Profile (500 Records)

While researching my blog post (to be published on December 31) about the best data quality blog posts of the year, I re-read the great post Profound Profiling by Daragh O Brien.  It recounts how he found data profiling cropping up in conversations and presentations he’d made this year, even where the topic of the day wasn’t “Information Quality,” and shares his thoughts on the profound business benefits of data profiling for organizations seeking to manage risk and ensure compliance.

And I noticed that I had actually commented on this blog post . . . with song lyrics . . .

 

I’m Gonna Data Profile (500 Records) *

When I wake up, well I know I’m gonna be,
I’m gonna be the one who profiles early and often for you
When I go out, yeah I know I’m gonna be
I’m gonna be the one who goes along with data
If I get drunk, well I know I’m gonna be
I’m gonna be the one who gets drunk on managing risk for you
And if I haver up, yeah I know I’m gonna be
I’m gonna be the one who’s havering about how: “It’s the Information, Stupid!”

But I would profile 500 records
And I would profile 500 more
Just to be the one who profiles a thousand records
To deliver the profound business benefits of data profiling to your door

da da da da – ta ta ta ta
da da da da – ta ta ta ta – data!
da da da da – ta ta ta ta
da da da da – ta ta ta ta – data profiling!

When I’m working, yes I know I’m gonna be
I’m gonna be the one who’s working hard to ensure compliance for you
And when the money, comes in for the work I do
I’ll pass almost every penny on to improving data for you
When I come home (When I come home), well I know I’m gonna be
I’m gonna be the one who comes back home with data quality
And if I grow-old, (When I grow-old) well I know I’m gonna be
I’m gonna be the one who’s growing old with information quality

But I would profile 500 records
And I would profile 500 more
Just to be the one who profiles a thousand records
To deliver the profound business benefits of data profiling to your door

da da da da – ta ta ta ta
da da da da – ta ta ta ta – data!
da da da da – ta ta ta ta
da da da da – ta ta ta ta – data profiling!

When I’m lonely, well I know I’m gonna be
I’m gonna be the one who’s lonely without data profiling to do
And when I’m dreaming, well I know I’m gonna dream
I’m gonna dream about the time when I’m data profiling for you
When I go out (When I go out), well I know I’m gonna be
I’m gonna be the one who goes along with data
And when I come home (When I come home), yes I know I’m gonna be
I’m gonna be the one who comes back home with data quality
I’m gonna be the one who’s coming home with information quality

But I would profile 500 records
And I would profile 500 more
Just to be the one who profiles a thousand records
To deliver the profound business benefits of data profiling to your door

da da da da – ta ta ta ta
da da da da – ta ta ta ta – data!
da da da da – ta ta ta ta
da da da da – ta ta ta ta – data profiling!

___________________________________________________________________________________________________________________

* Based on the 1988 song I’m Gonna Be (500 Miles) by The Proclaimers.

 

Data Quality Music (DQ-Songs)

Over the Data Governance Rainbow

A Record Named Duplicate

New Time Human Business

People

You Can’t Always Get the Data You Want

A spoonful of sugar helps the number of data defects go down

Data Quality is such a Rush

I’m Bringing DQ Sexy Back

Imagining the Future of Data Quality

The Very Model of a Modern DQ General

What Does Data Quality Technology Want?

During a recent Radiolab podcast, Kevin Kelly, author of the book What Technology Wants, used the analogy of how a flower leans toward sunlight because it “wants” the sunlight, to describe what the interweaving web of evolving technical innovations (what he refers to as the super-organism of technology) is leaning toward—in other words, what technology wants.

The other Radiolab guest was Steven Johnson, author of the book Where Good Ideas Come From, who somewhat dispelled the traditional notion of the eureka effect by explaining that the evolution of ideas, like all evolution, stumbles its way toward the next good idea, which inevitably leads to a significant breakthrough, such as what happens with innovations in technology.

Listening to this thought-provoking podcast made me ponder the question: What does data quality technology want?

In a previous post, I used the term OOBE-DQ to refer to the out-of-box-experience (OOBE) provided by data quality (DQ) tools, which usually becomes a debate between “ease of use” and “powerful functionality” after you ignore the Magic Beans sales pitch that guarantees you the data quality tool is both remarkably easy to use and incredibly powerful.

The data quality market continues to evolve away from esoteric technical tools, stumbling its way toward the next good idea: business-empowering suites providing robust functionality with increasingly role-based user interfaces tailored to the specific needs of different users.  Of course, many vendors would love to claim sole responsibility for what they would call significant innovations in data quality technology, instead of what are simply by-products of an evolving market.

The deployment of data quality functionality within and across organizations also continues to evolve, as data cleansing activities are being complemented by real-time defect prevention services used to greatly minimize poor data quality at the multiple points of origin within the enterprise data ecosystem.

However, viewpoints about the role of data quality technology generally remain split between two opposing perspectives:

  1. Technology enables a data quality process, but doesn’t obviate the need for people (e.g., data stewards) to remain actively involved and be held accountable for maintaining the quality of data.
  2. Technology automates a data quality process, and a well-designed and properly implemented technical solution obviates the need for people to be actively involved after its implementation.

Do you think that continuing advancements and innovations in data quality technology will obviate the need for people to be actively involved in data quality processes?  In the future, will we have high quality data because our technology essentially wants it and therefore leans our organizations toward high quality data?  Let’s conduct another unscientific data quality poll:

 

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

 

Related Posts

DQ-Tip: “Data quality tools do not solve data quality problems...”

Which came first, the Data Quality Tool or the Business Need?

Data Quality Industry: Problem Solvers or Enablers?

Data Quality Magic

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Pirates of the Computer: The Curse of the Poor Data Quality

#FollowFriday and Re-Tweet-Worthiness

There is perhaps no better example of the peer pressure aspects of social networking than FollowFriday—the day when Twitter users recommend other users that you should follow (i.e., “I recommended you, why didn’t you recommend me?”).

However, re-tweeting (the forwarding of another user’s Twitter status update, aka a tweet) happens every day of the week.  Many bloggers (such as myself) use Twitter to promote their content by tweeting links to their new blog posts, and therefore, most re-tweets are attempts—made by the other members of the blogger’s collablogaunity—to help share meaningful content.

But I would be willing to wager that a considerable amount of re-tweeting is based on the act of reciprocity—and not based on evaluating the Re-Tweet-Worthiness of the shared content.  In other words, I believe that many people (myself included) sometimes don’t read what they re-tweet, but simply share content from a previously determined re-tweet-worthy source, or a source that they hope will reciprocate in the future (i.e., “I re-tweeted your blog post, why didn’t you re-tweet my blog post?”).

 

How do YOU determine Re-Tweet-Worthiness?

 

#FollowFriday Recommendations

By no means a comprehensive list, and listed in no particular order whatsoever, here are some great tweeps to follow, especially for truly re-tweet-worthy tweets about Data Quality, Data Governance, Master Data Management, and Business Intelligence:

 

PLEASE NOTE: No offense is intended to any of my tweeps not listed above.  However, if you feel that I have made a glaring omission of an obviously Twitterific Tweep, then please feel free to post a comment below and add them to the list.  Thanks!

I hope that everyone has a great FollowFriday and an even greater weekend.  See you all around the Twittersphere.

 

Related Posts

Data Quality and #FollowFriday the 13th

Twitter, Data Governance, and a #ButteredCat #FollowFriday

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

Demystifying Social Media

Social Karma

The Challenging Gift of Social Media

The Wisdom of the Social Media Crowd

The Good Data

Photo via Flickr (Creative Commons License) by: Philip Fibiger

When I was growing up, my family had a cabinet filled with “the good dishes” that were reserved for use on special occasions, i.e., the plates, bowls, and cups that would only be used for holiday dinners like Thanksgiving or Christmas.  The rest of the year, we used “the everyday dishes” that were a random collection of various sets of dishes collected over the years. 

Meals using the everyday dishes would seldom have matching plates, bowls, and cups, and if these dishes had a pattern on them once, it was mostly, if not completely, worn down by repeated use and constant washing.  Whenever we actually got to use the good dishes, it made the meal seem more special, more fancy, perhaps it even made the food seem like it tasted a little bit better.

Some organizations have a database filled with “the good data” that is reserved for special occasions.  In other words, this is the data prepared for specific business uses such as regulatory compliance and reporting.  Meanwhile, the rest of the time, and perhaps in support of daily operations, the organization uses “the everyday data” that is often a random collection of various data sets.

Business activities using the everyday data would seldom use a single source, but instead mash up data from several sources, perhaps even storing the results in a spreadsheet or a private database—otherwise known by the more nefarious term: data silo.

Most of the time, when organizations discuss their enterprise data management strategy, they focus on building and maintaining the good data.  However, unlike with the good dishes, the organization tries to force everyone to use the good data even for everyday business activities, essentially forcing everyone to throw away the everyday data—to eliminate all those data silos.

But there is a time and a place for both the good dishes and the everyday dishes, as well as paper plates and plastic cups.  And yes, even eating with your hands has a time and a place, too.

The same is true for data.  Yes, you should build and maintain the good data to be used to support as many business activities as possible.  And yes, you should minimize the special occasions where customized data and/or data silos are truly necessary.

But you should also accept that, since there is so much data available to the enterprise and so many business uses for it, forcing everyone to use only the good data might be preventing your organization from maximizing the full potential of its data.

 

Related Posts

To Our Data Perfectionists

DQ-View: From Data to Decision

The Data-Decision Symphony

Is your data complete and accurate, but useless to your business?

You Can’t Always Get the Data You Want

A Confederacy of Data Defects

One of my favorite novels is A Confederacy of Dunces by John Kennedy Toole.  The novel tells the tragicomic tale of Ignatius J. Reilly, described in the foreword by Walker Percy as a “slob extraordinary, a mad Oliver Hardy, a fat Don Quixote, and a perverse Thomas Aquinas rolled into one.”

The novel was written in the 1960s before the age of computer filing systems, so one of the jobs Ignatius has is working as a paper filing clerk in a clothing factory.  His employer is initially impressed with his job performance, since the disorderly mess of invoices and other paperwork slowly begin to disappear, resulting in the orderly appearance of a well organized and efficiently managed office space.

However, Ignatius is fired after he reveals the secret to his filing system—instead of filing the paperwork away into the appropriate file cabinets, he has simply been throwing all of the paperwork into the trash.

This scene reminds me of how data quality issues (aka data defects) are often perceived.  Many organizations acknowledge the importance of data quality, but don’t believe that data defects occur very often because the data made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

ETL processes that extract source data for a data warehouse load will often perform basic data quality checks.  However, a fairly standard practice for “resolving” a data defect is to substitute a NULL value (e.g., a date stored in a text field in a source system that cannot be converted into a valid date value is usually loaded into the target relational database with a NULL value).
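
As a hedged illustration of that practice (hypothetical code and date formats of my own choosing, not taken from any particular ETL tool), a transformation step along these lines quietly turns an unparseable date into a NULL instead of a reported defect:

```python
from datetime import datetime

def parse_source_date(raw_value):
    """Hypothetical ETL transformation mirroring the practice described above:
    if the text in the source system cannot be converted into a valid date,
    load NULL (None) into the target instead of reporting a data defect."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):  # assumed source date formats
        try:
            return datetime.strptime(raw_value.strip(), fmt).date()
        except (ValueError, AttributeError):
            continue
    return None  # the defect silently becomes a NULL in the data warehouse

print(parse_source_date("2010-12-31"))  # 2010-12-31
print(parse_source_date("31/31/2010"))  # None -- defect concealed, not resolved
```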

When postal address validation software generates a valid mailing address, it often does so by removing what it considers to be “extraneous” information from the input address fields, which may include valid data that was accidentally entered into the wrong field or that lacked an input field of its own (e.g., an e-mail address entered in an address field gets deleted from the validated output mailing address).

And some reporting processes intentionally filter out “bad records” or eliminate “outlier values.”  This happens most frequently when preparing highly summarized reports, especially those intended for executive management.
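
A simple sketch of that filtering practice (hypothetical thresholds and field names, purely for illustration) might look like this:

```python
def average_order_value(orders):
    """Hypothetical report calculation illustrating the practice described above:
    'bad' records and 'outliers' are quietly excluded before the numbers are
    rolled up, so the defects never appear in the summarized report."""
    values = [o["amount"] for o in orders if o["amount"] is not None]
    reasonable = [v for v in values if 0 < v < 10_000]  # assumed 'reasonable' range
    return sum(reasonable) / len(reasonable) if reasonable else None

orders = [{"amount": 120.0}, {"amount": None}, {"amount": -50.0}, {"amount": 95.0}]
print(average_order_value(orders))  # 107.5 -- two defective records silently dropped
```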

These are just a few examples of common practices that can create the orderly appearance of a high quality data environment, but that conceal a confederacy of data defects about which the organization may remain blissfully (and dangerously) ignorant.

Do you suspect that your organization may be concealing A Confederacy of Data Defects?

Can Data Quality avoid the Dustbin of History?

After reading two blog posts about the 2011 predictions for data management by Steve Sarsfield and Henrik Liliendahl Sørensen, I was pondering writing a 2011 prediction post of my own—and then I read this recent Dilbert comic strip.

What if Dogbert is right and the only things that matter are social networks, games, and phones?  What implications does this have for the data management industry, and more specifically, the data quality profession?  How can data quality practitioners avoid being cast into the Dustbin of History in 2011 and beyond?

Perhaps we need to create a social network for data?  Let’s call it DataTweetBook.  Although we would be allowed to follow any data with a public profile, data would have to approve our friend requests—you know, in order to respect data’s privacy.

(Quick Side Bar Question: Do you think that your organization’s data would accept your friend request—or block you?)

Next, we would partner with Zynga and create DataVille and Data Quality Wars, which would be online games exclusive to the DataTweetBook platform.  These games would include fun challenges, like “consolidate duplicates in your contact database” and “design a user interface that prevents data quality issues from happening.”  You and your data can even ask other people and data in your social network for help with completing tasks, such as “ask postal reference data to validate your mailing addresses.”

Of course, we would then need to create iPhone and Android apps for DataTweetBook, DataVille, and Data Quality Wars, so that everyone can access the new social network and games on their mobile phones.  And eventually, we would start a bidding war between Apple and Google over the exclusive rights to make an integrated mobile device, either iDataPad or DataGoogler.

So that’s my 2011 prognostication for the data quality industry—it’s going to be all about social networks, games, and phones.

 

Related Posts

Dilbert, Data Quality, Rabbits, and #FollowFriday

Comic Relief: Dilbert on Project Management

Comic Relief: Dilbert to the Rescue

‘Tis the Season for Data Quality

‘Tis the season for getting holiday greeting cards, and not only from family and friends, since many companies also like to mail season’s greetings to their customers, employees, and business partners.

I do appreciate the sentiment, but I mostly just check the envelopes for data quality issues with the name and/or postal address.

I have never made it through an entire holiday season without receiving at least one incorrectly addressed greeting card, and this year was no exception.  In the above image, I have highlighted that I apparently live in the town of Ankely, Pennsylvania.

I actually live in the town of Ankeny, Iowa.

The United States postal abbreviations for Pennsylvania and Iowa are PA and IA, respectively.  Additionally, the town name is only off by one character (L instead of N in the fifth position of a six-character string).  Therefore, the data matching algorithms provided by most data quality tools would consider these relatively minor discrepancies to be highly probable matches.
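
As a rough illustration (using Python’s standard difflib rather than a commercial matching engine, so the numbers are only indicative), even a simple character-based similarity measure scores these discrepancies as close matches:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Simple character-based similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()

print(round(similarity("Ankely", "Ankeny"), 2))  # 0.83 -- only one character apart
print(round(similarity("PA", "IA"), 2))          # 0.5  -- only one character apart
```

Real data quality tools use far more sophisticated matching algorithms (phonetic keys, weighted edit distances, reference data), but the underlying point is the same: small character-level discrepancies look like highly probable matches.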

And although Pennsylvania and Iowa are approximately 900 miles away from each other, since my street address and ZIP code (both intentionally blurred out in the image) were correct, the post office was able to successfully deliver the greeting card to me.

However, the really funny thing is that this greeting card was sent to me by a . . . (wait for it) . . . data quality tool vendor!

So apparently ‘tis the season for data quality . . . data quality issues, that is :-)

 

‘Tis the Season for Sharing Data Quality Stories

Have you encountered any seasonal data quality issues?  If so, please share your story by posting a comment below.

Does your organization have a Calumet Culture?

In my previous post, I once again blogged about how the key to success for most, if not all, organizational initiatives is the willingness of people all across the enterprise to embrace collaboration.

However, what happens when an organization’s corporate culture doesn’t foster an environment of collaboration?

Sometimes as a result of rapid business growth, an organization trades effectiveness for efficiency, prioritizes short-term tactics over long-term strategy, and even encourages “friendly” competition amongst its relatively autonomous business units.

However, when the need for a true enterprise-wide initiative such as data governance becomes (perhaps painfully) obvious, the organization decides to bring representatives from all of its different “tribes” together to discuss the complexities of the business, data, technical, and (most important) people-related issues that would shape the realities of a truly collaborative environment.

“Calumet Culture” is the term I like using (and not just because of my affinity for alliteration) to describe the disingenuous way that I have occasionally witnessed these organizational stakeholder gathering “ceremonies” carried out.

Calumet was the Norman word used by Norman-French Canadian settlers to describe the “peace pipes” they witnessed the people of the First Nations (referred to as Native Americans in the United States) using at ceremonies marking a treaty between previously combative factions.

Simply gathering everyone together around the camp fire (or the conference room table) is an empty gesture, similar in many ways to non-Native Americans mimicking a “peace pipe ceremony” and using one of their words (Calumet) to describe what was in fact a deeply spiritual object used to convey true significance to the event.

When collaboration is discussed at strategic planning meetings with great pomp and circumstance, but after the meetings end, the organization returns to its non-collaborative status quo, then little, if any, true collaboration should be expected to happen.

Does your organization have a Calumet Culture?

In other words, does your organization have a corporate culture that talks the talk of collaboration, but doesn’t walk the walk?

If so, how have you attempted to overcome this common barrier to success?

Data Governance and the Social Enterprise

In his blog post Socializing Software, Michael Fauscette explained that in order “to create a next generation enterprise, businesses need to take two concepts from the social web and apply them across all business functions: community and content.”

“Traditional enterprise software,” according to Fauscette, “was built on the concept of managing through rigid business processes and controlled workflow.  With process at the center of the design, people-based collaboration was not possible.”

Peter Sondergaard, the global head of research at Gartner, explained at a recent conference that “the rigid business processes which dominate enterprise organizational architectures today are well suited for routine, predictable business activities.  But they are poorly suited to support people whose jobs require discovery, interpretation, negotiation and complex decision-making.”

“Social computing,” according to Sondergaard, “not Facebook, or Twitter, or LinkedIn, but the technologies and principles behind them will be implemented across and between all organizations, and it will unleash yet to be realized productivity growth.”

Since the importance of collaboration is one of my favorite topics, I like Fauscette’s emphasis on people-based collaboration and Sondergaard’s emphasis on the limitations of process-based collaboration.  The key to success for most, if not all, organizational initiatives is the willingness of people all across the enterprise to embrace collaboration.

Successful organizations view collaboration not just as a guiding principle, but as a call to action in their daily business practices.

As Sondergaard points out, the technologies and principles behind social computing are the key to enabling what many analysts have begun referring to as the social enterprise.  Collaboration is the key to business success.  This essential collaboration has to be based on people, and not on rigid business processes, since business activities and business priorities are constantly changing.

 

Data Governance and the Social Enterprise

Often the root cause of poor data quality can be traced to a lack of a shared understanding of the roles and responsibilities involved in how the organization is using its data to support its business activities.  The primary focus of data governance is the strategic alignment of people throughout the organization through the definition, implementation, and enforcement of the policies that govern the interactions between people, business processes, data, and technology.

A data quality program within a data governance framework is a cross-functional, enterprise-wide initiative requiring people to be accountable for its data, business process, and technology aspects.  However, policy enforcement and accountability are often confused with traditional notions of command and control, which is the antithesis of the social enterprise that instead requires an emphasis on communication, cooperation, and people-based collaboration.

Data governance policies for data quality illustrate the intersection of business, data, and technical knowledge, which is spread throughout the enterprise, transcending any artificial boundaries imposed by an organizational chart or rigid business processes, where different departments or different business functions appear as if they were independent of the rest of the organization.

Data governance reveals how interconnected and interdependent the organization is, and why people-driven social enterprises are more likely to survive and thrive in today’s highly competitive and rapidly evolving marketplace.

Social enterprises rely on the strength of their people asset to successfully manage their data, which is a strategic corporate asset because high quality data serves as a solid foundation for an organization’s success, empowering people, enabled by technology, to optimize business processes for superior business performance.

 

Related Posts

Podcast: Data Governance is Mission Possible

Trust is not a checklist

The Business versus IT—Tear down this wall!

The Road of Collaboration

Shared Responsibility

Enterprise Ubuntu

Data Transcendentalism

Social Karma