DQ-Tip: “Data quality is about more than just improving your data...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“Data quality is about more than just improving your data.

Ultimately, the goal is improving your organization.”

This DQ-Tip is from Tony Fisher's great book The Data Asset: How Smart Companies Govern Their Data for Business Success.

In the book, Fisher explains that one of the biggest mistakes organizations make is not viewing their data as a corporate asset.  This common misconception often prevents data quality from being rightfully viewed as a critical priority.

Data quality is misperceived to be an activity performed just for the sake of improving data.  In fact, data quality is an activity performed for the sake of improving business processes.

“Better data leads to better decisions,” explains Fisher, “which ultimately leads to better business.  Therefore, the very success of your organization is highly dependent on the quality of your data.”

 

Related Posts

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “Don't pass bad data on to the next person...”

Brevity is the Soul of Social Media

“Why day is day, night night, and time is time,
Were nothing but to waste night, day and time.
Therefore, since brevity is the soul of wit,
I will be brief ...”

Within the wide world of social media, one of the most common features is some form of social networking, microblogging, or short message service that allows users to share brief status updates.  Some social media sites are almost entirely built on only this feature (e.g., Twitter) whereas others (e.g., Facebook, LinkedIn) include it among a list of many other features.

Either way, these status updates have created a rather pithy platform many people argue is incompatible with meaningful communication, especially of a professional nature.  I must admit this was also my initial opinion of social media.

However, I now believe not only is it the soul of wit, brevity is the soul of social media – and, in fact, a very good soul.

 

Short Attention Span Theater

I doubt attention deficit will still be considered a disorder ten years from now.  We are living increasingly faster-paced lives in an increasingly faster-paced world.  The pervasiveness of the Internet and the rapid proliferation of powerful mobile technology are making our world a smaller and smaller place and our lives a more and more crowded space.

We have become so accustomed to multi-tasking that the very concept of focusing our attention on only one thing at a time somehow seems inherently wrong to us.  All the world's a stage within this short attention span theater.  And all of us are not merely players, we have been cast in several simultaneous roles.

Time management has always been important, but nowadays it is even more essential.  This is especially true when it comes to social media, which, if we can effectively and efficiently use it, has great personal and professional potential.  Amber Naslund recently provided an excellent blog series on social media time management that I highly recommend.

 

The Power of Pith

I admit I am a long-winded talker or, as a favorite (canceled) television show would say, “conversationally anal-retentive.”  In the past (slightly less now), I was also known for e-mail messages even Leo Tolstoy would declare to be far too long.

Therefore, it may be surprising to learn I am addicted to Twitter.  How could I possibly constrain myself to only 140 characters?  No, I don't use ellipses to extend my thoughts across multiple tweets (although I admit I am often tempted to do so). 

I wholeheartedly agree with Jennifer Blanchard, who explained how Twitter makes you a better writer.  When forced to be concise, you have to focus on exactly what you want to say, using as few words as possible. 

The power of pith means reducing your message to its bare essence.  In order to engage in effective dialogue on the stage of our short attention span theater, this is a required skill we all must master – and not just when we are on Twitter.

For those who argue this simply regresses human communication back to our days of monosyllabic grunting, I invite you to read the excellent recent blog post Is Twitter a Complex Adaptive System? written by Venessa Miemis.

Although you should read all of it, the point I need here will be found under Insight #4 toward the end of the post.  Miemis shares a study that reveals using Twitter can not only improve communication, but actually build intelligence. 

The collaborative communication enabled by social media platforms can actually contribute to a growing collective intelligence made up of all of us.  The power of pith is the wisdom of crowds.

 

Blogging with Brevity

Brevity is the soul of all social media and yes, this includes blogging as well.  Some view blogging as social media's last bastion of robust communication.  You can take your time and use all the words you want on your blog, right?  Sure, as long as you have no interest in anyone actually reading your blog.

Some bloggers get cranky with me when I emphasize the Three C’s – meaning your blog posts should be:

  1. Clear – Get to the point and stay on point
  2. Concise – No longer than necessary
  3. Consumable – Formatted to be easily read on a computer screen

Concise is usually the main cranky-causing culprit because everyone interprets it to mean “write really short posts.”

One blogger told me he has “never met a subordinate clause he didn't like,” thereby expressing his fondness for writing compound-complex sentences.  For the non-writers, this means really long (but grammatically correct) sentences oftentimes requiring you to read them three or four times before truly comprehending their full meaning.

Don't get me wrong.  This particular blogger is an incredibly gifted writer known for his absolutely brilliant blog posts.  My only true criticism of his writing style is it truly requires a significant time commitment.

Michelle Russell does a great job explaining how to write with a knife.  No, not literally.  Writing with a knife means writing for yourself, but editing for your readers.  Editing is the hardest part of writing, but also the most important. 

Blogging with brevity doesn't necessarily mean “write really short posts.”  Being concise simply means taking out anything that doesn't need to be included.  For example, you really didn't need to read the additional jokes and Shakespearean references included in the first draft of this post.

 

The Future of Brevity is Bright

Some predict the size limits of message service standards and status updates will be increased.  Others predict new social media platforms will be based on different paradigms.  Either way, innovation will eventually deliver an ability to be more verbose.

However, barring some major scientific breakthrough (or some major breakdown in the space-time continuum), there will still only be 24 hours in a day.  Therefore, no matter what happens, I am certain the future of brevity is bright.

Neither the world nor the people in it are likely to slow down.  Our attention spans will remain short.  Our time management skills will remain vigilant.  We will communicate through the power of pith, brevity will remain the soul of both wit and social media, and hopefully, we will all “live long and prosper.”

 

Related Posts

The Mullet Blogging Manifesto

Collablogaunity

Podcast: Your Blog, Your Voice

Collablogaunity

The meteoric rise of the Internet coupled with social media has created an amazing medium that is enabling people who are separated by vast distances and disparate cultures to come together, communicate, and collaborate in ways few would have thought possible just a few decades ago.  Blogging, especially when effectively integrated with social networking, can be one of the most powerful aspects of social media.

The great advantage to blogging as a medium, as opposed to books, newspapers, magazines, and even presentations, is that blogging is not just about broadcasting a message. 

This is not to say that books, newspapers, and magazines aren't useful (they certainly can be) or that presentations lack an interactive component (they certainly should not).  I simply believe that, when done well, blogging better facilitates effective communication by starting a conversation, encouraging collaboration, and fostering a true sense of community.

Mashing together the words collaboration, blog, and community, I use the term collablogaunity — which is pronounced “Call a Blog a Unity” — to describe how remarkable blogs do this remarkably well.

 

Conversation

Blogging is a conversation — with your readers. 

I love the sound of my own voice and I talk to myself all the time (even in public).  However, the two-way conversation that blogging provides via comments from my readers greatly improves the quality of my blog content —  because it helps me better appreciate the difference between what I know and what I only think I know.

Without comments, the conversation is only one way.  Engaging readers in dialogue and discussion allows some of your points to be made for you by those who take the time to comment as opposed to you just telling everyone how you see the world.

Blogging isn't about using the Internet as your own personal bullhorn for broadcasting your message.  In her wonderful book The Whuffie Factor, Tara Hunt explains that you really need to:

“Turn the bullhorn around: stop talking, start listening, and create continuous conversations.”

Respond to the comments you receive (but never feed the troll).  You don't have to respond immediately.  Sometimes, the conversation will go more smoothly without your involvement as your readers talk amongst themselves.  Other times, your response will help continue the conversation and encourage participation from others. 

Always demonstrate that feedback is both welcome and appreciated.  Make sure to never talk down to your readers (either in your blog post or your comment responses).  It is perfectly fine to disagree and debate, just don't denigrate.  

In a recent guest post on ProBlogger, Rob McPhillips explained: 

“If instead, you are all the time only seeking praise and approval from everyone, then there is nothing solid, consistent or certain about your blog and so ultimately it will never gather a sizeable core of die hard fans.  Only drive by readers who scan a post and never look back.” 

Collaboration

Blogging is a collaboration — with other bloggers.

While conversation is primarily between you and your readers, collaboration is primarily between you and other bloggers.  Although you may be inclined to view other bloggers as “the competition,” especially those within your own niche, this would be a mistake.  Yes, it is true that blogs are competing with each other for readers.  However, sustainable success is achieved through collaboration and friendly competition with your peers.

Brian Clark has explained in the past and continues to exemplify that strategic collaboration is the secret to 21st century success.  Clark has stated that if he had to reduce his recipe for success to just three ingredients, it would be content, copywriting, and collaboration.  And if he had to give up two of those, then he'd keep collaboration.

In their terrific book Trust Agents, Chris Brogan and Julien Smith explain that although people in most cultures view themselves as the central hero in their life's story, the reality is that you need to build an army because you can't do it all alone.

Collaboration between bloggers is mainly about networking and cross-promotion.  You should network with other bloggers, especially those within your own niche.  This can be accomplished in a number of ways, including e-mail introductions, Twitter direct messages (if the other blogger is following you), LinkedIn connection requests, or Facebook friend requests.

As with any networking, the most important thing is being genuine.  As Darren Rowse and Chris Garrett explained in their highly recommended ProBlogger book, when you network with other bloggers, keep it real, be specific, keep it brief without being rude, and explain why you are interested in connecting.  They rightfully emphasize the importance of that last point.

As we all know, although content may be king, marketing is queen.  Networking with other bloggers can help you get the word out about your brilliant blog and its penchant for publishing posts that everyone must read.  Adding other bloggers to your blogroll, linking to their posts when applicable to your content, and leaving meaningful comments on their posts are not only recommended best practices of netiquette, they are also just the right thing to do.

Too many bloggers have a selfish networking and marketing strategy.  They only promote their own content and then wonder why nobody reads their blog.  I am fond of referring to all social media as Social Karma.  Focus on helping other bloggers promote their content and they will likely be more willing to return the favor.  However, don't misunderstand this technique to be a pathetic peer pressure tactic.  In other words: “I re-tweeted your blog post, why didn't you re-tweet my blog post?”

One last point on collaboration is to set realistic expectations — for others and for yourself.  You should definitely try to help others when you can.  However, you simply can't help everyone.  Don't let people take advantage of your generosity. 

Politely, but firmly, say no when you need to say no.  Also extend the same courtesy to other people when they turn you down (or simply ignore you) when you try to connect with them or when you ask them for their help. 

Mean and selfish people definitely suck.  But let's face it, nobody's perfect — we all have bad days, we all occasionally say and do stupid things, and we all occasionally treat people worse than they deserve to be treated.  So don't be too hard on people when they disappoint you, because tomorrow it will probably be your turn to have a bad day.

 

Community

Blogging is a community service.

If you truly believe and actually practice the principles of both conversation and collaboration, then viewing blogging as a community service comes naturally.  You will truly be more interested in actually listening to what your readers have to say, and less interested in just broadcasting your message.  You will see your words as simply the catalyst that gets the conversation started, and when necessary, helps continue the discussion. 

You will see friends not foes when encountering your blogging peers.  You will help them celebrate their successes and quickly recover from their failures.  You will help others when you can and without worrying about what's in it for you.

As James Chartrand says, you will welcome people to your blog because you view blogging as a festival of people, a community strengthened by people, where everyone can speak up with great care and attention, sharing thoughts and views while openly accepting differing opinions.  Blogging is a community service providing a wealth of experience, thoughts and knowledge being shared by all sorts of participants.

In the closing keynote of this year's BlogWorld conference, Chris Brogan explained (from notes taken by David B. Thomas):

“Make it about them.  Stop looking at this as a cult of me. 

It has to be about your audience.  Turn them into a community. 

The difference between an audience and a community is the way you face the chairs. 

The difference between an audience and a community:

One will fall on its sword for you and the other will watch you fall.”

Collablogaunity

Pronounced: “Call a Blog a Unity”

There are literally millions of blogs on the Internet today.  Your blog (to quote Seth Godin) is “either remarkable or invisible.”

Remarkable blogs primarily do three things:

  1. Start conversations
  2. Encourage collaboration
  3. Foster a true sense of community

Remarkable blogs are collablogaunities.  Is your blog a collablogaunity?

 

Related Posts

The Mullet Blogging Manifesto

Brevity is the Soul of Social Media

Podcast: Your Blog, Your Voice

Beyond a “Single Version of the Truth”

This post is involved in a good-natured contest (i.e., a blog-bout) with two additional bloggers: Henrik Liliendahl Sørensen and Charles Blyth.  Our contest is a Blogging Olympics of sorts, with the United States, Denmark, and England competing for the Gold, Silver, and Bronze medals in an event we are calling “Three Single Versions of a Shared Version of the Truth.” 

Please take the time to read all three posts and then vote for who you think has won the debate (see poll below).  Thanks!

 

The “Point of View” Paradox

In the early 20th century, within his Special Theory of Relativity, Albert Einstein introduced the concept that space and time are interrelated entities forming a single continuum, and therefore the passage of time can be a variable that could change for each individual observer.

One of the many brilliant insights of special relativity was that it could explain why different observers can make validly different observations – it was a scientifically justifiable matter of perspective. 

It was Einstein's apprentice, Obi-Wan Kenobi (to whom Albert explained “Gravity will be with you, always”), who stated:

“You're going to find that many of the truths we cling to depend greatly on our own point of view.”

The Data-Information Continuum

In the early 21st century, within his popular blog post The Data-Information Continuum, Jim Harris introduced the concept that data and information are interrelated entities forming a single continuum, and that speaking of oneself in the third person is the path to the dark side.

I use the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g., customers, vendors, suppliers).

Although a common definition for data quality is fitness for the purpose of use, the common challenge is that data has multiple uses – each with its own fitness requirements.  Viewing each intended use as the information that is derived from data, I define information as data in use or data in action.

Quality within the Data-Information Continuum has both objective and subjective dimensions.  Data's quality is objectively measured separate from its many uses, while information's quality is subjectively measured according to its specific use.
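The distinction can be illustrated with a minimal Python sketch.  Everything in it is hypothetical: the sample records, the completeness metric, the two named uses, and their thresholds are assumptions chosen only for illustration.  The objective measure is computed once from the data alone, while each use judges that same measurement against its own requirement.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"name": "Ann Lee", "email": "ann@example.com", "postal_code": "02101"},
    {"name": "Bob Roy", "email": None, "postal_code": "10001"},
    {"name": "Cy Dunn", "email": "cy@example.com", "postal_code": None},
    {"name": "Di Park", "email": None, "postal_code": "60601"},
]


def completeness(rows, field):
    """Objective measure: share of rows where the field is populated."""
    populated = sum(1 for row in rows if row.get(field) is not None)
    return populated / len(rows)


# Subjective requirements (assumed): each use states the completeness it needs.
USE_REQUIREMENTS = {
    "email_campaign": {"email": 0.90},
    "postal_mailing": {"postal_code": 0.70},
}


def fit_for_use(rows, use):
    """Subjective measure: does the data meet this particular use's thresholds?"""
    return all(
        completeness(rows, field) >= minimum
        for field, minimum in USE_REQUIREMENTS[use].items()
    )


print(completeness(records, "email"))         # measured the same way for every use
print(fit_for_use(records, "email_campaign"))  # judged against one use
print(fit_for_use(records, "postal_mailing"))  # judged against another use
```

The same data can therefore fail one use and pass another without its objective measurements changing at all.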

 

Objective Data Quality

Data quality standards provide a highest common denominator to be used by all business units throughout the enterprise as an objective data foundation for their operational, tactical, and strategic initiatives. 

In order to lay this foundation, raw data is extracted directly from its sources, profiled, analyzed, transformed, cleansed, documented and monitored by data quality processes designed to provide and maintain universal data sources for the enterprise's information needs. 

At this phase of the architecture, the manipulations of raw data must be limited to objective standards and not be customized for any subjective use.  From this perspective, data is now fit to serve (as at least the basis for) each and every purpose.
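As a rough illustration of one of these objective steps, here is a small Python sketch of column profiling.  The function name, the sample values, and the statistics reported are assumptions made for the example, not a description of any particular data quality tool.  Note that it only describes the raw data; it does not decide what any business unit should do with it.

```python
from collections import Counter


def profile_column(values):
    """Summarize a raw column without applying any use-specific rules."""
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "populated": len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical raw values extracted from a source system.
raw_states = ["MA", "ma", "MA", None, "NY", ""]
profile = profile_column(raw_states)
print(profile)
```

A profile like this surfaces facts (missing values, inconsistent casing) that an enterprise-wide standard can then address once, for everyone.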

 

Subjective Information Quality

Information quality standards (starting from the objective data foundation) are customized to meet the subjective needs of each business unit and initiative.  This approach leverages a consistent enterprise understanding of data while also providing the information necessary for day-to-day operations.

But please understand: customization should not be performed simply for the sake of it.  You must always define your information quality standards by using the enterprise-wide data quality standards as your initial framework. 

Whenever possible, enterprise-wide standards should be enforced without customization.  The key word within the phrase “subjective information quality standards” is standards — as opposed to subjective, which can quite often be misinterpreted as “you can do whatever you want.”  Yes you can – just as long as you have justifiable business reasons for doing so.

This approach to implementing information quality standards has three primary advantages.  First, it reinforces a consistent understanding and usage of data throughout the enterprise.  Second, it requires each business unit and initiative to clearly explain exactly how they are using data differently from the rest of your organization, and more important, justify why.  Finally, all deviations from enterprise-wide data quality standards will be fully documented. 
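These three advantages can be sketched in a few lines of Python.  The standard names, their values, and the function below are all hypothetical and exist only to illustrate the idea: every business unit starts from the enterprise-wide standards, any customization must carry a business justification, and each deviation is recorded.

```python
# Hypothetical enterprise-wide data quality standards.
ENTERPRISE_STANDARDS = {
    "country_format": "ISO 3166-1 alpha-2",
    "name_case": "title",
    "max_customer_age_days": 365,
}


def build_information_standards(unit, overrides):
    """Start from the enterprise standards and apply justified overrides.

    `overrides` maps a standard name to a (value, justification) pair.
    Returns the unit's standards plus a log of documented deviations.
    """
    standards = dict(ENTERPRISE_STANDARDS)  # enterprise framework comes first
    deviations = []
    for name, (value, justification) in overrides.items():
        if name not in ENTERPRISE_STANDARDS:
            raise KeyError(f"{name!r} is not an enterprise standard")
        if not justification or not justification.strip():
            raise ValueError(f"{unit}: override of {name!r} needs a business reason")
        if value == ENTERPRISE_STANDARDS[name]:
            continue  # same as the enterprise value, so not a deviation
        standards[name] = value
        deviations.append({
            "unit": unit,
            "standard": name,
            "enterprise_value": ENTERPRISE_STANDARDS[name],
            "custom_value": value,
            "justification": justification,
        })
    return standards, deviations


marketing, log = build_information_standards(
    "Marketing",
    {"max_customer_age_days": (90, "Campaign lists must reflect recent activity")},
)
print(marketing)
print(log)
```

An override without a justification is rejected outright, which is the code equivalent of “you can do whatever you want” only when you can explain why.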

 

The “One Lie Strategy”

A common objection to separating quality standards into objective data quality and subjective information quality is the enterprise's significant interest in creating what is commonly referred to as a “Single Version of the Truth.”

However, in his excellent book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman explains:

“A fiendishly attractive concept is...'a single version of the truth'...the logic is compelling...unfortunately, there is no single version of the truth. 

For all important data, there are...too many uses, too many viewpoints, and too much nuance for a single version to have any hope of success. 

This does not imply malfeasance on anyone's part; it is simply a fact of life. 

Getting everyone to work from a single version of the truth may be a noble goal, but it is better to call this the 'one lie strategy' than anything resembling truth.”

Beyond a “Single Version of the Truth”

In the classic 1985 film Mad Max Beyond Thunderdome, the title character arrives in Bartertown, ruled by the evil Auntie Entity, where people living in the post-apocalyptic Australian outback go to trade for food, water, weapons, and supplies.  Auntie Entity forces Mad Max to fight her rival Master Blaster to the death within a gladiator-like arena known as Thunderdome, which is governed by one simple rule:

“Two men enter, one man leaves.”

I have always struggled with the concept of creating a “Single Version of the Truth.”  I imagine all of the key stakeholders from throughout the enterprise arriving in Corporatetown, ruled by the Machiavellian CEO known only as Veritas, where all business units and initiatives must go to request funding, staffing, and continued employment.  Veritas forces all of them to fight their Master Data Management rivals within a gladiator-like arena known as Meetingdome, which is governed by one simple rule:

“Many versions of the truth enter, a Single Version of the Truth leaves.”

For any attempted “version of the truth” to truly be successfully implemented within your organization, it must take into account both the objective and subjective dimensions of quality within the Data-Information Continuum. 

Both aspects of this shared perspective of quality must be incorporated into a “Shared Version of the Truth” that enforces a consistent enterprise understanding of data, but that also provides the information necessary to support day-to-day operations.

The Data-Information Continuum is governed by one simple rule:

“All validly different points of view must be allowed to enter,

In order for an all-encompassing Shared Version of the Truth to be achieved.”

 

You are the Judge

Please take the time to read all three posts and then vote for who you think has won the debate.  A link to the same poll is provided on all three blogs.  Therefore, wherever you choose to cast your vote, you will be able to view an accurate tally of the current totals. 

The poll will remain open for one week, closing at midnight on November 19 so that the “medal ceremony” can be conducted via Twitter on Friday, November 20.  Additionally, please share your thoughts and perspectives on this debate by posting a comment below.  Your comment may be copied (with full attribution) into the comments section of all of the blogs involved in this debate.

 

Related Posts

Poor Data Quality is a Virus

The General Theory of Data Quality

The Data-Information Continuum

The Once and Future Data Quality Expert

World Quality Day 2009

Wednesday, November 11 is World Quality Day 2009.

World Quality Day was established by the United Nations in 1990 as a focal point for the quality management profession and as a celebration of the contribution that quality makes to the growth and prosperity of nations and organizations.  The goal of World Quality Day is to raise awareness of how quality approaches (including data quality best practices) can have a tangible effect on business success, as well as contribute towards world-wide economic prosperity.

 

IAIDQ

The International Association for Information and Data Quality (IAIDQ) was chartered in January 2004 and is a not-for-profit, vendor-neutral professional association whose purpose is to create a world-wide community of people who desire to reduce the high costs of low quality information and data by applying sound quality management principles to the processes that create, maintain and deliver data and information.

Since 2007 the IAIDQ has celebrated World Quality Day as a springboard for improvement and a celebration of successes.  Please join us to celebrate World Quality Day by participating in our interactive webinar in which the Board of Directors of the IAIDQ will share with you stories and experiences to promote data quality improvements within your organization.

In my recent Data Quality Pro article The Future of Information and Data Quality, I reported on the IAIDQ Ask The Expert Webinar with co-founders Larry English and Tom Redman, two of the industry pioneers for data quality and two of the most well-known data quality experts.

 

Data Quality Expert

As World Quality Day 2009 approaches, my personal reflections are focused on what the title data quality expert has meant in the past, what it means today, and most important, what it will mean in the future.

With over 15 years of professional services and application development experience, I consider myself to be a data quality expert.  However, my experience is paltry by comparison to English, Redman, and other industry luminaries such as David Loshin, to use one additional example from many. 

Experience is popularly believed to be the path that separates knowledge from wisdom, which is usually accepted as another way of defining expertise. 

Oscar Wilde once wrote that “experience is simply the name we give our mistakes.”  I agree.  I have found that the sooner I can recognize my mistakes, the sooner I can learn from the lessons they provide, and hopefully prevent myself from making the same mistakes again. 

The key is early detection.  As I gain experience, I gain an improved ability to more quickly recognize my mistakes and thereby expedite the learning process.

James Joyce wrote that “mistakes are the portals of discovery” and T.S. Eliot wrote that “we must not cease from exploration and the end of all our exploring will be to arrive where we began and to know the place for the first time.”

What I find in the wisdom of these sages is the need to acknowledge the favor our faults do for us.  Therefore, although experience is the path that separates knowledge from wisdom, the true wisdom of experience is the wisdom of failure.

As Jonah Lehrer explained: “Becoming an expert just takes time and practice.  Once you have developed expertise in a particular area, you have made the requisite mistakes.”

But expertise in any discipline is more than simply an accumulation of mistakes and birthdays.  And expertise is not a static state that, once achieved, allows you to simply rest on your laurels.

In addition to my real-world experience working on data quality initiatives for my clients, I also read all of the latest books, articles, whitepapers, and blogs, as well as attend as many conferences as possible.

 

The Times They Are a-Changin'

Much of the discussion that I have heard regarding the future of the data quality profession has been focused on the need for the increased maturity of both practitioners and organizations.  Although I do not dispute this need, I am concerned about the apparent lack of attention being paid to how fast the world around us is changing.

Rapid advancements in technology, coupled with the meteoric rise of the Internet and social media (blogs, wikis, Twitter, Facebook, LinkedIn, etc.), have created an amazing medium that is enabling people separated by vast distances and disparate cultures to come together, communicate, and collaborate in ways few would have thought possible just a few decades ago.

I don't believe that it is an exaggeration to state that we are now living in an age where the contrast between the recent past and the near future is greater than perhaps it has ever been in human history.  This brave new world has such people and technology in it, that practically every new day brings the possibility of another quantum leap forward.

Although it has been argued by some that the core principles of data quality management are timeless, I must express my doubt.  The daunting challenges of dramatically increasing data volumes and the unrelenting progress of cloud computing, software as a service (SaaS), and mobile computing architectures would appear to be racing toward a high-speed collision with our time-tested (but time-consuming to implement properly) data quality management principles.

The times they are indeed changing and I believe we must stop using terms like Six Sigma and Kaizen as if they were shibboleths.  If these or any other disciplines are to remain relevant, then we must honestly assess them in the harsh and unforgiving light of our brave new world that is seemingly changing faster than the speed of light.

Expertise is not static.  Wisdom is not timeless.  The only constant is change.  For the data quality profession to truly mature, our guiding principles must change with the times, or be relegated to a past that is all too quickly becoming distant.

 

Share Your Perspectives

In celebration of World Quality Day, please share your perspectives regarding the past, present, and most important, the future of the data quality profession.  With apologies to T. H. White, I declare this debate to be about the difference between:

The Once and Future Data Quality Expert

Related Posts

Mistake Driven Learning

The Fragility of Knowledge

The Wisdom of Failure

A Portrait of the Data Quality Expert as a Young Idiot

The Nine Circles of Data Quality Hell

 

Additional IAIDQ Links

IAIDQ Ask The Expert Webinar: World Quality Day 2009

IAIDQ Ask The Expert Webinar with Larry English and Tom Redman

INTERVIEW: Larry English - IAIDQ Co-Founder

INTERVIEW: Tom Redman - IAIDQ Co-Founder

IAIDQ Publications Portal

The Mullet Blogging Manifesto

Blogging is more art than science.  My personal blogging style can perhaps best be described as mullet blogging.  No, not the “business in the front, party in the back” haircut that I tried to rock back in the '80s (I couldn't pull it off, had to settle for a “tail” and had to cut that off because it made me look like an idiot – OK, more idiotic than usual).  By mullet blogging I mean:

“Take yourself and your blog seriously, but still have a sense of humor about both.”

As a mullet blogger, I hold the following truths to be self-evident, but I decided to write them down anyway.

 

Blogging is All about You

Not you meaning me, the blogger — you meaning you, the reader.

Blogging should always focus on the reader and provide them assistance with a specific problem, even if that problem is boredom or simply a need for entertainment.  Don't worry about your readers agreeing with you.  They will either thank you for your help or tell you that you're an idiot – either way, you have started a conversation, which should always be your blogging goal.

Brian Clark recently shared something to think about using the following quote from Robert McKee:

“When talented people write badly it’s generally for one of two reasons:

Either they’re blinded by an idea they feel compelled to prove,

Or they’re driven by an emotion they must express.

When talented people write well, it is generally for this reason:

They’re moved by a desire to touch the audience.”

B = U²C³

Blogging = Unique and Useful content that is Clear, Concise, and Consumable.

The conventional blogging wisdom is to be both Unique and Useful.  Although I normally like to defy conventions, I have to agree with the wise ones on these fundamentals.

One of the most important aspects of being unique is writing effective titles.  Most potential readers scan titles to determine whether or not they will click and read more.  There is obviously a delicate balance between effective titles and “baiting,” which will only alienate potential readers. 

If you write a compelling title that makes me click through to an interesting post, then “You Rock!”  However, if you write a “Shock and Awe” title followed by “Aw Shucks” content, then “You Suck!” 

Therefore, your content also has to be unique – your topic, position, voice, or a combination of all three.

One of the most important aspects of useful is “infotainment” – that combination of information and entertainment that, when done well, can turn potential readers into raving fans.  Just don't forget about the previous section – your content has to be informative and entertaining to your readers.

The key to good blogging is to follow the Three C’s – Clear, Concise, Consumable

The attention span of a blog reader is not the same as a reader of books, newspapers (they still exist, right?), magazine articles, or the audience for presentations.  Most people only scan blogs, rarely read a full post and even more rarely leave a comment – regardless of how well the blog post is written. 

Write blog posts that get to the point and stay on point (i.e., clear), are no longer than they need to be (i.e., concise), and are formatted to be easy to read on a computer screen (i.e., consumable).

 

Laugh, Think, Comment

The three things that you want your readers to do.

Although it is not as blatantly formulaic as the title of the previous section, here is another method to my blogging madness:

  1. Open with a joke
  2. Say something thought provoking
  3. End with a call to action

It's as easy as 1-2-3!  In my defense, I didn't say open with a good joke.  But seriously, humor can be a great way to start a conversation and hold your readers' attention for those few precious additional seconds while you are getting to your point.  Obviously, there will be times when the seriousness of your subject would make comedy inappropriate, and if you are not naturally inclined to use humor, then you shouldn't try to force it.

Thought provoking content doesn't have to mean deep thoughts.  There is no need to channel Jean-Paul Sartre, for example.  However, to paraphrase Sartre: “Hell is other people's boring blogs.”

Obviously, comments are not the only type of call to action.  However, blogging is a conversation facilitated by the dialogue and discussion provided via comments from your readers.  Without comments, the conversation is only one way. 

I love the sound of my own voice and I talk to myself all the time (even in public).  However, the two-way conversation provided via comments not only greatly improves the quality of my blog content — much more importantly, it helps me better appreciate the difference between what I know and what I only think I know.

As Darren Rowse and Chris Garrett explained in their highly recommended ProBlogger book: “even the most popular blogs tend to attract only about a 1 percent commenting rate.”  Therefore, don't be too disappointed if you are not getting many comments.  Take that statistic as a challenge to motivate you to write blog posts that your readers simply can not resist commenting on. 

Respond to the comments you do receive.  This continues the two-way conversation and encourages comments from other readers.  Make sure to never talk down to your readers (either in your blog post or your comment responses).  It is perfectly fine to disagree and debate, just don't denigrate. 

Obviously, you should block all spam (leading argument for using comment moderation) and never feed the troll.

 

Stories and Metaphors and Analogies!  Oh, my!

I've a feeling we're not in Kansas anymore.  Especially me, since I live in Iowa.

Darren Rowse recently shared some great tips about why stories are an effective communication tool for your blog, including a list of some of the different types of stories you can tell.

My blog uses a lot of metaphors and analogies (and sometimes just plain silliness) in an attempt to make my posts more interesting.  This is necessary because I write about a niche topic, which although important, is also rather dull.

James Chartrand uses the term Method Blogging as (yes, you guessed it) a metaphor for blogging by comparing it to method acting.  Try experimenting with different styles like an actor experimenting with different types of roles and movie genres. 

Oftentimes, using stories, metaphors, and analogies in my content works very well.  But I admit, sometimes it simply sucks. 

However, I have never been afraid to look like an idiot.  After all, we idiots are important members of society – we make everyone else look smart by comparison.

 

The King, Queen, and Crown Prince of Blogging

Meet the Blogging Royal Family: Content, Marketing, and Context.

Content is King.  The primary reason that people are (or aren't) reading your blog is because of your content.

Marketing is Queen.  “If you blog it, they will read.”  Ah, no they won't; this ain't Field of Dreams.  Some of the best written blogs on the Series of Tubes get hardly any love because they get hardly any marketing.  In addition to providing RSS and e-mail feeds, I use social media (e.g., Twitter, Facebook, LinkedIn) to promote my blog content.

However, too many bloggers have a selfish social media strategy.  Don't use it exclusively for self-promotion.  View social media as Social Karma.  Focus on helping others and you will get much more back than just a blog reader, a LinkedIn connection, a Twitter follower, or a Facebook friend.  In addition to blog promotion (which is important), I use social media to listen, to learn, and to help others when I can.

Larry Brooks recently explained that although content may still be king, at the very least, you must pay homage to the new Crown Prince — Context.  To paraphrase Brooks, context comes from clarity about your blogging goals, juxtaposed against the expectations and tolerances of your readers.  Basically, this above all: to thine own readers be true.

 

Emerson on Blogging

“Nothing can bring you peace but yourself.”

One of my favorite writers is Ralph Waldo Emerson.  The quote that started this section was pure Emerson.  What follows is a slight paraphrasing of one of my all-time favorite passages, which comes from his essay on Self-Reliance:

“What I must do is all that concerns me, not what the people think.  This rule, equally arduous in real and in online life, may serve for the whole distinction between greatness and meanness.  It is the harder because you will always find those who think they know what is your duty better than you know it.  It is easy in the world to live after the world's opinion; it is easy in solitude to live after our own; but the great blogger is one who in the midst of the blogosphere, keeps with perfect sweetness the independence of solitude.”

Bottom line — BE YOURSELF — Let your own personality shine through.  Make people feel like they are having a conversation with a real person and not just someone who is blogging what they think people want to read.

I hope that you found at least some of this manifesto helpful.  I also hope to see more of you around the blogosphere.

I'll be the balding blogger who used to almost have a mullet...

 

Related Posts

Collablogaunity

Brevity is the Soul of Social Media

Podcast: Your Blog, Your Voice

Customer Incognita

Many enterprise information initiatives are launched in order to unravel that riddle, wrapped in a mystery, inside an enigma, that great unknown, also known as...Customer.

Centuries ago, cartographers used the Latin phrase terra incognita (meaning “unknown land”) to mark regions on a map not yet fully explored.  In this century, companies simply can not afford to use the phrase customer incognita to indicate what information about their existing (and prospective) customers they don't currently have or don't properly understand.

 

What is a Customer?

First things first, what exactly is a customer?  Those happy people who give you money?  Those angry people who yell at you on the phone or say really mean things about your company on Twitter and Facebook?  Why do they have to be so mean? 

Mean people suck.  However, companies who don't understand their customers also suck.  And surely you don't want to be one of those companies, do you?  I didn't think so.

Getting back to the question, here are some insights from the Data Quality Pro discussion forum topic What is a customer?:

  • Someone who purchases products or services from you.  The word “someone” is key because it’s not the role of a “customer” that forms the real problem, but the precision of the term “someone” that causes challenges when we try to link other and more specific roles to that “someone.”  These other roles could be contract partner, payer, receiver, user, owner, etc.
  • Customer is a role assigned to a legal entity in a complete and precise picture of the real world.  The role is established when the first purchase is accepted from this real-world entity.  Of course, the main challenge is whether or not the company can establish and maintain a complete and precise picture of the real world.

These working definitions were provided by fellow blogger and data quality expert Henrik Liliendahl Sørensen, who recently posted 360° Business Partner View, which further examines the many different ways a real-world entity can be represented, including when, instead of a customer, the real-world entity represents a citizen, patient, member, etc.

A critical first step for your company is to develop your definition of a customer.  Don't underestimate either the importance or the difficulty of this process.  And don't assume it is simply a matter of semantics.

Some of my consulting clients have indignantly told me: “We don't need to define it, everyone in our company knows exactly what a customer is.”  I usually respond: “I have no doubt that everyone in your company uses the word customer; however, I will work for free if everyone defines the word customer in exactly the same way.”  So far, I haven't had to work for free.

 

How Many Customers Do You Have?

You have done the due diligence and developed your definition of a customer.  Excellent!  Nice work.  Your next challenge is determining how many customers you have.  Hopefully, you are not going to try using any of these techniques:

  • SELECT COUNT(*) AS "We have this many customers" FROM Customers
  • SELECT COUNT(DISTINCT Name) AS "No wait, we really have this many customers" FROM Customers
  • Middle-Square or Blum Blum Shub methods (i.e., random number generation)
  • Magic 8-Ball says: “Ask again later”

One of the most common and challenging data quality problems is the identification of duplicate records, especially redundant representations of the same customer information within and across systems throughout the enterprise.  The need for a solution to this specific problem is one of the primary reasons that companies invest in data quality software and services.
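To make the problem concrete, here is a minimal sketch in Python of why the two counting queries above mislead, and of what a very crude first pass at flagging candidate duplicates might look like.  The sample records, the field names, and the match_key rule are all hypothetical, invented purely for illustration.  Real matching depends on business rules that only your own data and subject matter experts can define.

```python
import re
from collections import defaultdict

# Hypothetical sample records, invented for illustration only.
customers = [
    {"id": 1, "name": "John Smith", "postal_code": "02134"},
    {"id": 2, "name": "SMITH, JOHN", "postal_code": "02134"},
    {"id": 3, "name": "John  Smith", "postal_code": "02134"},
    {"id": 4, "name": "John Smith", "postal_code": "50309"},
    {"id": 5, "name": "Jane Doe", "postal_code": "50309"},
]


def match_key(record):
    """Build a crude match key (an assumed rule, not a recommendation):
    lowercase alphabetic name tokens, sorted, plus the postal code."""
    tokens = re.findall(r"[a-z]+", record["name"].lower())
    return (" ".join(sorted(tokens)), record["postal_code"])


def candidate_duplicates(records):
    """Group record ids by match key and keep groups with more than one id."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Equivalent of SELECT COUNT(*): every row counts, duplicates included.
row_count = len(customers)

# Equivalent of SELECT COUNT(DISTINCT Name): formatting differences split
# one customer into several, while two different John Smiths collapse into one.
distinct_name_count = len({c["name"] for c in customers})

# Count after grouping on the crude match key.
match_key_count = len({match_key(c) for c in customers})

print(row_count, distinct_name_count, match_key_count)
print(candidate_duplicates(customers))
```

On this sample, the three counts disagree with each other, which is the whole point.  Even the match key count is only a list of candidates for review, since a rule this simple will produce both false negatives and false positives on real data.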

Earlier this year on Data Quality Pro, I published a five-part series of articles on identifying duplicate customers, which focused on the methodology for defining your business rules and illustrated some of the common data matching challenges.

Topics covered in the series:

  • Why a symbiosis of technology and methodology is necessary when approaching this challenge
  • How performing a preliminary analysis on a representative sample of real data prepares effective examples for discussion
  • Why using a detailed, interrogative analysis of those examples is imperative for defining your business rules
  • How both false negatives and false positives illustrate the highly subjective nature of this problem
  • How to document your business rules for identifying duplicate customers
  • How to set realistic expectations about application development
  • How to foster a collaboration of the business and technical teams throughout the entire project
  • How to consolidate identified duplicates by creating a “best of breed” representative record

To read the series, please follow these links:

To download the associated presentation (no registration required), please follow this link: OCDQ Downloads

 

Conclusion

“Knowing the characteristics of your customers,” stated Jill Dyché and Evan Levy in the opening chapter of their excellent book, Customer Data Integration: Reaching a Single Version of the Truth, “who they are, where they are, how they interact with your company, and how to support them, can shape every aspect of your company's strategy and operations.  In the information age, there are fewer excuses for ignorance.”

For companies of every size and within every industry, customer incognita is a crippling condition that must be replaced with customer cognizance in order for the company to continue to remain competitive in a rapidly changing marketplace.

Do you know your customers?  If not, then they likely aren't your customers anymore.

The Tell-Tale Data

It is a dark and stormy night in the data center.  The constant humming of hard drives is mimicking the sound of a hard rain falling in torrents, except at occasional intervals, when it is checked by a violent gust of conditioned air sweeping through the seemingly endless aisles of empty cubicles, rattling along desktops, fiercely agitating the flickering glow from flat panel monitors that are struggling against the darkness.

Tonight, amid this foreboding gloom with only my thoughts for company, I race to complete the production implementation of the Dystopian Automated Transactional Analysis (DATA) system.  Nervous, very, very dreadfully nervous I have been, and am, but why will you say that I am mad?  Observe how calmly I can tell you the whole story.

Eighteen months ago, I was ordered by executive management to implement the DATA system.  The vendor's salesperson was an oddly charming fellow named Machiavelli, who had the eye of a vulture — a pale blue eye, with a film over it.  Whenever this eye fell upon me, my blood ran cold. 

Machiavelli assured us all that DATA's seamlessly integrated Magic Beans software would migrate and consolidate all of our organization's information, clairvoyantly detecting and correcting our existing data quality problems, and once DATA was implemented into production, Magic Beans would prevent all future data quality problems from happening.

As soon as a source was absorbed into DATA, Magic Beans automatically did us the favor of freeing up disk space by deleting all traces of the source, somehow even including our off-site archives.  DATA would then become our only system of record, truly our Single Version of the Truth.

It is impossible to say when doubt first entered my brain, but once conceived, it haunted me day and night.  Whenever I thought about it, my blood ran cold — as cold as when that vulture eye was gazing upon me — very gradually, I made up my mind to simply load DATA and rid myself of my doubt forever.

Now this is the point where you will fancy me quite mad.  But madmen know nothing.  You should have seen how wisely I proceeded — with what caution — with what foresight — with what Zen-like tranquility, I went to work! 

I was never happier than I was these past eighteen months while I simply followed the vendor's instructions step by step and loaded DATA!  Would a madman have been so wise as this?  I think not.

Tomorrow morning, DATA goes live.  I can imagine how wonderful that will be.  I will be sitting at my desk, grinning wildly, deliriously happy with a job well done.  DATA will be loaded, data quality will trouble me no more.

It is now four o'clock in the morning, but still it is as dark as midnight.  But as bright as the coming dawn, I can now see three strange men as they gather around my desk. 

Apparently, a shriek had been heard from the business analysts and subject matter experts as soon as they started using DATA.  Suspicions had been aroused, complaints had been lodged, and they (now identifying themselves as auditors) had been called in by a regulatory agency to investigate.

I smile — for what have I to fear?  I welcome these fine gentlemen.  I give them a guided tour of DATA using its remarkably intuitive user interface.  I urge them audit — audit well.  They seemed satisfied.  My manner has convinced them.  I am singularly at ease.  They sit, and while I answer cheerily, they chat away about trivial things.  But before long, I feel myself growing pale and wish them gone.

My head aches and I hear a ringing in my ears, but still they sit and chat.  The ringing becomes more distinct.  I talk more freely, to get rid of the feeling, but it continues and gains volume — until I find that this noise is not within my ears.

No doubt I now grow very pale — but I talk more fluently, and with a heightened voice.  Yet the sound increases — and what can I do?  It is a low, dull, quick sound.  I gasp for breath — and yet the auditors hear it not. 

I talk more quickly — more vehemently — but the noise steadily increases.  I arise, and argue about trifles, in a high key and with violent gesticulations — but the noise steadily increases.  Why will they not be gone?  I pace the floor back and forth, with heavy strides, as if excited to fury by the unrelenting observations of the auditors — but the noise steadily increases.

What could I do?  I raved — I ranted — I raged!  I swung my chair and smashed my computer with it — but the noise rises over all of my attempts to silence it.  It grows louder — louder — louder!  And still the auditors chat pleasantly, and smile.  Is it really possible they can not hear it?  Is it really possible they did not notice me smashing my computer?

They hear! — they suspect! — they know! — they are making a mockery of my horror! — this I thought, and this I think.  But anything is better than this agony!  Anything is more tolerable than this derision!  I can not bear their hypocritical smiles any longer!  I feel that I must scream or die! — and now — again! — the noise!  Louder!  Louder!!  LOUDER!!!

 

“DATA!” I finally shriek.  “DATA has no quality!  NO DATA QUALITY!!!  What have I done?  What — Have — I — Done?!?”

 

With a sudden jolt, I awaken at my desk, with my old friend Edgar shaking me by the shoulders. 

“Hey, wake up!  Executive management wants us in the conference room in five minutes.  Apparently, there is a vendor here today pitching a new system called DATA using software called Magic Beans...” 

“...and the salesperson has this weird eye...”

Days Without A Data Quality Issue

In 1970, the United States Department of Labor created the Occupational Safety and Health Administration (OSHA).  The mission of OSHA is to prevent work-related injuries, illnesses, and deaths.  Based on statistics from 2007, since OSHA's inception, occupational deaths in the United States have been cut by 62% and workplace injuries have declined by 42%.

OSHA regularly conducts inspections to determine if organizations are in compliance with safety standards and assesses financial penalties for violations.  In order to both promote workplace safety and avoid penalties, organizations provide their employees with training on the appropriate precautions and procedures to follow in the event of an accident or an emergency.

Training programs certify new employees in safety protocols and indoctrinate them into the culture of a safety-conscious workplace.  By requiring periodic re-certification, all employees maintain awareness of their personal responsibility in both avoiding workplace accidents and responding appropriately to emergencies.

Although there has been some debate about the effectiveness of the regulations and the enforcement policies, over the years OSHA has unquestionably brought about many necessary changes, especially in the area of industrial work site safety where dangerous machinery and hazardous materials are quite common. 

Obviously, even with well-defined safety standards in place, workplace accidents will still occasionally occur.  However, these standards have helped greatly reduce both the frequency and severity of the accidents.  And most importantly, safety has become a natural part of the organization's daily work routine.

 

A Culture of Data Quality

Similar to indoctrinating employees into the culture of a safety-conscious workplace, more and more organizations are realizing the importance of creating and maintaining the culture of a data quality conscious workplace.  A culture of data quality is essential for effective enterprise information management.

Waiting until a serious data quality issue negatively impacts the organization before starting an enterprise data quality program is analogous to waiting until a serious workplace accident occurs before starting a safety program.

Many data quality issues are caused by a lack of data ownership and an absence of clear guidelines indicating who is responsible for ensuring that data is of sufficient quality to meet the daily business needs of the enterprise.  In order for data quality to be taken seriously within your organization, everyone first needs to know that data quality is an enterprise-wide priority.

Additionally, data quality standards must be well-defined, and everyone must accept their personal responsibility in both preventing data quality issues and responding appropriately to mitigate the associated business risks when issues do occur.

 

Data Quality Assessments

The data equivalent of a safety inspection is a data quality assessment, which provides a much needed reality check for the perceptions and assumptions that the enterprise has about the quality of its data. 

Performing a data quality assessment helps with a wide variety of tasks including: verifying data matches the metadata that describes it, preparing meaningful questions for subject matter experts, understanding how data is being used, quantifying the business impacts of poor quality data, and evaluating the ROI of data quality improvements.

An initial assessment provides a baseline and helps establish data quality standards as well as set realistic goals for improvement.  Subsequent data quality assessments, which should be performed on a regular basis, will track your overall progress.
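As a rough illustration of what the simplest part of such an assessment might measure, here is a minimal sketch in Python that profiles a single column for completeness, distinct values, and values that do not match the expected data type.  The profile_column function, the sample rows, and the choice of metrics are assumptions made for this example, not a description of any particular data quality tool.

```python
def profile_column(rows, column, expected_type):
    """Return a few basic profiling metrics for one column.

    rows is a list of dictionaries; missing keys, None, and empty
    strings are all treated as unpopulated (an assumed convention).
    """
    values = [row.get(column) for row in rows]
    populated = [v for v in values if v is not None and v != ""]
    mismatches = [v for v in populated if not isinstance(v, expected_type)]
    return {
        "column": column,
        "row_count": len(values),
        "completeness": len(populated) / len(values) if values else 0.0,
        "distinct_values": len(set(populated)),
        "type_mismatches": len(mismatches),
    }


# Hypothetical sample data, invented for illustration only.
rows = [
    {"customer_id": 1, "birth_year": 1975},
    {"customer_id": 2, "birth_year": None},
    {"customer_id": 3, "birth_year": "unknown"},
    {"customer_id": 4, "birth_year": 1975},
]

baseline = profile_column(rows, "birth_year", int)
print(baseline)
```

Saving the result of the initial run gives you the baseline, and running the same profile again on a regular schedule gives you numbers to compare against it.  Metrics like these only raise questions, such as whether “unknown” is an acceptable birth year, that still have to be answered by the people who use the data.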

Although preventing data quality issues is your ultimate goal, don't let the pursuit of perfection undermine your efforts.  Always be mindful of the data quality issues that remain unresolved, but let them serve as motivation.  Learn from your mistakes without focusing on your failures – focus instead on making steady progress toward improving your data quality.

 

Data Governance

The data equivalent of verifying compliance with safety standards is data governance, which establishes policies and procedures to align people throughout the organization.  Enterprise data quality programs require a data governance framework in order to successfully deploy data quality as an enterprise-wide initiative.

By facilitating the collaboration of all business and technical stakeholders, aligning data usage with business metrics, enforcing data ownership, and prioritizing data quality, data governance enables effective enterprise information management.

Obviously, even with well-defined and well-managed data governance policies and procedures in place, data quality issues will still occasionally occur.  However, your goal is to greatly reduce both the frequency and severity of your data quality issues. 

And most importantly, the responsibility for ensuring that data is of sufficient quality to meet your daily business needs has now become a natural part of your organization's daily work routine.

 

Days Without A Data Quality Issue

Organizations commonly display a sign indicating how long they have gone without a workplace accident.  Proving that I certainly did not miss my calling as a graphic designer, I created this “sign” for Days Without A Data Quality Issue:

Days Without A Data Quality Issue

 

Related Posts

Poor Data Quality is a Virus

DQ-Tip: “Don't pass bad data on to the next person...”

The Only Thing Necessary for Poor Data Quality

Hyperactive Data Quality (Second Edition)

Data Governance and Data Quality

We are the (IBM Information) Champions

Recently, I was honored to be named a 2009-2010 IBM Information Champion.

From Vality Technology, through Ascential Software, and eventually with IBM, I have spent most of my career working with the data quality tool that is now known as IBM InfoSphere QualityStage. 

Throughout my time in Research and Development (as a Senior Software Engineer and a Development Engineer) and Professional Services (as a Principal Consultant and a Senior Technical Instructor), I was often asked to wear many hats for QualityStage – and not just because my balding head is distractingly shiny.

True champions are championship teams.  The QualityStage team (past and present) is the most remarkable group of individuals that I have ever had the great privilege to know, let alone the good fortune to work with.  Thank you all very, very much.

 

The IBM Information Champion Program

Previously known as the Data Champion Program, the IBM Information Champion Program honors individuals making outstanding contributions to the Information Management community. 

Technical communities, websites, books, conference speakers, and blogs all contribute to the success of IBM’s Information Management products.  But these activities don’t run themselves. 

Behind the scenes, there are dedicated and loyal individuals who put in their own time to run user groups, manage community websites, speak at conferences, post to forums, and write blogs.  Their time is uncompensated by IBM.

IBM honors the commitment of these individuals with a special designation — Information Champion — as a way of showing their appreciation for the time and energy these exceptional community members expend.

Information Champions are objective experts.  They have no official obligation to IBM. 

They simply share their opinions and years of experience with others in the field, and their work contributes greatly to the overall success of IBM Information Management.

 

We are the Champions

The IBM Information Champion Program has been expanded from the Data Management segment to all segments in Information Management, and now includes IBM Cognos, Enterprise Content Management, and InfoSphere. 

To read more about all of the Information Champions, please follow this link:  Profiles of the IBM Information Champions

 

IBM Website Links

IBM Information Champion Community Space

IBM Information Management User Groups

IBM developerWorks

IBM Information On Demand 2009 Global Conference

IBM Home Page (United States)

 

QualityStage Website Links

IBM Redbook for QualityStage

QualityStage Forum on IBM developerWorks

QualityStage Forum on DSXchange

LinkedIn Group for IBM InfoSphere QualityStage

DataQualityFirst

If you tweet away, I will follow

Today is Friday, which for Twitter users like me, can mean only one thing...

Every Friday, Twitter users recommend other users that you should follow.  FollowFriday has kind of become the Twitter version of peer pressure.  In other words: I recommended you, why didn't you recommend me?

Among my fellow Twitter addicts, it has come to be viewed either as a beloved tradition of social media community building, or a hated annoyance.  It is almost as deeply polarizing as Pepsi vs. Coke or Soccer vs. Football (by the way, just for the official record, I love FollowFriday and I am firmly in the Pepsi and Football camps and by Football, I mean American Football).

If you are curious how it got started, then check out the Interview with Micah Baldwin, Father of FollowFriday on TwiTip.

In this blog post, I want to provide you with some examples of what I do on FollowFriday, and how I manage to actually follow (or do I?) so many people (586 and counting).

 

FollowFriday Example # 1 – The List

Perhaps the most common example of a FollowFriday tweet is to simply list as many users as you can within the 140 characters:

Twitter FollowFriday 1

 

FollowFriday Example # 2 – The Tweet-Out

An alternative FollowFriday tweet is to send a detailed Tweet-Out (the Twitter version of a Shout-Out) to a single user:

Twitter FollowFriday 2

 

FollowFriday Example # 3 – The Twitter Roll

Yet another alternative FollowFriday tweet is to send a link to a Twitter Roll (the Twitter version of a Blog Roll):

Twitter FollowFriday 3

To add your Twitter link so we can follow you, please click here:  OCDQ Twitter Roll

 

Give a Hoot, Use HootSuite

Most of my FollowFriday tweets are actually scheduled.  In part, I do this because I follow people from all around the world and by the time I finally crawl out of bed on Friday, many of my tweeps have already started their weekend.  And let's face it, the other reason that I schedule my FollowFriday tweets has a lot to do with why obsessive-compulsive is in the name of my blog. 

For scheduling tweets, I like using HootSuite:

HootSuite

Please note that the limitation of 140 characters has necessitated the abbreviation #FF instead of the #followfriday “standard.”

 

The Tweet-rix

The Matrix

Unless you only follow a few people, it is a tremendous challenge to actually follow every user you follow.  To be perfectly honest, I do not follow everyone I follow – no, I wasn't just channeling Yogi Berra (I am a Boston Red Sox fan!).  To borrow an analogy from Phil Simon, trying to watch your entire Twitter stream (i.e. The Tweet-rix) is like being an operator on The Matrix.

My primary Twitter application is TweetDeck:

TweetDeck

Not that I am all about me, but I do pay the most attention to Mentions and Direct Messages.  Next, since I am primarily interested in data quality, I use an embedded search to follow any tweets that use the #dataquality hashtag or mention the phrase “data quality.”  TweetDeck is one of many clients allowing you to create Groups of users to help organize The Tweet-rix. 

To further prove my Sci-Fi geek status, I created a group called TweetDeck Actual, which is an homage to Battlestar Galactica, where saying “This is Galactica Actual” confirms an open communications channel has been established with the Galactica.

I rotate the users I follow in and out of TweetDeck Actual on a regular basis in order to provide for a narrowly focused variety of trenchant tweets.  (By the way, I learned the word trenchant from a Jill Dyché tweet).

 

The Search for Tweets

You do not need to actually have a Twitter account in order to follow tweets.  There are several search engines designed specifically for Twitter.  And according to recent rumors, tweets will be coming soon to a Google near you.

Here are just a few ways to search Twitter for data quality content:

 

Conclusion

With apologies to fellow fans of U2 (one of my all-time favorite bands):

If you tweet away, tweet away
I tweet away, tweet away
I will follow
If you tweet away, tweet away
I tweet away, tweet away
I will follow
I will follow

Related Posts

Tweet 2001: A Social Media Odyssey

Poor Quality Data Sucks

Fenway Park 2008 Home Opener

Over the last few months on his Information Management blog, Steve Miller has been writing posts inspired by a great 2008 book that we both highly recommend: The Drunkard's Walk: How Randomness Rules Our Lives by Leonard Mlodinow.

In his most recent post The Demise of the 2009 Boston Red Sox: Super-Crunching Takes a Drunkard's Walk, Miller takes on my beloved Boston Red Sox and the less than glorious conclusion to their 2009 season. 

For those readers who are not baseball fans, the Los Angeles Angels of Anaheim swept the Red Sox out of the playoffs.  I will let Miller's words describe their demise: “Down two to none in the best of five series, the Red Sox took a 6-4 lead into the ninth inning, turning control over to impenetrable closer Jonathan Papelbon, who hadn't allowed a run in 26 postseason innings.  The Angels, within one strike of defeat on three occasions, somehow managed a miracle rally, scoring 3 runs to take the lead 7-6, then holding off the Red Sox in the bottom of the ninth for the victory to complete the shocking sweep.”

 

Baseball and Data Quality

What, you may be asking, does baseball have to do with data quality?  Beyond simply being two of my all-time favorite topics, quite a lot actually.  Baseball data is mostly transaction data describing the statistical events of games played.

Statistical analysis has been a beloved pastime even longer than baseball has been America's Pastime.  Number-crunching is far more than just a quantitative exercise in counting.  The qualitative component of statistics – discerning what the numbers mean, analyzing them to discover predictive patterns and trends – is the very basis of data-driven decision making.

“The Red Sox,” as Miller explained, “are certainly exemplars of the data and analytic team-building methodology” chronicled in Moneyball: The Art of Winning an Unfair Game, the 2003 book by Michael Lewis.  Red Sox General Manager Theo Epstein has always been an advocate of the so-called evidence-based baseball, or baseball analytics, pioneered by Bill James, the baseball writer, historian, statistician, current Red Sox consultant, and founder of Sabermetrics.

In another book that Miller and I both highly recommend, Super Crunchers, author Ian Ayres explained that “Bill James challenged the notion that baseball experts could judge talent simply by watching a player.  James's simple but powerful thesis was that data-based analysis in baseball was superior to observational expertise.  James's number-crunching approach was particular anathema to scouts.” 

“James was baseball's herald,” continues Ayres, “of data-driven decision making.”

 

The Drunkard's Walk

As Mlodinow explains in the prologue: “The title The Drunkard's Walk comes from a mathematical term describing random motion, such as the paths molecules follow as they fly through space, incessantly bumping, and being bumped by, their sister molecules.  The surprise is that the tools used to understand the drunkard's walk can also be employed to help understand the events of everyday life.”

Later in the book, Mlodinow describes the hidden effects of randomness by discussing how to build a mathematical model for the probability that a baseball player will hit a home run: “The result of any particular at bat depends on the player's ability, of course.  But it also depends on the interplay of many other factors: his health, the wind, the sun or the stadium lights, the quality of the pitches he receives, the game situation, whether he correctly guesses how the pitcher will throw, whether his hand-eye coordination works just perfectly as he takes his swing, whether that brunette he met at the bar kept him up too late, or the chili-cheese dog with garlic fries he had for breakfast soured his stomach.”

“If not for all the unpredictable factors,” continues Mlodinow, “a player would either hit a home run on every at bat or fail to do so.  Instead, for each at bat all you can say is that he has a certain probability of hitting a home run and a certain probability of failing to hit one.  Over the hundreds of at bats he has each year, those random factors usually average out and result in some typical home run production that increases as the player becomes more skillful and then eventually decreases owing to the same process that etches wrinkles in his handsome face.  But sometimes the random factors don't average out.  How often does that happen, and how large is the aberration?”
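To put rough numbers on Mlodinow's closing question, here is a minimal sketch that treats each at bat as an independent trial with a fixed home run probability.  The 500 at bats and the 6% probability are hypothetical values chosen only for illustration; they are not figures from the book.

```python
from math import comb, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# Hypothetical player: 500 at bats per season, 6% home run chance per at bat.
AT_BATS = 500
HR_PROBABILITY = 0.06

# Typical production, and the spread that randomness alone adds around it.
expected_hr = AT_BATS * HR_PROBABILITY
std_dev_hr = sqrt(AT_BATS * HR_PROBABILITY * (1 - HR_PROBABILITY))

if __name__ == "__main__":
    print(f"Expected home runs: {expected_hr:.1f}")
    print(f"Standard deviation: {std_dev_hr:.1f}")
    # Chance that luck alone produces a season of 40 or more home runs.
    print(f"P(40 or more): {prob_at_least(40, AT_BATS, HR_PROBABILITY):.3f}")
```

Under these assumptions, a season of about 30 home runs is typical, and a swing of roughly five in either direction is nothing more than ordinary luck.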

 

Conclusion

I have heard some (not Mlodinow or anyone else mentioned in this post) argue that data quality is an irrelevant issue.  The basis of their argument is that poor quality data are simply random factors that, in any data set of statistically significant size, will usually average out and therefore have a negligible effect on any data-based decisions. 

However, the random factors don't always average out.  It is important not only to measure exactly how often poor quality data occur, but also to acknowledge how large an aberration poor quality data can be, especially in data-driven decision making.
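The difference between errors that average out and errors that do not can be sketched with a small simulation.  Everything here is hypothetical: the true value of 100, the noise level, and the 5% of records assumed to carry a systematic defect (a value that was never captured and was stored as zero).

```python
import random
from statistics import mean

rng = random.Random(2009)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 100.0   # hypothetical correct value for every record
RECORDS = 100_000

# Case 1: purely random errors, equally likely to be too high or too low.
random_errors = [TRUE_VALUE + rng.gauss(0, 10) for _ in range(RECORDS)]

# Case 2: the same noise, but 5% of records have a systematic defect:
# the value was never captured and was stored as 0 instead.
systematic_errors = [
    0.0 if rng.random() < 0.05 else TRUE_VALUE + rng.gauss(0, 10)
    for _ in range(RECORDS)
]

random_mean = mean(random_errors)
systematic_mean = mean(systematic_errors)

if __name__ == "__main__":
    print(f"Mean with random errors only:  {random_mean:.2f}")
    print(f"Mean with systematic defaults: {systematic_mean:.2f}")
```

The random noise washes out of the first average, while the second stays about 5% low no matter how many records are added.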

As every citizen of Red Sox Nation is taught from birth, the only acceptable opinion of our American League East Division rivals, the New York Yankees, is encapsulated in the chant heard throughout the baseball season (and not just at Fenway Park):

“Yankees Suck!”

From its inception, every organization bases its day-to-day business decisions on its data.  This decision-critical information drives the operational, tactical, and strategic initiatives essential to the enterprise's mission to survive and thrive in today's highly competitive and rapidly evolving marketplace. 

It doesn't quite roll off the tongue as easily, but a chant heard throughout these enterprise information initiatives is:

“Poor Quality Data Sucks!”

Books Recommended by Red Sox Nation

Mind Game: How the Boston Red Sox Got Smart, Won a World Series, and Created a New Blueprint for Winning

Feeding the Monster: How Money, Smarts, and Nerve Took a Team to the Top

Theology: How a Boy Wonder Led the Red Sox to the Promised Land

Now I Can Die in Peace: How The Sports Guy Found Salvation Thanks to the World Champion (Twice!) Red Sox

Adventures in Data Profiling (Part 7)

In Part 6 of this series:  You completed your initial analysis of the Account Number and Tax ID fields. 

Previously during your adventures in data profiling, you have looked at customer name within the context of other fields.  In Part 2, you looked at the associated customer names during drill-down analysis on the Gender Code field while attempting to verify abbreviations as well as assess NULL and numeric values.  In Part 6, you investigated customer names during drill-down analysis for the Account Number and Tax ID fields while assessing the possibility of duplicate records. 

In Part 7 of this award-eligible series, you will complete your initial analysis of this data source with direct investigation of the Customer Name 1 and Customer Name 2 fields.

 

Previously, the data profiling tool provided you with the following statistical summaries for customer names:

Customer Name Summary

As we discussed when we looked at the E-mail Address field (in Part 3) and the Postal Address Line fields (in Part 5), most data profiling tools will provide the capability to analyze fields using formats that are constructed by parsing and classifying the individual values within the field.

Customer Name 1 and Customer Name 2 are additional examples of the necessity of this analysis technique.  Not only is the cardinality of these fields very high, but they also have a very high Distinctness (i.e. the exact same field value rarely occurs on more than one record).
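As a minimal sketch of how these two statistics can be computed, the function below counts distinct actual values (cardinality) and divides by the number of records that have an actual value (distinctness).  That denominator is an assumption; some tools divide by the total record count instead.  The function name and the sample values are hypothetical and are not taken from the profiled data source.

```python
from collections import Counter
from typing import Optional, Sequence

def profile_field(values: Sequence[Optional[str]]) -> dict:
    """Summarize one field: NULL count, cardinality, and distinctness."""
    # Treat None and blank strings as NULL; everything else is an actual value.
    actual = [v for v in values if v is not None and v.strip() != ""]
    counts = Counter(actual)
    cardinality = len(counts)
    # Assumption: distinctness = cardinality / count of actual values.
    distinctness = cardinality / len(actual) if actual else 0.0
    return {
        "records": len(values),
        "null": len(values) - len(actual),
        "cardinality": cardinality,
        "distinctness": distinctness,
    }

# Hypothetical sample values for illustration only.
sample_names = [
    "Harris Edward James",
    "Mary Smith",
    "MARY SMITH",
    None,
    "Mary Smith",
    "J. Doe",
]
summary = profile_field(sample_names)

if __name__ == "__main__":
    print(summary)
```

Note that "Mary Smith" and "MARY SMITH" count as two distinct values here, which is one reason raw distinctness on name fields runs so high and why format-based analysis is more informative.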

 

Customer Name 1

The data profiling tool has provided you the following drill-down “screen” for Customer Name 1:

Field Formats for Customer Name 1 

Please Note: The differentiation between given and family names has been based on our fictional data profiling tool using probability-driven non-contextual classification of the individual field values. 

For example, Harris, Edward, and James are three of the most common names in the English language, and although they can also be family names, they are more frequently given names.  Therefore, “Harris Edward James” is assigned “Given-Name Given-Name Given-Name” for a field format.  For this particular example, how do we determine the family name?
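A minimal sketch of probability-driven non-contextual classification follows.  Each token is labeled with whichever name type it is most likely to be, without looking at its position or its neighbors.  The frequency table is entirely made up for illustration, and the function names are hypothetical; a real tool would use a much larger reference data set.

```python
# Hypothetical relative frequencies: how often each token appears as a
# given name versus a family name in some reference population.
NAME_FREQUENCIES = {
    "HARRIS": {"Given-Name": 0.55, "Family-Name": 0.45},
    "EDWARD": {"Given-Name": 0.90, "Family-Name": 0.10},
    "JAMES": {"Given-Name": 0.85, "Family-Name": 0.15},
    "SMITH": {"Given-Name": 0.01, "Family-Name": 0.99},
}

def classify_token(token: str) -> str:
    """Label one token by its most probable name type, ignoring context."""
    frequencies = NAME_FREQUENCIES.get(token.strip(".,").upper())
    if frequencies is None:
        return "Unknown"
    return max(frequencies, key=frequencies.get)

def field_format(value: str) -> str:
    """Build a field format by classifying each token independently."""
    return " ".join(classify_token(token) for token in value.split())

if __name__ == "__main__":
    print(field_format("Harris Edward James"))
    print(field_format("Edward Smith"))
```

Because every token is classified on its own, "Harris Edward James" comes out as three given names, which is exactly the ambiguity the question above is pointing at.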

The top twenty most frequently occurring field formats for Customer Name 1 collectively account for over 80% of the records with an actual value in this field for this data source.  All of these field formats appear to be common, potentially valid structures.  Obviously, more than one sample field value would need to be reviewed using more drill-down analysis. 

What conclusions, assumptions, and questions do you have about the Customer Name 1 field?

 

Customer Name 2

The data profiling tool has provided you the following drill-down “screen” for Customer Name 2:

Field Formats for Customer Name 2 

The top ten most frequently occurring field formats for Customer Name 2 collectively account for over 50% of the records with an actual value in this sparsely populated field for this data source.  Some of these field formats show common, potentially valid structures.  Again, more than one sample field value would need to be reviewed using more drill-down analysis.
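Coverage figures like the ones quoted for both customer name fields come from a simple frequency count of field formats.  Here is a minimal sketch, assuming one field format per record with an actual value; the function name and the sample formats are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def top_format_coverage(
    formats: Iterable[str], top_n: int
) -> Tuple[List[Tuple[str, int]], float]:
    """Return the top_n most frequent formats and the share of records they cover."""
    counts = Counter(formats)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    covered = sum(count for _, count in top)
    return top, (covered / total if total else 0.0)

# Hypothetical field formats for illustration only.
observed_formats = (
    ["Given-Name Family-Name"] * 6
    + ["Given-Name Initial Family-Name"] * 2
    + ["Family-Name"] * 1
    + ["Unknown Unknown"] * 1
)
top_two, coverage = top_format_coverage(observed_formats, top_n=2)

if __name__ == "__main__":
    for field_format, count in top_two:
        print(f"{count:>3}  {field_format}")
    print(f"Top 2 formats cover {coverage:.0%} of populated records")
```

The formats that fall outside the top of the list are usually where the drill-down analysis should start, since they are the likeliest to contain invalid or unexpected structures.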

What conclusions, assumptions, and questions do you have about the Customer Name 2 field?

 

The Challenges of Person Names

Not that business names don't have their own challenges, but person names present special challenges.  Many data quality initiatives include the business requirement to parse, identify, verify, and format a “valid” person name.  However, unlike postal addresses where country-specific postal databases exist to support validation, no such “standards” exist for person names.

In his excellent book Viral Data in SOA: An Enterprise Pandemic, Neal A. Fishman explains that “a person's name is a concept that is both ubiquitous and subject to regional variations.  For example, the cultural aspects of an individual's name can vary.  In lieu of last name, some cultures specify a clan name.  Others specify a paternal name followed by a maternal name, or a maternal name followed by a paternal name; other cultures use a tribal name, and so on.  Variances can be numerous.”

“In addition,” continues Fishman, “a name can be used in multiple contexts, which might affect what parts should or could be communicated.  An organization reporting an employee's tax contributions might report the name by using the family name and just the first letter (or initial) of the first name (in that sequence).  The same organization mailing a solicitation might choose to use just a title and a family name.”

However, it is not a simple task to identify what part of a person's name is the family name or the first given name (as some of the above data profiling sample field values illustrate).  Again, regional, cultural, and linguistic variations can greatly complicate what at first may appear to be a straightforward business request (e.g. formatting a person name for a mailing label).

As Fishman cautions, “many regions have cultural name profiles bearing distinguishing features for words, sequences, word frequencies, abbreviations, titles, prefixes, suffixes, spelling variants, gender associations, and indications of life events.”

If you know of any useful resources for dealing with the challenges of person names, then please share them by posting a comment below.  Additionally, please share your thoughts and experiences regarding the challenges (as well as useful resources) associated with business names.

 

What other analysis do you think should be performed for customer names?

 

In Part 8 of this series:  We will conclude the adventures in data profiling with a summary of the lessons learned.

 

Related Posts

Adventures in Data Profiling (Part 1)

Adventures in Data Profiling (Part 2)

Adventures in Data Profiling (Part 3)

Adventures in Data Profiling (Part 4)

Adventures in Data Profiling (Part 5)

Adventures in Data Profiling (Part 6)

Getting Your Data Freq On

Commendable Comments (Part 3)

In a July 2008 blog post on Men with Pens (one of the Top 10 Blogs for Writers 2009), James Chartrand explained:

“Comment sections are communities strengthened by people.”

“Building a blog community creates a festival of people” where everyone can, as Chartrand explained, “speak up with great care and attention, sharing thoughts and views while openly accepting differing opinions.”

I agree with James (and not just because of his cool first name) – my goal for this blog is to foster an environment in which a diversity of viewpoints is freely shared without bias.  Everyone is invited to get involved in the discussion and have an opportunity to hear what others have to offer.  This blog's comment section has become a community strengthened by your contributions.

This is the third entry in my ongoing series celebrating my heroes – my readers.

 

Commendable Comments

On The Fragility of Knowledge, Andy Lunn commented:

“In my field of Software Development, you simply cannot rest and rely on what you know.  The technology you master today will almost certainly evolve over time and this can catch you out.  There's no point being an expert in something no one wants any more!  This is not always the case, but don't forget to come up for air and look around for what's changing.

I've lost count of the number of organizations I've seen who have stuck with a technology that was fresh 15 years ago and a huge stagnant pot of data, who are now scrambling to come up to speed with what their customers expect.  Throwing endless piles of cash at the problem, hoping to catch up.

What am I getting at?  The secret I've learned is to adapt.  This doesn't mean jump on every new fad immediately, but be aware of it.  Follow what's trending, where the collective thinking is heading and most importantly, what do your customers want?

I just wish more organizations would think like this and realize that the systems they create, the data they hold, and the customers they have are in a constant state of flux.  They are all projects that need care and attention.  All subject to change, there's no getting away from it, but small, well planned changes are a lot less painful, trust me.”

On DQ-Tip: “Data quality is primarily about context not accuracy...”, Stephen Simmonds commented:

“I have to agree with Rick about data quality being in the eye of the beholder – and with Henrik on the several dimensions of quality.

A theme I often return to is 'what does the business want/expect from data?' – and when you hear them talk about quality, it's not just an issue of accuracy.  The business stakeholder cares – more than many seem to notice – about a number of other issues that are squarely BI concerns:

– Timeliness ('WHEN I want it')
– Format ('how I want to SEE it') – visualization, delivery channels
– Usability ('how I want to then make USE of it') – being able to extract information from a report (say) for other purposes
– Relevance ('I want HIGHLIGHTED the information that is meaningful to me')

And so on.  Yes, accuracy is important, and it messes up your effectiveness when delivering inaccurate information.  But that's not the only thing a business stakeholder can raise when discussing issues of quality.  A report can be rejected as poor quality if it doesn't adequately meet business needs in a far more general sense.  That is the constant challenge for a BI professional.”

On Mistake Driven Learning, Ken O'Connor commented:

“There is a Chinese proverb that says:

'Tell me and I'll forget; Show me and I may remember; Involve me and I'll understand.'

I have found the above to be very true, especially when seeking to brief a large team on a new policy or process.  Interaction with the audience generates involvement and a better understanding.

The challenge facing books, whitepapers, blog posts etc. is that they usually 'Tell us,' they often 'Show us,' but they seldom 'Involve us.'

Hence, we struggle to remember, and struggle even more to understand.  We learn best by 'doing' and by making mistakes.”

You Are Awesome

Thank you very much for your comments.  For me, the best part of blogging is the dialogue and discussion provided by interactions with my readers.  Since there have been so many commendable comments, please don't be offended if your commendable comment hasn't been featured yet.  Please keep on commenting and stay tuned for future entries in the series.

By the way, even if you have never posted a comment on my blog, you are still awesome — feel free to tell everyone I said so.

 

Related Posts

Commendable Comments (Part 1)

Commendable Comments (Part 2)

DQ-Tip: “...Go talk with the people using the data”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“In order for your data quality initiative to be successful, you must:

Walk away from the computer and go talk with the people using the data.”

This DQ-Tip came from the TDWI World Conference Chicago 2009 presentation Modern Data Quality Techniques in Action by Gian Di Loreto from Loreto Services and Technologies.

As I blogged about in Data Gazers (borrowing that excellent phrase from Arkady Maydanchik), within cubicles randomly dispersed throughout the sprawling office space of companies large and small, there exist countless unsung heroes of data quality initiatives.  Although their job titles might label them as a Business Analyst, Programmer Analyst, Account Specialist or Application Developer, their true vocation is a far more noble calling.  They are Data Gazers.

A most bizarre phenomenon (that I have witnessed too many times) is that as a data quality initiative “progresses” it tends to get further and further away from the people who use the data on a daily basis.

Please follow the excellent advice of Gian and Arkady — go talk with your users. 

Trust me — everyone on your data quality initiative will be very happy that you did.

 

Related Posts

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “Don't pass bad data on to the next person...”