Podcast: Your Blog, Your Voice

In this OCDQ Podcast, I discuss the importance of blogging in your own voice. 

The best way to produce unique content is to let your blogging style reflect your personality.  Make your readers feel like they are having a conversation with a real person – not just someone who is blogging what they think people want to read.

Your Blog, Your Voice

 

You can also download this podcast (MP3 file) by clicking on this link: Your Blog, Your Voice

 

Related Posts

The Mullet Blogging Manifesto

Collablogaunity

Brevity is the Soul of Social Media

Live-Tweeting: Data Governance

The term “live-tweeting” describes using Twitter to provide near real-time reporting from an event.  I live-tweet from the sessions I attend at industry conferences as well as interesting webinars.

Recently, I live-tweeted Successful Data Stewardship Through Data Governance, which was a data governance webinar featuring Marty Moseley of Initiate Systems and Jill Dyché of Baseline Consulting.

Instead of writing a blog post summarizing the webinar, I thought I would list my tweets with brief commentary.  My goal is to provide an example of this particular use of Twitter so you can decide its value for yourself.

 

As the webinar begins, Marty Moseley and Jill Dyché provide some initial thoughts on data governance:

Live-Tweets 1

 

Jill Dyché provides a great list of data governance myths and facts:

Live-Tweets 2

 

Jill Dyché provides some data stewardship insights:

Live-Tweets 3

 

As the webinar ends, Marty Moseley and Jill Dyché provide some closing thoughts about data governance and data quality:

Live-Tweets 4

 

Please Share Your Thoughts

If you attended the webinar, then you know additional material was presented.  Did my tweets do the webinar justice?  Did you follow along on Twitter during the webinar?  If you did not attend the webinar, then are these tweets helpful?

What are your thoughts in general regarding the pros and cons of live-tweeting? 

 

Related Posts

The following three blog posts are conference reports based largely on my live-tweets from the events:

Enterprise Data World 2009

TDWI World Conference Chicago 2009

DataFlux IDEAS 2009

Data Quality is Sexy

 

Jim Harris 017

I am sick and tired of hearing people talk about how data quality (DQ) is not sexy.

I was talking with my friend J.T. the other day and he told me I simply needed to remind people data quality has always been sexy.  Sometimes, people just have a tendency to forget. 

J.T. told me:

“You know what you gotta do J.H.?  You gotta bring DQ Sexy back.”

True dat, J.T.

 

I'm Bringing DQ Sexy Back

 

Jim Harris 001

 

I’m bringing DQ Sexy back

All you naysayers, watch how I attack

I think your data’s special, why does your quality lack?

Grant me some access, and I’ll pick up the slack

 

 

Jim Harris 008

 

Dirty data – you see the problems everywhere

Let me be your data cleanser, and baby, I'll be there

We'll whip the Business Process if it misbehaves

But just remember – trying to be perfect – it's not the way

 

 

Jim Harris 005 

I’m bringing DQ Sexy back

Them non-team players don’t know how to act

Let our collaboration get us back on track

Working together, we'll make the right impact

 

 

Jim Harris 010

 

Look at that data – it's your 'prise asset 
Treat it well, and all your business needs will be met

Understanding it will really make you smile 
To get started, you really need to profile

There's no need for you to be afraid – come on 
Go ahead – get your data freak on

 

Jim Harris 014 

I’m bringing DQ Sexy back

Any non-believers left?  Don't make me give you a smack

If you have data, you'd better watch out for what it lacks

'Cause quality is what it needs – and that’s a fact

 

 

Data Quality is Sexy

Jim Harris 015

That’s right. 

Data Quality is Sexy. 

Always has been. 

Always will be.

True dat, J.H.

Fo real!

 

Adventures in Data Profiling (Part 8)

Understanding your data is essential to using it effectively and improving its quality – and to achieve these goals, there is simply no substitute for data analysis.  This post is the conclusion of a vendor-neutral series on the methodology of data profiling.

Data profiling can help you perform essential analysis such as:

  • Provide a reality check for the perceptions and assumptions you may have about the quality of your data
  • Verify your data matches the metadata that describes it
  • Identify different representations for the absence of data (i.e., NULL and other missing values)
  • Identify potential default values
  • Identify potential invalid values
  • Check data formats for inconsistencies
  • Prepare meaningful questions to ask subject matter experts

Data profiling can also help you with many of the other aspects of domain, structural and relational integrity, as well as determining functional dependencies, identifying redundant storage, and other important data architecture considerations.

 

Adventures in Data Profiling

This series was carefully designed as guided adventures in data profiling in order to provide the necessary framework for demonstrating and discussing the common functionality of data profiling tools and the basic methodology behind using one to perform preliminary data analysis.

In order to narrow the scope of the series, the scenario used was that a customer data source for a new data quality initiative had been made available to an external consultant with no prior knowledge of the data or its expected characteristics.  Additionally, business requirements had not yet been documented, and subject matter experts were not currently available.

This series did not attempt to cover every possible feature of a data profiling tool or even every possible use of the features that were covered.  Both the data profiling tool and data used throughout the series were fictional.  The “screen shots” were customized to illustrate concepts and were not modeled after any particular data profiling tool.

This post summarizes the lessons learned throughout the series, and is organized under three primary topics:

  1. Counts and Percentages
  2. Values and Formats
  3. Drill-down Analysis

 

Counts and Percentages

One of the most basic features of a data profiling tool is the ability to provide counts and percentages for each field that summarize its content characteristics:

 Data Profiling Summary

  • NULL – count of the number of records with a NULL value 
  • Missing – count of the number of records with a missing value (i.e., non-NULL absence of data, e.g., character spaces) 
  • Actual – count of the number of records with an actual value (i.e., non-NULL and non-Missing) 
  • Completeness – percentage calculated as Actual divided by the total number of records 
  • Cardinality – count of the number of distinct actual values 
  • Uniqueness – percentage calculated as Cardinality divided by the total number of records 
  • Distinctness – percentage calculated as Cardinality divided by Actual
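As a rough illustration, the definitions above can be sketched in a few lines of Python.  This is not how any particular data profiling tool works; the function name profile_field and the sample Tax ID values are hypothetical, and treating empty or whitespace-only strings as Missing is an assumption made only for this sketch.

```python
from typing import Optional, Sequence


def profile_field(values: Sequence[Optional[str]]) -> dict:
    """Summarize one field using the counts and percentages defined above."""
    total = len(values)
    null_count = sum(1 for v in values if v is None)
    # Assumption: "Missing" means non-NULL but empty or whitespace-only.
    missing_count = sum(1 for v in values if v is not None and v.strip() == "")
    actual = [v for v in values if v is not None and v.strip() != ""]
    cardinality = len(set(actual))

    def pct(numerator: int, denominator: int) -> float:
        # Guard against division by zero for empty fields.
        return round(100.0 * numerator / denominator, 2) if denominator else 0.0

    return {
        "null": null_count,
        "missing": missing_count,
        "actual": len(actual),
        "completeness": pct(len(actual), total),
        "cardinality": cardinality,
        "uniqueness": pct(cardinality, total),
        "distinctness": pct(cardinality, len(actual)),
    }


# Hypothetical sample: six records, one NULL, one whitespace-only value.
tax_ids = ["111", "222", "222", None, "   ", "333"]
summary = profile_field(tax_ids)
print(summary)
```

In this sample, one distinct value occurs on two records, so distinctness falls below 100% even though no metadata was consulted.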

Completeness and uniqueness are particularly useful in evaluating potential key fields and especially a single primary key, which should be both 100% complete and 100% unique.  In Part 2, Customer ID provided an excellent example.

Distinctness can be useful in evaluating the potential for duplicate records.  In Part 6, Account Number and Tax ID were used as examples.  Both fields were less than 100% distinct (i.e., some distinct actual values occurred on more than one record).  The implied business meaning of these fields made this an indication of possible duplication.

Data profiling tools generate other summary statistics including: minimum/maximum values, minimum/maximum field sizes, and the number of data types (based on analyzing the values, not the metadata).  Throughout the series, several examples were provided, especially in Part 3 during the analysis of Birth Date, Telephone Number and E-mail Address.

 

Values and Formats

In addition to counts, percentages, and other summary statistics, a data profiling tool generates frequency distributions for the unique values and formats found within the fields of your data source.

A frequency distribution of unique values is useful for:

  • Fields with an extremely low cardinality, indicating potential default values (e.g., Country Code in Part 4)
  • Fields with a relatively low cardinality (e.g., Gender Code in Part 2)
  • Fields with a relatively small number of known valid values (e.g., State Abbreviation in Part 4)

A frequency distribution of unique formats is useful for:

  • Fields expected to contain a single data type and/or length (e.g., Customer ID in Part 2)
  • Fields with a relatively limited number of known valid formats (e.g., Birth Date in Part 3)
  • Fields with free-form values and a high cardinality (e.g., Customer Name 1 and Customer Name 2 in Part 7)

Cardinality can play a major role in deciding whether you want to be shown values or formats since it is much easier to review all of the values when there are not very many of them.  Alternatively, the review of high cardinality fields can also be limited to the most frequently occurring values, as we saw throughout the series (e.g., Telephone Number in Part 3).

Some fields can also be analyzed using partial values (e.g., in Part 3, Birth Year was extracted from Birth Date) or a combination of values and formats (e.g., in Part 6, Account Number had an alpha prefix followed by all numbers).
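A frequency distribution of formats can be approximated with a simple masking function.  The sketch below is only one possible convention, not the method of any specific tool: it assumes letters are shown as A and digits as 9, with all other characters kept as-is.  The names format_mask and format_distribution, and the sample telephone values, are hypothetical.

```python
from collections import Counter


def format_mask(value: str) -> str:
    """Map letters to 'A' and digits to '9', keeping other characters as-is."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def format_distribution(values):
    """Return (format, count) pairs, most frequent format first."""
    return Counter(format_mask(v) for v in values).most_common()


# Hypothetical telephone numbers with inconsistent formats.
phones = ["555-1234", "555-9876", "(555) 123-4567", "5551234", "N/A"]
distribution = format_distribution(phones)
for fmt, count in distribution:
    print(fmt, count)
```

An account number with an alpha prefix followed by digits, such as the hypothetical value AB12345, would be masked as AA99999, which makes the combination of values and formats easy to spot.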

Free-form fields are often easier to analyze as formats constructed by parsing and classifying the individual values within the field.  This analysis technique is often necessary since not only is the cardinality of free-form fields usually very high, but they also tend to have a very high distinctness (i.e., the exact same field value rarely occurs on more than one record). 

Additionally, the most frequently occurring formats for free-form fields will often collectively account for a large percentage of the records with an actual value in the field.  Examples of free-form field analysis were the focal points of Part 5 and Part 7.

We also saw examples of how valid values in a valid format can have an invalid context (e.g., in Part 3, Birth Date values set in the future), as well as how valid field formats can conceal invalid field values (e.g., Telephone Number in Part 3).
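The invalid context case can be illustrated with a small check.  This is a minimal sketch under stated assumptions: the function name birth_date_issue is hypothetical, the dates are assumed to be stored as YYYY-MM-DD text, and the comparison date is passed in explicitly rather than taken from the system clock.

```python
from datetime import date, datetime
from typing import Optional


def birth_date_issue(raw: str, as_of: date) -> Optional[str]:
    """Return a short description of the problem, or None if no issue is found."""
    try:
        # Assumption: birth dates are stored as YYYY-MM-DD text.
        parsed = datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        return "invalid format or value"
    if parsed > as_of:
        # The value is a real calendar date, but it cannot be a birth date yet.
        return "valid date, invalid context (in the future)"
    return None


as_of = date(2009, 12, 1)
for raw in ["1975-06-30", "2031-01-15", "1975-13-01"]:
    print(raw, birth_date_issue(raw, as_of))
```

A check like this only flags candidates; whether a flagged record is truly wrong is still a question for subject matter experts.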

Part 3 also provided examples (in both Telephone Number and E-mail Address) of how you should not mistake completeness (which as a data profiling statistic indicates a field is populated with an actual value) for an indication the field is complete in the sense that its value contains all of the sub-values required to be considered valid. 

 

Drill-down Analysis

A data profiling tool will also provide the capability to drill down on its statistical summaries and frequency distributions in order to perform a more detailed review of records of interest.  Drill-down analysis will often provide useful data examples to share with subject matter experts.

Performing a preliminary analysis on your data prior to engaging in these discussions better facilitates meaningful dialogue because real-world data examples better illustrate actual data usage.  As stated earlier, understanding your data is essential to using it effectively and improving its quality.

Various examples of drill-down analysis were used throughout the series.  However, drilling all the way down to the record level was shown in Part 2 (Gender Code), Part 4 (City Name), and Part 6 (Account Number and Tax ID).
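Drilling down to the record level can be sketched as grouping records by a field value and keeping only the groups of interest.  The example below is an illustration rather than the behavior of any particular tool: the function name drill_down_duplicates, the field names, and the customer records are all hypothetical.

```python
from collections import defaultdict


def drill_down_duplicates(records, field):
    """Group records by an actual field value; keep groups with more than one record."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(field)
        # Skip NULL and missing (empty or whitespace-only) values.
        if value is not None and str(value).strip():
            groups[value].append(record)
    return {value: rows for value, rows in groups.items() if len(rows) > 1}


# Hypothetical customer records; two of them share a Tax ID.
customers = [
    {"customer_id": 1, "name": "Ann Lee", "tax_id": "111"},
    {"customer_id": 2, "name": "Bob Ray", "tax_id": "222"},
    {"customer_id": 3, "name": "Robert Ray", "tax_id": "222"},
    {"customer_id": 4, "name": "Cy Dunn", "tax_id": None},
]
suspects = drill_down_duplicates(customers, "tax_id")
for tax_id, rows in suspects.items():
    print(tax_id, [row["customer_id"] for row in rows])
```

The records returned are the kind of real-world examples worth sharing with subject matter experts, since a shared Tax ID may or may not indicate a true duplicate.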

 

Conclusion

Fundamentally, this series posed the following question: What can just your analysis of data tell you about it?

Data profiling is typically one of the first tasks performed on a data quality initiative.  I am often told to delay data profiling until business requirements are documented and subject matter experts are available to answer my questions. 

I always disagree – and begin data profiling as soon as possible.

I can do a better job of evaluating business requirements and preparing for meetings with subject matter experts after I have spent some time looking at data from a starting point of blissful ignorance and curiosity.

Ultimately, I believe the goal of data profiling is not to find answers, but instead, to discover the right questions.

Discovering the right questions is a critical prerequisite for effectively discussing data usage, relevancy, standards, and the metrics for measuring and improving quality.  All of which are necessary in order to progress from just profiling your data, to performing a full data quality assessment (which I will cover in a future series on this blog).

A data profiling tool can help you by automating some of the grunt work needed to begin your analysis.  However, it is important to remember that the analysis itself cannot be automated – you need to review the statistical summaries and frequency distributions generated by the data profiling tool and, more importantly, translate your analysis into meaningful reports and questions to share with the rest of your team.

Always remember that well-performed data profiling is both a highly interactive and a very iterative process.

 

Thank You

I want to thank you for providing your feedback throughout this series. 

As my fellow Data Gazers, you provided excellent insights and suggestions via your comments. 

The primary reason I published this series on my blog, as opposed to simply writing a whitepaper or a presentation, was because I knew our discussions would greatly improve the material.

I hope this series proves to be a useful resource for your actual adventures in data profiling.

 

The Complete Series


Recently Read: November 28, 2009

Recently Read is an OCDQ regular segment.  Each entry provides links to blog posts, articles, books, and other material I found interesting enough to share.  Please note “recently read” is literal – therefore what I share wasn't necessarily recently published.

 

Data Quality Blog Posts

For simplicity, “Data Quality” also includes Data Governance, Master Data Management, and Business Intelligence.

 

Social Media Blog Posts

For simplicity, “Social Media” also includes Blogging, Social Networking, and Online Marketing.

 

Book Quotes

An eclectic list of quotes from some recently read (and/or simply my favorite) books.

  • From The Wisdom of Crowds by James Surowiecki – “Refuse to allow the merit of an idea to be determined by the status of the person advocating it.”

     

  • From Purple Cow by Seth Godin – “We mistakenly believe that criticism leads to failure.”

     

  • From How We Decide by Jonah Lehrer – “The best decision-makers don't despair.  Instead, they become students of error, determined to learn from what went wrong.”

     

  • From The Whuffie Factor by Tara Hunt – “Whuffie is the residual outcome—the currency—of your reputation.  You lose or gain it based on positive or negative actions, your contributions to the community, and what people think of you.”

     

  • From Trust Agents by Chris Brogan and Julien Smith – “You accrue social capital as a side benefit of doing good, but doing good by itself is its own reward.”

Commendable Comments (Part 4)

Thanksgiving

Photo via Flickr (Creative Commons License) by: ella_marie 

Today is Thanksgiving Day, which is a United States holiday with a long and varied history.  The most consistent themes remain family and friends gathering together to share a large meal and express their gratitude.

This is the fourth entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts.  Receiving comments is the most rewarding aspect of my blogging experience.  Although I am truly grateful to all of my readers, I am most grateful to my commenting readers. 

 

Commendable Comments

On Days Without A Data Quality Issue, Steve Sarsfield commented:

“Data quality issues probably occur on some scale in most companies every day.  As long as you qualify what is and isn't a data quality issue, this gets back to what the company thinks is an acceptable level of data quality.

I've always advocated aggregating data quality scores to form business metrics.  For example, what data quality metrics would you combine to ensure that customers can always be contacted in case of an upgrade, recall or new product offering?  If you track the aggregation, it gives you more of a business feel.”

On Customer Incognita, Daragh O Brien commented:

“Back when I was with the phone company I was (by default) the guardian of the definition of a 'Customer'.  Basically I think they asked for volunteers to step forward and I was busy tying my shoelace when the other 11,000 people in the company as one entity took a large step backwards.

I found that the best way to get a definition of a customer was to lock the relevant stakeholders in a room and keep asking 'What' and 'Why'. 

My 'data modeling' methodology was simple.  Find out what the things were that were important to the business operation, define each thing in English without a reference to itself, and then we played the 'Yes/No Game Show' to figure out how that entity linked to other things and what the attributes of that thing were.

Much to IT's confusion, I insisted that the definition needed to be a living thing, not carved in two stone tablets we'd lug down from on top of the mountain. 

However, because of the approach that had been taken we found that when new requirements were raised (27 from one stakeholder), the model accommodated all of them either through an expansion of a description or the addition of a piece of reference data to part of the model.

Fast-forward a few months from the modeling exercise.  I was asked by IT to demo the model to a newly acquired subsidiary.  It was a significantly different business.  I played the 'Yes/No Game Show' with them for a day.  The model fitted their needs with just a minor tweak. 

The IT team from the subsidiary wanted to know how had I gone about normalizing the data to come up with the model, which is kind of like cutting up a perfectly good apple pie to find out what an apple is and how to make pastry.

What I found about the 'Yes/No Game Show' approach was that it made people open up their thinking a bit, but it took some discipline and perseverance on my part to keep asking what and why.  Luckily, having spent most of the previous few years trying to get these people to think seriously about data quality they already thought I was a moron so they were accommodating to me.

A key learning for me out of the whole thing is that, even if you are doing a data management exercise for a part of a larger business, you need to approach it in a way that can be evolved and continuously improved to ensure quality across the entire organization. 

Also, it highlighted the fallacy of assuming that a company can only have one kind of customer.”

On The Once and Future Data Quality Expert, Dylan Jones commented:

“I recently attended a conference and sat in on a panel that discussed some of the future trends, such as cloud computing.  It was a great discussion, highly polarized, and as I came home I thought about how far we've come as a profession but more importantly, how much more there is to do.

The reality is that the world is changing, the volumes of data held by businesses are immense and growing exponentially, our desire for new forms of information delivery insatiable, and the opportunities for innovation boundless.

I really believe we're not innovating as an industry anything like we should be.  The cloud, as an example, offers massive opportunities for a range of data quality services but I've certainly not read anything in the media or press that indicates someone is capitalizing on this.

There are a few recent data quality technology innovations which have caught my eye, but I also think there is so much more vendors should be doing.

On the personal side of the profession, I think online education is where we're headed.  The concept of localized training is now being replaced by online learning.  With the Internet you can now train people on every continent, so why aren't more people going down this route?

I find it incredibly ironic when I speak to data quality specialists who admit that 'they don't have the first clue about all this social media stuff.'  This is the next generation of information management, it's here right now, they should be embracing it.  I think if you're a 'guru' author, trainer or consultant you need to think of new ways to engage with your clients/trainees using the tools available.

What worries me is that the growth of information doesn't match the maturity and growth of our profession.  For example, we really need more people who can articulate the value of what we can offer. 

Ted Friedman made a great point on Twitter recently when he talked about how people should stop moaning about executives that 'don't get it' and instead focus on improving ways to demonstrate the value of data quality improvement.

Just because we've come a long way doesn't mean we know it all, there is still a hell of a long way to go.”

Thanks for giving your comments

Thank you very much for giving your comments and sharing your perspectives with our collablogaunity.  Since there have been so many commendable comments, please don't be offended if your commendable comment hasn't been featured yet. 

Please keep on commenting and stay tuned for future entries in the series. 

 

Related Posts

Commendable Comments (Part 1)

Commendable Comments (Part 2)

Commendable Comments (Part 3)

DQ-Tip: “Data quality is about more than just improving your data...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“Data quality is about more than just improving your data.

Ultimately, the goal is improving your organization.”

This DQ-Tip is from Tony Fisher's great book The Data Asset: How Smart Companies Govern Their Data for Business Success.

In the book, Fisher explains that one of the biggest mistakes organizations make is not viewing their data as a corporate asset.  This common misconception often prevents data quality from being rightfully viewed as a critical priority.

Data quality is misperceived to be an activity performed just for the sake of improving data, when in fact it is an activity performed for the sake of improving business processes.

“Better data leads to better decisions,” explains Fisher, “which ultimately leads to better business.  Therefore, the very success of your organization is highly dependent on the quality of your data.”

 

Related Posts

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “Don't pass bad data on to the next person...”

Brevity is the Soul of Social Media

“Why day is day, night night, and time is time,
Were nothing but to waste night, day and time.
Therefore, since brevity is the soul of wit,
I will be brief ...”

Within the wide world of social media, one of the most common features is some form of social networking, microblogging, or short message service that allows users to share brief status updates.  Some social media sites are almost entirely built on only this feature (e.g., Twitter) whereas others (e.g., Facebook, LinkedIn) include it among a list of many other features.

Either way, these status updates have created a rather pithy platform many people argue is incompatible with meaningful communication, especially of a professional nature.  I must admit this was also my initial opinion of social media.

However, I now believe not only is it the soul of wit, brevity is the soul of social media – and, in fact, a very good soul.

 

Short Attention Span Theater

I doubt attention deficit will still be considered a disorder ten years from now.  We are living increasingly faster-paced lives in an increasingly faster-paced world.  The pervasiveness of the Internet and the rapid proliferation of powerful mobile technology are making our world a smaller and smaller place and our lives a more and more crowded space.

We have become so accustomed to multi-tasking that the very concept of focusing our attention on only one thing at a time somehow seems inherently wrong to us.  All the world's a stage within this short attention span theater.  And all of us are not merely players, we have been cast in several simultaneous roles.

Time management has always been important, but nowadays it is even more essential.  This is especially true when it comes to social media, which, if we can effectively and efficiently use it, has great personal and professional potential.  Amber Naslund recently provided an excellent blog series on social media time management that I highly recommend.

 

The Power of Pith

I admit I am a long-winded talker or, as a favorite (canceled) television show would say, “conversationally anal-retentive.”  In the past (slightly less now), I was also known for e-mail messages even Leo Tolstoy would declare to be far too long.

Therefore, it may be surprising to learn I am addicted to Twitter.  How could I possibly constrain myself to only 140 characters?  No, I don't use ellipses to extend my thoughts across multiple tweets (although I admit I am often tempted to do so). 

I wholeheartedly agree with Jennifer Blanchard, who explained how Twitter makes you a better writer.  When forced to be concise, you have to focus on exactly what you want to say, using as few words as possible. 

The power of pith means reducing your message to its bare essence.  In order to engage in effective dialogue on the stage of our short attention span theater, this is a required skill we all must master – and not just when we are on Twitter.

For those who argue this simply regresses human communication back to our days of monosyllabic grunting, I invite you to read the excellent recent blog post Is Twitter a Complex Adaptive System? written by Venessa Miemis.

Although you should read all of it, the point I need here will be found under Insight #4 toward the end of the post.  Miemis shares a study that reveals using Twitter can not only improve communication, but actually build intelligence. 

The collaborative communication enabled by social media platforms can actually contribute to a growing collective intelligence made up of all of us.  The power of pith is the wisdom of crowds.

 

Blogging with Brevity

Brevity is the soul of all social media and yes, this includes blogging as well.  Some view blogging as social media's last bastion of robust communication.  You can take your time and use all the words you want on your blog, right?  Sure, as long as you have no interest in anyone actually reading your blog.

Some bloggers get cranky with me when I emphasize the Three C’s – meaning your blog posts should be:

  1. Clear – Get to the point and stay on point
  2. Concise – No longer than necessary
  3. Consumable – Formatted to be easily read on a computer screen

Concise is usually the main cranky-causing culprit because everyone interprets it to mean “write really short posts.”

One blogger told me he has “never met a subordinate clause he didn't like,” thereby expressing his fondness for writing compound-complex sentences.  For the non-writers, this means really long (but grammatically correct) sentences oftentimes requiring you to read them three or four times before truly comprehending their full meaning.

Don't get me wrong.  This particular blogger is an incredibly gifted writer known for his absolutely brilliant blog posts.  My only true criticism of his writing style is it truly requires a significant time commitment.

Michelle Russell does a great job explaining how to write with a knife.  No, not literally.  Writing with a knife means writing for yourself, but editing for your readers.  Editing is the hardest part of writing, but also the most important. 

Blogging with brevity doesn't necessarily mean “write really short posts.”  Being concise simply means taking out anything that doesn't need to be included.  For example, you really didn't need to read the additional jokes and Shakespearean references included in the first draft of this post.

 

The Future of Brevity is Bright

Some predict the size limits of message service standards and status updates will be increased.  Others predict new social media platforms will be based on different paradigms.  Either way, innovation will eventually deliver an ability to be more verbose.

However, barring some major scientific breakthrough (or some major breakdown in the space-time continuum), there will still only be 24 hours in a day.  Therefore, no matter what happens, I am certain the future of brevity is bright.

Neither the world nor the people in it are likely to slow down.  Our attention spans will remain short.  Our time management skills will remain vigilant.  We will communicate through the power of pith, brevity will remain the soul of both wit and social media, and hopefully, we will all “live long and prosper.”

 

Related Posts

The Mullet Blogging Manifesto

Collablogaunity

Podcast: Your Blog, Your Voice

Collablogaunity

The meteoric rise of the Internet coupled with social media has created an amazing medium that is enabling people who are separated by vast distances and disparate cultures to come together, communicate, and collaborate in ways few would have thought possible just a few decades ago.  Blogging, especially when effectively integrated with social networking, can be one of the most powerful aspects of social media.

The great advantage to blogging as a medium, as opposed to books, newspapers, magazines, and even presentations, is that blogging is not just about broadcasting a message. 

This is not to say that books, newspapers, and magazines aren't useful (they certainly can be) or that presentations lack an interactive component (they certainly should not).  I simply believe that, when done well, blogging better facilitates effective communication by starting a conversation, encouraging collaboration, and fostering a true sense of community.

Mashing together the words collaboration, blog, and community, I use the term collablogaunity — which is pronounced “Call a Blog a Unity” — to describe how remarkable blogs do this remarkably well.

 

Conversation

Blogging is a conversation — with your readers. 

I love the sound of my own voice and I talk to myself all the time (even in public).  However, the two-way conversation that blogging provides via comments from my readers greatly improves the quality of my blog content —  because it helps me better appreciate the difference between what I know and what I only think I know.

Without comments, the conversation is only one way.  Engaging readers in dialogue and discussion allows some of your points to be made for you by those who take the time to comment as opposed to you just telling everyone how you see the world.

Blogging isn't about using the Internet as your own personal bullhorn for broadcasting your message.  In her wonderful book The Whuffie Factor, Tara Hunt explains that you really need to:

“Turn the bullhorn around: stop talking, start listening, and create continuous conversations.”

Respond to the comments you receive (but never feed the troll).  You don't have to respond immediately.  Sometimes, the conversation will go more smoothly without your involvement as your readers talk amongst themselves.  Other times, your response will help continue the conversation and encourage participation from others. 

Always demonstrate that feedback is both welcome and appreciated.  Make sure to never talk down to your readers (either in your blog post or your comment responses).  It is perfectly fine to disagree and debate, just don't denigrate.  

In a recent guest post on ProBlogger, Rob McPhillips explained: 

“If instead, you are all the time only seeking praise and approval from everyone, then there is nothing solid, consistent or certain about your blog and so ultimately it will never gather a sizeable core of die hard fans.  Only drive by readers who scan a post and never look back.” 

Collaboration

Blogging is a collaboration — with other bloggers.

While conversation is primarily between you and your readers, collaboration is primarily between you and other bloggers.  Although you may be inclined to view other bloggers as “the competition,” especially those within your own niche, this would be a mistake.  Yes, it is true that blogs are competing with each other for readers.  However, sustainable success is achieved through collaboration and friendly competition with your peers.

Brian Clark has explained in the past and continues to exemplify that strategic collaboration is the secret to 21st century success.  Clark has stated that if he had to reduce his recipe for success to just three ingredients, it would be content, copywriting, and collaboration.  And if he had to give up two of those, then he'd keep collaboration.

In their terrific book Trust Agents, Chris Brogan and Julien Smith explain that although people in most cultures view themselves as the central hero in their life's story, the reality is that you need to build an army because you can't do it all alone.

Collaboration between bloggers is mainly about networking and cross-promotion.  You should network with other bloggers, especially those within your own niche.  This can be accomplished in a number of ways, including e-mail introductions, Twitter direct messages (if the other blogger is following you), LinkedIn connection requests, or Facebook friend requests.

As with any networking, the most important thing is being genuine.  As Darren Rowse and Chris Garrett explained in their highly recommended ProBlogger book, when you network with other bloggers, keep it real, be specific, keep it brief without being rude, and explain why you are interested in connecting.  They rightfully emphasize the importance of that last point.

As we all know, although content may be king, marketing is queen.  Networking with other bloggers can help you get the word out about your brilliant blog and its penchant for publishing posts that everyone must read.  Adding other bloggers to your blogroll, linking to their posts when applicable to your content, and leaving meaningful comments on their posts are not only recommended best practices of netiquette, they are also just the right thing to do.

Too many bloggers have a selfish networking and marketing strategy.  They only promote their own content and then wonder why nobody reads their blog.  I am fond of referring to all social media as Social Karma.  Focus on helping other bloggers promote their content and they will likely be more willing to return the favor.  However, don't misunderstand this technique to be a pathetic peer pressure tactic (in other words, “I re-tweeted your blog post, why didn't you re-tweet my blog post?”).

One last point on collaboration is to set realistic expectations — for others and for yourself.  You should definitely try to help others when you can.  However, you simply can't help everyone.  Don't let people take advantage of your generosity. 

Politely, but firmly, say no when you need to say no.  Also extend the same courtesy to other people when they turn you down (or simply ignore you) when you try to connect with them or when you ask them for their help. 

Mean and selfish people definitely suck.  But let's face it, nobody's perfect — we all have bad days, we all occasionally say and do stupid things, and we all occasionally treat people worse than they deserve to be treated.  So don't be too hard on people when they disappoint you, because tomorrow it will probably be your turn to have a bad day.

 

Community

Blogging is a community service.

If you truly believe and actually practice the principles of both conversation and collaboration, then viewing blogging as a community service comes naturally.  You will truly be more interested in actually listening to what your readers have to say, and less interested in just broadcasting your message.  You will see your words as simply the catalyst that gets the conversation started, and when necessary, helps continue the discussion. 

You will see friends not foes when encountering your blogging peers.  You will help them celebrate their successes and quickly recover from their failures.  You will help others when you can and without worrying about what's in it for you.

As James Chartrand says, you will welcome people to your blog because you view blogging as a festival of people, a community strengthened by people, where everyone can speak up with great care and attention, sharing thoughts and views while openly accepting differing opinions.  Blogging is a community service providing a wealth of experience, thoughts and knowledge being shared by all sorts of participants.

In the closing keynote of this year's BlogWorld conference, Chris Brogan explained (from notes taken by David B. Thomas):

“Make it about them.  Stop looking at this as a cult of me. 

It has to be about your audience.  Turn them into a community. 

The difference between an audience and a community is the way you face the chairs. 

The difference between an audience and a community:

One will fall on its sword for you and the other will watch you fall.”

Collablogaunity

Pronounced: “Call a Blog a Unity”

There are literally millions of blogs on the Internet today.  Your blog (to quote Seth Godin) is “either remarkable or invisible.”

Remarkable blogs primarily do three things:

  1. Start conversations
  2. Encourage collaboration
  3. Foster a true sense of community

Remarkable blogs are collablogaunities.  Is your blog a collablogaunity?

 

Related Posts

The Mullet Blogging Manifesto

Brevity is the Soul of Social Media

Podcast: Your Blog, Your Voice

Beyond a “Single Version of the Truth”

This post is involved in a good-natured contest (i.e., a blog-bout) with two additional bloggers: Henrik Liliendahl Sørensen and Charles Blyth.  Our contest is a Blogging Olympics of sorts, with the United States, Denmark, and England competing for the Gold, Silver, and Bronze medals in an event we are calling “Three Single Versions of a Shared Version of the Truth.” 

Please take the time to read all three posts and then vote for who you think has won the debate (see poll below).  Thanks!

 

The “Point of View” Paradox

In the early 20th century, within his Special Theory of Relativity, Albert Einstein introduced the concept that space and time are interrelated entities forming a single continuum, and therefore the passage of time can be a variable that could change for each individual observer.

One of the many brilliant insights of special relativity was that it could explain why different observers can make validly different observations – it was a scientifically justifiable matter of perspective. 

It was Einstein's apprentice, Obi-Wan Kenobi (to whom Albert explained “Gravity will be with you, always”), who stated:

“You're going to find that many of the truths we cling to depend greatly on our own point of view.”

The Data-Information Continuum

In the early 21st century, within his popular blog post The Data-Information Continuum, Jim Harris introduced the concept that data and information are interrelated entities forming a single continuum, and that speaking of oneself in the third person is the path to the dark side.

I use the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g., customers, vendors, suppliers).

Although a common definition for data quality is fitness for the purpose of use, the common challenge is that data has multiple uses – each with its own fitness requirements.  Viewing each intended use as the information that is derived from data, I define information as data in use or data in action.

Quality within the Data-Information Continuum has both objective and subjective dimensions.  Data's quality is objectively measured separate from its many uses, while information's quality is subjectively measured according to its specific use.

 

Objective Data Quality

Data quality standards provide a highest common denominator to be used by all business units throughout the enterprise as an objective data foundation for their operational, tactical, and strategic initiatives. 

In order to lay this foundation, raw data is extracted directly from its sources, profiled, analyzed, transformed, cleansed, documented and monitored by data quality processes designed to provide and maintain universal data sources for the enterprise's information needs. 

At this phase of the architecture, the manipulations of raw data must be limited to objective standards and not be customized for any subjective use.  From this perspective, data is now fit to serve (as at least the basis for) each and every purpose.

 

Subjective Information Quality

Information quality standards (starting from the objective data foundation) are customized to meet the subjective needs of each business unit and initiative.  This approach leverages a consistent enterprise understanding of data while also providing the information necessary for day-to-day operations.

But please understand: customization should not be performed simply for the sake of it.  You must always define your information quality standards by using the enterprise-wide data quality standards as your initial framework. 

Whenever possible, enterprise-wide standards should be enforced without customization.  The key word within the phrase “subjective information quality standards” is standards — as opposed to subjective, which can quite often be misinterpreted as “you can do whatever you want.”  Yes you can – just as long as you have justifiable business reasons for doing so.

This approach to implementing information quality standards has three primary advantages.  First, it reinforces a consistent understanding and usage of data throughout the enterprise.  Second, it requires each business unit and initiative to clearly explain exactly how they are using data differently from the rest of your organization, and more important, justify why.  Finally, all deviations from enterprise-wide data quality standards will be fully documented. 
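As a rough illustration of this layering, here is a minimal Python sketch.  Everything in it is an assumption made for the example: the field names, the sample record, the made-up enterprise rule (a non-empty name and a five-digit postal code), and the two business uses.  It is not a recommended rule set, just a way to show one objective standard being reused by each subjective check.

```python
# Hypothetical customer record; the field names and values are invented.
record = {
    "name": "Mary Jones",
    "postal_code": "50309",
    "email": "",
    "phone": "515-555-0100",
}


def meets_enterprise_standard(rec):
    """Objective data quality: one rule for everyone, independent of any use."""
    name_ok = bool(rec["name"].strip())
    postal_ok = rec["postal_code"].isdigit() and len(rec["postal_code"]) == 5
    return name_ok and postal_ok


def fit_for_email_campaign(rec):
    """Subjective information quality: the enterprise standard plus a
    use-specific requirement (an e-mail address must be present)."""
    return meets_enterprise_standard(rec) and "@" in rec["email"]


def fit_for_call_center(rec):
    """A different use with a different, separately justified requirement."""
    return meets_enterprise_standard(rec) and bool(rec["phone"].strip())


print(meets_enterprise_standard(record))  # True
print(fit_for_email_campaign(record))     # False: no e-mail address on file
print(fit_for_call_center(record))        # True
```

The same record passes the enterprise standard, is fit for one use, and is unfit for another.  Each deviation lives in its own small, named function, which is one simple way of keeping deviations documented.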

 

The “One Lie Strategy”

A common objection to separating quality standards into objective data quality and subjective information quality is the enterprise's significant interest in creating what is commonly referred to as a “Single Version of the Truth.”

However, in his excellent book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman explains:

“A fiendishly attractive concept is...'a single version of the truth'...the logic is compelling...unfortunately, there is no single version of the truth. 

For all important data, there are...too many uses, too many viewpoints, and too much nuance for a single version to have any hope of success. 

This does not imply malfeasance on anyone's part; it is simply a fact of life. 

Getting everyone to work from a single version of the truth may be a noble goal, but it is better to call this the 'one lie strategy' than anything resembling truth.”

Beyond a “Single Version of the Truth”

In the classic 1985 film Mad Max Beyond Thunderdome, the title character arrives in Bartertown, ruled by the evil Auntie Entity, where people living in the post-apocalyptic Australian outback go to trade for food, water, weapons, and supplies.  Auntie Entity forces Mad Max to fight her rival Master Blaster to the death within a gladiator-like arena known as Thunderdome, which is governed by one simple rule:

“Two men enter, one man leaves.”

I have always struggled with the concept of creating a “Single Version of the Truth.”  I imagine all of the key stakeholders from throughout the enterprise arriving in Corporatetown, ruled by the Machiavellian CEO known only as Veritas, where all business units and initiatives must go to request funding, staffing, and continued employment.  Veritas forces all of them to fight their Master Data Management rivals within a gladiator-like arena known as Meetingdome, which is governed by one simple rule:

“Many versions of the truth enter, a Single Version of the Truth leaves.”

For any attempted “version of the truth” to truly be successfully implemented within your organization, it must take into account both the objective and subjective dimensions of quality within the Data-Information Continuum. 

Both aspects of this shared perspective of quality must be incorporated into a “Shared Version of the Truth” that enforces a consistent enterprise understanding of data, but that also provides the information necessary to support day-to-day operations.

The Data-Information Continuum is governed by one simple rule:

“All validly different points of view must be allowed to enter,

In order for an all encompassing Shared Version of the Truth to be achieved.”

 

You are the Judge

This post is involved in a good-natured contest (i.e., a blog-bout) with two additional bloggers: Henrik Liliendahl Sørensen and Charles Blyth.  Our contest is a Blogging Olympics of sorts, with the United States, Denmark, and England competing for the Gold, Silver, and Bronze medals in an event we are calling “Three Single Versions of a Shared Version of the Truth.” 

Please take the time to read all three posts and then vote for who you think has won the debate.  A link to the same poll is provided on all three blogs.  Therefore, wherever you choose to cast your vote, you will be able to view an accurate tally of the current totals. 

The poll will remain open for one week, closing at midnight on November 19 so that the “medal ceremony” can be conducted via Twitter on Friday, November 20.  Additionally, please share your thoughts and perspectives on this debate by posting a comment below.  Your comment may be copied (with full attribution) into the comments section of all of the blogs involved in this debate.

 

Related Posts

Poor Data Quality is a Virus

The General Theory of Data Quality

The Data-Information Continuum

The Once and Future Data Quality Expert

World Quality Day 2009

Wednesday, November 11 is World Quality Day 2009.

World Quality Day was established by the United Nations in 1990 as a focal point for the quality management profession and as a celebration of the contribution that quality makes to the growth and prosperity of nations and organizations.  The goal of World Quality Day is to raise awareness of how quality approaches (including data quality best practices) can have a tangible effect on business success, as well as contribute towards world-wide economic prosperity.

 

IAIDQ

The International Association for Information and Data Quality (IAIDQ) was chartered in January 2004 and is a not-for-profit, vendor-neutral professional association whose purpose is to create a world-wide community of people who desire to reduce the high costs of low quality information and data by applying sound quality management principles to the processes that create, maintain and deliver data and information.

Since 2007 the IAIDQ has celebrated World Quality Day as a springboard for improvement and a celebration of successes.  Please join us to celebrate World Quality Day by participating in our interactive webinar in which the Board of Directors of the IAIDQ will share with you stories and experiences to promote data quality improvements within your organization.

In my recent Data Quality Pro article The Future of Information and Data Quality, I reported on the IAIDQ Ask The Expert Webinar with co-founders Larry English and Tom Redman, two of the industry pioneers for data quality and two of the most well-known data quality experts.

 

Data Quality Expert

As World Quality Day 2009 approaches, my personal reflections are focused on what the title data quality expert has meant in the past, what it means today, and most important, what it will mean in the future.

With over 15 years of professional services and application development experience, I consider myself to be a data quality expert.  However, my experience is paltry by comparison to English, Redman, and other industry luminaries such as David Loshin, to use one additional example from many. 

Experience is popularly believed to be the path that separates knowledge from wisdom, which is usually accepted as another way of defining expertise. 

Oscar Wilde once wrote that “experience is simply the name we give our mistakes.”  I agree.  I have found that the sooner I can recognize my mistakes, the sooner I can learn from the lessons they provide, and hopefully prevent myself from making the same mistakes again. 

The key is early detection.  As I gain experience, I gain an improved ability to more quickly recognize my mistakes and thereby expedite the learning process.

James Joyce wrote that “mistakes are the portals of discovery” and T.S. Eliot wrote that “we must not cease from exploration and the end of all our exploring will be to arrive where we began and to know the place for the first time.”

What I find in the wisdom of these sages is the need to acknowledge the favor our faults do for us.  Therefore, although experience is the path that separates knowledge from wisdom, the true wisdom of experience is the wisdom of failure.

As Jonah Lehrer explained: “Becoming an expert just takes time and practice.  Once you have developed expertise in a particular area, you have made the requisite mistakes.”

But expertise in any discipline is more than simply an accumulation of mistakes and birthdays.  And expertise is not a static state that, once achieved, allows you to simply rest on your laurels.

In addition to my real-world experience working on data quality initiatives for my clients, I also read all of the latest books, articles, whitepapers, and blogs, as well as attend as many conferences as possible.

 

The Times They Are a-Changin'

Much of the discussion that I have heard regarding the future of the data quality profession has been focused on the need for the increased maturity of both practitioners and organizations.  Although I do not dispute this need, I am concerned about the apparent lack of attention being paid to how fast the world around us is changing.

Rapid advancements in technology, coupled with the meteoric rise of the Internet and social media (blogs, wikis, Twitter, Facebook, LinkedIn, etc.), have created an amazing medium that is enabling people separated by vast distances and disparate cultures to come together, communicate, and collaborate in ways few would have thought possible just a few decades ago.

I don't believe that it is an exaggeration to state that we are now living in an age where the contrast between the recent past and the near future is greater than perhaps it has ever been in human history.  This brave new world has such people and technology in it, that practically every new day brings the possibility of another quantum leap forward.

Although it has been argued by some that the core principles of data quality management are timeless, I must express my doubt.  The daunting challenges of dramatically increasing data volumes and the unrelenting progress of cloud computing, software as a service (SaaS), and mobile computing architectures would appear to be racing toward a high-speed collision with our time-tested (but time-consuming to implement properly) data quality management principles.

The times they are indeed changing and I believe we must stop using terms like Six Sigma and Kaizen as if they were a shibboleth.  If these or any other disciplines are to remain relevant, then we must honestly assess them in the harsh and unforgiving light of our brave new world that is seemingly changing faster than the speed of light.

Expertise is not static.  Wisdom is not timeless.  The only constant is change.  For the data quality profession to truly mature, our guiding principles must change with the times, or be relegated to a past that is all too quickly becoming distant.

 

Share Your Perspectives

In celebration of World Quality Day, please share your perspectives regarding the past, present, and most important, the future of the data quality profession.  With apologies to T. H. White, I declare this debate to be about the difference between:

The Once and Future Data Quality Expert

Related Posts

Mistake Driven Learning

The Fragility of Knowledge

The Wisdom of Failure

A Portrait of the Data Quality Expert as a Young Idiot

The Nine Circles of Data Quality Hell

 

Additional IAIDQ Links

IAIDQ Ask The Expert Webinar: World Quality Day 2009

IAIDQ Ask The Expert Webinar with Larry English and Tom Redman

INTERVIEW: Larry English - IAIDQ Co-Founder

INTERVIEW: Tom Redman - IAIDQ Co-Founder

IAIDQ Publications Portal

The Mullet Blogging Manifesto

Blogging is more art than science.  My personal blogging style can perhaps best be described as mullet blogging.  No, not the “business in the front, party in the back” haircut that I tried to rock back in the '80s (I couldn't pull it off, had to settle for a “tail” and had to cut that off because it made me look like an idiot – OK, more idiotic than usual).  By mullet blogging I mean:

“Take yourself and your blog seriously, but still have a sense of humor about both.”

As a mullet blogger, I hold the following truths to be self-evident, but I decided to write them down anyway.

 

Blogging is All about You

Not you meaning me, the blogger — you meaning you, the reader.

Blogging should always focus on the reader and provide them assistance with a specific problem, even if that problem is boredom or simply a need for entertainment.  Don't worry about your readers agreeing with you.  They will either thank you for your help or tell you that you're an idiot – either way, you have started a conversation, which should always be your blogging goal.

Brian Clark recently shared something to think about using the following quote from Robert McKee:

“When talented people write badly it’s generally for one of two reasons:

Either they’re blinded by an idea they feel compelled to prove,

Or they’re driven by an emotion they must express.

When talented people write well, it is generally for this reason:

They’re moved by a desire to touch the audience.”

B = U2C3

Blogging = Unique and Useful content that is Clear, Concise, and Consumable.

The conventional blogging wisdom is to be both Unique and Useful.  Although I normally like to defy conventions, I have to agree with the wise ones on these fundamentals.

One of the most important aspects of being unique is writing effective titles.  Most potential readers scan titles to determine whether or not they will click and read more.  There is obviously a delicate balance between effective titles and “baiting,” which will only alienate potential readers. 

If you write a compelling title that makes me click through to an interesting post, then “You Rock!”  However, if you write a “Shock and Awe” title followed by “Aw Shucks” content, then “You Suck!” 

Therefore, your content also has to be unique – your topic, position, voice, or a combination of all three.

One of the most important aspects of useful is “infotainment” – that combination of information and entertainment that, when done well, can turn potential readers into raving fans.  Just don't forget about the previous section – your content has to be informative and entertaining to your readers.

The key to good blogging is to follow the Three C’s – Clear, Concise, Consumable

The attention span of a blog reader is not the same as that of a reader of books, newspapers (they still exist, right?), or magazine articles, or of the audience for presentations.  Most people only scan blogs, rarely read a full post and even more rarely leave a comment – regardless of how well the blog post is written.

Write blog posts that get to the point and stay on point (i.e., clear), are no longer than they need to be (i.e., concise), and are formatted to be easy to read on a computer screen (i.e., consumable).

 

Laugh, Think, Comment

The three things that you want your readers to do.

Although it is not as blatantly formulaic as the title of the previous section, here is another method to my blogging madness:

  1. Open with a joke
  2. Say something thought provoking
  3. End with a call to action

It's as easy as 1-2-3!  In my defense, I didn't say open with a good joke.  But seriously, humor can be a great way to start a conversation and hold your readers' attention for those few precious additional seconds while you are getting to your point.  Obviously, there will be times when the seriousness of your subject would make comedy inappropriate, and if you are not naturally inclined to use humor, then you shouldn't try to force it.

Thought-provoking content doesn't have to mean deep thoughts.  There is no need to channel Jean-Paul Sartre, for example.  However, to paraphrase Sartre: “Hell is other people's boring blogs.”

Obviously, comments are not the only type of call to action.  However, blogging is a conversation facilitated by the dialogue and discussion provided via comments from your readers.  Without comments, the conversation is only one way. 

I love the sound of my own voice and I talk to myself all the time (even in public).  However, the two-way conversation provided via comments not only greatly improves the quality of my blog content — much more importantly, it helps me better appreciate the difference between what I know and what I only think I know.

As Darren Rowse and Chris Garrett explained in their highly recommended ProBlogger book: “even the most popular blogs tend to attract only about a 1 percent commenting rate.”  Therefore, don't be too disappointed if you are not getting many comments.  Take that statistic as a challenge to motivate you to write blog posts that your readers simply cannot resist commenting on.

Respond to the comments you do receive.  This continues the two-way conversation and encourages comments from other readers.  Make sure to never talk down to your readers (either in your blog post or your comment responses).  It is perfectly fine to disagree and debate, just don't denigrate. 

Obviously, you should block all spam (the leading argument for using comment moderation) and never feed the troll.

 

Stories and Metaphors and Analogies!  Oh, my!

I've a feeling we're not in Kansas anymore.  Especially me, since I live in Iowa.

Darren Rowse recently shared some great tips about why stories are an effective communication tool for your blog, including a list of some of the different types of stories you can tell.

My blog uses a lot of metaphors and analogies (and sometimes just plain silliness) in an attempt to make my posts more interesting.  This is necessary because I write about a niche topic, which although important, is also rather dull.

James Chartrand uses the term Method Blogging as (yes, you guessed it) a metaphor for blogging by comparing it to method acting.  Try experimenting with different styles like an actor experimenting with different types of roles and movie genres. 

Oftentimes, using stories, metaphors, and analogies in my content works very well.  But I admit, sometimes it simply sucks. 

However, I have never been afraid to look like an idiot.  After all, we idiots are important members of society – we make everyone else look smart by comparison.

 

The King, Queen, and Crown Prince of Blogging

Meet the Blogging Royal Family: Content, Marketing, and Context.

Content is King.  The primary reason that people are (or aren't) reading your blog is your content.

Marketing is Queen.  “If you blog it, they will read.”  Ah, no they won't – this ain't Field of Dreams.  Some of the best written blogs on the Series of Tubes get hardly any love because they get hardly any marketing.  In addition to providing RSS and e-mail feeds, I use social media (e.g., Twitter, Facebook, LinkedIn) to promote my blog content.

However, too many bloggers have a selfish social media strategy.  Don't use it exclusively for self-promotion.  View social media as Social Karma.  Focus on helping others and you will get much more back than just a blog reader, a LinkedIn connection, a Twitter follower, or a Facebook friend.  In addition to blog promotion (which is important), I use social media to listen, to learn, and to help others when I can.

Larry Brooks recently explained that although content may still be king, at the very least, you must pay homage to the new Crown Prince — Context.  To paraphrase Brooks, context comes from clarity about your blogging goals, juxtaposed against the expectations and tolerances of your readers.  Basically, this above all: to thine own readers be true.

 

Emerson on Blogging

“Nothing can bring you peace but yourself.”

One of my favorite writers is Ralph Waldo Emerson.  The quote that started this section was pure Emerson.  What follows is a slight paraphrasing of one of my all-time favorite passages, which comes from his essay on Self-Reliance:

“What I must do is all that concerns me, not what the people think.  This rule, equally arduous in real and in online life, may serve for the whole distinction between greatness and meanness.  It is the harder because you will always find those who think they know what is your duty better than you know it.  It is easy in the world to live after the world's opinion; it is easy in solitude to live after our own; but the great blogger is one who in the midst of the blogosphere, keeps with perfect sweetness the independence of solitude.”

Bottom line — BE YOURSELF — Let your own personality shine through.  Make people feel like they are having a conversation with a real person and not just someone who is blogging what they think people want to read.

I hope that you found at least some of this manifesto helpful.  I also hope to see more of you around the blogosphere.

I'll be the balding blogger who used to almost have a mullet...

 

Related Posts

Collablogaunity

Brevity is the Soul of Social Media

Podcast: Your Blog, Your Voice

Customer Incognita

Many enterprise information initiatives are launched in order to unravel that riddle, wrapped in a mystery, inside an enigma, that great unknown, also known as...Customer.

Centuries ago, cartographers used the Latin phrase terra incognita (meaning “unknown land”) to mark regions on a map not yet fully explored.  In this century, companies simply cannot afford to use the phrase customer incognita to indicate what information about their existing (and prospective) customers they don't currently have or don't properly understand.

 

What is a Customer?

First things first, what exactly is a customer?  Those happy people who give you money?  Those angry people who yell at you on the phone or say really mean things about your company on Twitter and Facebook?  Why do they have to be so mean? 

Mean people suck.  However, companies who don't understand their customers also suck.  And surely you don't want to be one of those companies, do you?  I didn't think so.

Getting back to the question, here are some insights from the Data Quality Pro discussion forum topic What is a customer?:

  • Someone who purchases products or services from you.  The word “someone” is key because it’s not the role of a “customer” that forms the real problem, but the precision of the term “someone” that causes challenges when we try to link other and more specific roles to that “someone.”  These other roles could be contract partner, payer, receiver, user, owner, etc.
  • Customer is a role assigned to a legal entity in a complete and precise picture of the real world.  The role is established when the first purchase is accepted from this real-world entity.  Of course, the main challenge is whether or not the company can establish and maintain a complete and precise picture of the real world.

These working definitions were provided by fellow blogger and data quality expert Henrik Liliendahl Sørensen, who recently posted 360° Business Partner View, which further examines the many different ways a real-world entity can be represented, including when, instead of a customer, the real-world entity represents a citizen, patient, member, etc.

A critical first step for your company is to develop your definition of a customer.  Don't underestimate either the importance or the difficulty of this process.  And don't assume it is simply a matter of semantics.

Some of my consulting clients have indignantly told me: “We don't need to define it.  Everyone in our company knows exactly what a customer is.”  I usually respond: “I have no doubt that everyone in your company uses the word customer.  However, I will work for free if everyone defines the word customer in exactly the same way.”  So far, I haven't had to work for free.

 

How Many Customers Do You Have?

You have done the due diligence and developed your definition of a customer.  Excellent!  Nice work.  Your next challenge is determining how many customers you have.  Hopefully, you are not going to try using any of these techniques:

  • SELECT COUNT(*) AS "We have this many customers" FROM Customers
  • SELECT COUNT(DISTINCT Name) AS "No wait, we really have this many customers" FROM Customers
  • Middle-Square or Blum Blum Shub methods (i.e. random number generation)
  • Magic 8-Ball says: “Ask again later”

One of the most common and challenging data quality problems is the identification of duplicate records, especially redundant representations of the same customer information within and across systems throughout the enterprise.  The need for a solution to this specific problem is one of the primary reasons that companies invest in data quality software and services.
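To see why neither of the two queries above can be trusted, here is a minimal sketch using Python's built-in sqlite3 module.  The five sample rows are invented for illustration, and the TrueCustomer column is a hypothetical hand-assigned label that a real Customers table would not have (if it did, you would not have this problem).

```python
import sqlite3

# In-memory database with invented sample data.
# TrueCustomer is a hypothetical, hand-assigned label used only to show the "right" answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Address TEXT, TrueCustomer TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        ("John Smith", "12 Oak St", "A"),
        ("JOHN SMITH", "12 Oak Street", "A"),  # same person, different formatting
        ("Jon Smith", "12 Oak St", "A"),       # same person, misspelled name
        ("Mary Jones", "5 Elm Rd", "B"),
        ("Mary Jones", "99 Pine Ave", "C"),    # a different person with the same name
    ],
)


def count(sql):
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]


row_count = count("SELECT COUNT(*) FROM Customers")
distinct_names = count("SELECT COUNT(DISTINCT Name) FROM Customers")
true_customers = count("SELECT COUNT(DISTINCT TrueCustomer) FROM Customers")

print("COUNT(*):", row_count)
print("COUNT(DISTINCT Name):", distinct_names)
print("Hand-labeled customers:", true_customers)
```

In this sample, counting rows overstates the answer because of the duplicates, and counting distinct names is still wrong because it splits one customer across three spellings while merging two different people who share a name.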

Earlier this year on Data Quality Pro, I published a five-part series of articles on identifying duplicate customers, which focused on the methodology for defining your business rules and illustrated some of the common data matching challenges.

Topics covered in the series:

  • Why a symbiosis of technology and methodology is necessary when approaching this challenge
  • How performing a preliminary analysis on a representative sample of real data prepares effective examples for discussion
  • Why using a detailed, interrogative analysis of those examples is imperative for defining your business rules
  • How both false negatives and false positives illustrate the highly subjective nature of this problem
  • How to document your business rules for identifying duplicate customers
  • How to set realistic expectations about application development
  • How to foster a collaboration of the business and technical teams throughout the entire project
  • How to consolidate identified duplicates by creating a “best of breed” representative record

To read the series, please follow these links:

To download the associated presentation (no registration required), please follow this link: OCDQ Downloads
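The false negatives and false positives mentioned in the topic list above are easier to discuss with a concrete example.  The following is only a rough sketch and not the method from the series: the records, the hand-assigned third field marking the true customer, and both matching rules are assumptions made up for illustration.

```python
from itertools import combinations

# Invented sample records: (name, address, hand-labeled true customer).
RECORDS = [
    ("John Smith", "12 Oak St", "A"),
    ("JOHN SMITH", "12 Oak Street", "A"),
    ("Jon Smith", "12 Oak St", "A"),
    ("Mary Jones", "5 Elm Rd", "B"),
    ("Mary Jones", "99 Pine Ave", "C"),
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def same_name(a, b):
    """Hypothetical loose rule: records match when normalized names are equal."""
    return normalize(a[0]) == normalize(b[0])


def same_name_and_address(a, b):
    """Hypothetical strict rule: names and addresses must both be equal."""
    return same_name(a, b) and normalize(a[1]) == normalize(b[1])


def evaluate(rule):
    """Compare a rule with the hand labels; return (false positives, false negatives)."""
    false_positives = false_negatives = 0
    for a, b in combinations(RECORDS, 2):
        predicted = rule(a, b)
        actual = a[2] == b[2]
        if predicted and not actual:
            false_positives += 1
        elif actual and not predicted:
            false_negatives += 1
    return false_positives, false_negatives


name_only_errors = evaluate(same_name)
strict_errors = evaluate(same_name_and_address)

print("Name only (FP, FN):", name_only_errors)
print("Name and address (FP, FN):", strict_errors)
```

On this sample, the loose rule wrongly merges the two Mary Joneses and misses the misspelled Jon Smith, while the strict rule avoids the bad merge but misses every true duplicate.  Which trade-off is acceptable is a business decision, which is why the rules need to be defined with the business and not just by the technical team.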

 

Conclusion

“Knowing the characteristics of your customers,” stated Jill Dyché and Evan Levy in the opening chapter of their excellent book, Customer Data Integration: Reaching a Single Version of the Truth, “who they are, where they are, how they interact with your company, and how to support them, can shape every aspect of your company's strategy and operations.  In the information age, there are fewer excuses for ignorance.”

For companies of every size and within every industry, customer incognita is a crippling condition that must be replaced with customer cognizance in order for the company to remain competitive in a rapidly changing marketplace.

Do you know your customers?  If not, then they likely aren't your customers anymore.

The Tell-Tale Data

It is a dark and stormy night in the data center.  The constant humming of hard drives is mimicking the sound of a hard rain falling in torrents, except at occasional intervals, when it is checked by a violent gust of conditioned air sweeping through the seemingly endless aisles of empty cubicles, rattling along desktops, fiercely agitating the flickering glow from flat panel monitors that are struggling against the darkness.

Tonight, amid this foreboding gloom with only my thoughts for company, I race to complete the production implementation of the Dystopian Automated Transactional Analysis (DATA) system.  Nervous, very, very dreadfully nervous I have been, and am, but why will you say that I am mad?  Observe how calmly I can tell you the whole story.

Eighteen months ago, I was ordered by executive management to implement the DATA system.  The vendor's salesperson was an oddly charming fellow named Machiavelli, who had the eye of a vulture — a pale blue eye, with a film over it.  Whenever this eye fell upon me, my blood ran cold. 

Machiavelli assured us all that DATA's seamlessly integrated Magic Beans software would migrate and consolidate all of our organization's information, clairvoyantly detecting and correcting our existing data quality problems, and once DATA was implemented into production, Magic Beans would prevent all future data quality problems from happening.

As soon as a source was absorbed into DATA, Magic Beans automatically did us the favor of freeing up disk space by deleting all traces of the source, somehow even including our off-site archives.  DATA would then become our only system of record, truly our Single Version of the Truth.

It is impossible to say when doubt first entered my brain, but once conceived, it haunted me day and night.  Whenever I thought about it, my blood ran cold — as cold as when that vulture eye was gazing upon me — very gradually, I made up my mind to simply load DATA and rid myself of my doubt forever.

Now this is the point where you will fancy me quite mad.  But madmen know nothing.  You should have seen how wisely I proceeded — with what caution — with what foresight — with what Zen-like tranquility, I went to work! 

I was never happier than I was these past eighteen months while I simply followed the vendor's instructions step by step and loaded DATA!  Would a madman have been so wise as this?  I think not.

Tomorrow morning, DATA goes live.  I can imagine how wonderful that will be.  I will be sitting at my desk, grinning wildly, deliriously happy with a job well done.  DATA will be loaded, data quality will trouble me no more.

It is now four o'clock in the morning, but still it is as dark as midnight.  But as bright as the coming dawn, I can now see three strange men as they gather around my desk. 

Apparently, a shriek had been heard from the business analysts and subject matter experts as soon as they started using DATA.  Suspicions had been aroused, complaints had been lodged, and they (now identifying themselves as auditors) had been called in by a regulatory agency to investigate.

I smile — for what have I to fear?  I welcome these fine gentlemen.  I give them a guided tour of DATA using its remarkably intuitive user interface.  I urge them to audit — audit well.  They seem satisfied.  My manner has convinced them.  I am singularly at ease.  They sit, and while I answer cheerily, they chat away about trivial things.  But before long, I feel myself growing pale and wish them gone.

My head aches and I hear a ringing in my ears, but still they sit and chat.  The ringing becomes more distinct.  I talk more freely, to get rid of the feeling, but it continues and gains volume — until I find that this noise is not within my ears.

No doubt I now grow very pale — but I talk more fluently, and with a heightened voice.  Yet the sound increases — and what can I do?  It is a low, dull, quick sound.  I gasp for breath — and yet the auditors hear it not. 

I talk more quickly — more vehemently — but the noise steadily increases.  I arise, and argue about trifles, in a high key and with violent gesticulations — but the noise steadily increases.  Why will they not be gone?  I pace the floor back and forth, with heavy strides, as if excited to fury by the unrelenting observations of the auditors — but the noise steadily increases.

What can I do?  I rave — I rant — I rage!  I swing my chair and smash my computer with it — but the noise rises over all of my attempts to silence it.  It grows louder — louder — louder!  And still the auditors chat pleasantly, and smile.  Is it really possible they cannot hear it?  Is it really possible they did not notice me smashing my computer?

They hear! — they suspect! — they know! — they are making a mockery of my horror! — this I thought, and this I think.  But anything is better than this agony!  Anything is more tolerable than this derision!  I cannot bear their hypocritical smiles any longer!  I feel that I must scream or die! — and now — again! — the noise!  Louder!  Louder!!  LOUDER!!!

 

“DATA!” I finally shriek.  “DATA has no quality!  NO DATA QUALITY!!!  What have I done?  What — Have — I — Done?!?”

 

With a sudden jolt, I awaken at my desk, with my old friend Edgar shaking me by the shoulders. 

“Hey, wake up!  Executive management wants us in the conference room in five minutes.  Apparently, there is a vendor here today pitching a new system called DATA using software called Magic Beans...” 

“...and the salesperson has this weird eye...”

Days Without A Data Quality Issue

In 1970, the United States Department of Labor created the Occupational Safety and Health Administration (OSHA).  The mission of OSHA is to prevent work-related injuries, illnesses, and deaths.  Based on statistics from 2007, since OSHA's inception, occupational deaths in the United States have been cut by 62% and workplace injuries have declined by 42%.

OSHA regularly conducts inspections to determine if organizations are in compliance with safety standards and assesses financial penalties for violations.  In order to both promote workplace safety and avoid penalties, organizations provide their employees with training on the appropriate precautions and procedures to follow in the event of an accident or an emergency.

Training programs certify new employees in safety protocols and indoctrinate them into the culture of a safety-conscious workplace.  By requiring periodic re-certification, all employees maintain awareness of their personal responsibility in both avoiding workplace accidents and responding appropriately to emergencies.

Although there has been some debate about the effectiveness of the regulations and the enforcement policies, over the years OSHA has unquestionably brought about many necessary changes, especially in the area of industrial work site safety where dangerous machinery and hazardous materials are quite common. 

Obviously, even with well-defined safety standards in place, workplace accidents will still occasionally occur.  However, these standards have helped greatly reduce both the frequency and severity of the accidents.  And most importantly, safety has become a natural part of the organization's daily work routine.

 

A Culture of Data Quality

Similar to indoctrinating employees into the culture of a safety-conscious workplace, more and more organizations are realizing the importance of creating and maintaining the culture of a data-quality-conscious workplace.  A culture of data quality is essential for effective enterprise information management.

Waiting until a serious data quality issue negatively impacts the organization before starting an enterprise data quality program is analogous to waiting until a serious workplace accident occurs before starting a safety program.

Many data quality issues are caused by a lack of data ownership and an absence of clear guidelines indicating who is responsible for ensuring that data is of sufficient quality to meet the daily business needs of the enterprise.  In order for data quality to be taken seriously within your organization, everyone first needs to know that data quality is an enterprise-wide priority.

Additionally, data quality standards must be well-defined, and everyone must accept their personal responsibility in both preventing data quality issues and responding appropriately to mitigate the associated business risks when issues do occur.

 

Data Quality Assessments

The data equivalent of a safety inspection is a data quality assessment, which provides a much needed reality check for the perceptions and assumptions that the enterprise has about the quality of its data. 

Performing a data quality assessment helps with a wide variety of tasks including: verifying data matches the metadata that describes it, preparing meaningful questions for subject matter experts, understanding how data is being used, quantifying the business impacts of poor quality data, and evaluating the ROI of data quality improvements.

An initial assessment provides a baseline and helps establish data quality standards as well as set realistic goals for improvement.  Subsequent data quality assessments, which should be performed on a regular basis, will track your overall progress.
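As a rough illustration of a baseline and a follow-up assessment, the sketch below checks sample rows against metadata and reports the share of rows that pass for each column.  The column names, the rules in METADATA, and all of the sample rows are assumptions invented for this example, and a real assessment would cover far more than required fields and maximum lengths.

```python
# Hypothetical metadata describing what each column is supposed to contain.
METADATA = {
    "customer_id": {"required": True, "max_length": 8},
    "email": {"required": True, "max_length": 40},
    "phone": {"required": False, "max_length": 12},
}


def assess(rows):
    """Return, for each column, the fraction of rows that satisfy the metadata."""
    scores = {}
    for column, rule in METADATA.items():
        passed = 0
        for row in rows:
            value = (row.get(column) or "").strip()
            if not value:
                ok = not rule["required"]  # a blank is fine only for optional columns
            else:
                ok = len(value) <= rule["max_length"]
            passed += ok
        scores[column] = passed / len(rows)
    return scores


# Invented rows for the initial assessment.
baseline_rows = [
    {"customer_id": "C001", "email": "a@example.com", "phone": "555-0100"},
    {"customer_id": "C002", "email": "", "phone": ""},
    {"customer_id": "", "email": "c@example.com", "phone": "555-0102-ext-99999"},
    {"customer_id": "C004", "email": None, "phone": "555-0103"},
]

# Invented rows for a later assessment, after some issues were corrected.
followup_rows = [
    {"customer_id": "C001", "email": "a@example.com", "phone": "555-0100"},
    {"customer_id": "C002", "email": "b@example.com", "phone": ""},
    {"customer_id": "C003", "email": "c@example.com", "phone": "555-0102"},
    {"customer_id": "C004", "email": None, "phone": "555-0103"},
]

baseline = assess(baseline_rows)
followup = assess(followup_rows)
progress = {column: followup[column] - baseline[column] for column in METADATA}

for column in METADATA:
    print(f"{column}: {baseline[column]:.0%} -> {followup[column]:.0%}")
```

The follow-up still shows one missing email, which is exactly the kind of unresolved issue that should serve as motivation and not as a reason to stop measuring.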

Although preventing data quality issues is your ultimate goal, don't let the pursuit of perfection undermine your efforts.  Always be mindful of the data quality issues that remain unresolved, but let them serve as motivation.  Learn from your mistakes without focusing on your failures – focus instead on making steady progress toward improving your data quality.

 

Data Governance

The data equivalent of verifying compliance with safety standards is data governance, which establishes policies and procedures to align people throughout the organization.  Enterprise data quality programs require a data governance framework in order to successfully deploy data quality as an enterprise-wide initiative.

By facilitating the collaboration of all business and technical stakeholders, aligning data usage with business metrics, enforcing data ownership, and prioritizing data quality, data governance enables effective enterprise information management.

Obviously, even with well-defined and well-managed data governance policies and procedures in place, data quality issues will still occasionally occur.  However, your goal is to greatly reduce both the frequency and severity of your data quality issues. 

And most importantly, the responsibility for ensuring that data is of sufficient quality to meet your daily business needs has now become a natural part of your organization's daily work routine.

 

Days Without A Data Quality Issue

Organizations commonly display a sign indicating how long they have gone without a workplace accident.  Proving that I certainly did not miss my calling as a graphic designer, I created this “sign” for Days Without A Data Quality Issue:

Days Without A Data Quality Issue
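If you would rather not update the sign by hand, the number can be computed from a log of issue dates.  This is only a small sketch, and the dates and the function name are made up for illustration.

```python
from datetime import date


def days_without_issue(issue_dates, today):
    """Days since the most recent recorded issue, or None if none was recorded."""
    past = [d for d in issue_dates if d <= today]
    if not past:
        return None
    return (today - max(past)).days


# Invented issue log for illustration.
issues = [date(2009, 9, 14), date(2009, 10, 2)]
sign = days_without_issue(issues, today=date(2009, 10, 30))

print(f"Days Without A Data Quality Issue: {sign}")
```

The counter resets whenever a new issue is logged, which is a good reason to make sure issues actually get logged.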

 

Related Posts

Poor Data Quality is a Virus

DQ-Tip: “Don't pass bad data on to the next person...”

The Only Thing Necessary for Poor Data Quality

Hyperactive Data Quality (Second Edition)

Data Governance and Data Quality