OCDQ Blog

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 11)

March 13, 2012

Commendable Comments (Part 12)

March 13, 2012/ Jim Harris

Since I officially launched this blog on March 13, 2009, that makes today the Third Blogiversary of OCDQ Blog!

So, absolutely without question, there is no better way to commemorate this milestone other than to also make this the 12th entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts.

Commendable Comments

On Big Data el Memorioso, Mark Troester commented:

“I think this helps illustrate that one size does not fit all.

You can’t take a singular approach to how you design for big data. It’s all about identifying relevance and understanding that relevance can change over time.

There are certain situations where it makes sense to leverage all of the data, and now with high performance computing capabilities that include in-memory, in-DB and grid, it's possible to build and deploy rich models using all data in a short amount of time. Not only can you leverage rich models, but you can deploy a large number of models that leverage many variables so that you get optimal results.

On the other hand, there are situations where you need to filter out the extraneous information and the more intelligent you can be about identifying the relevant information the better.

The traditional approach is to grab the data, cleanse it, and land it somewhere before processing or analyzing the data. We suggest that you leverage analytics up front to determine what data is relevant as it streams in, with relevance based on your organizational knowledge or context. That helps you determine what data should be acted upon immediately, where it should be stored, etc.

And, of course, there are considerations about using visual analytic techniques to help you determine relevance and guide your analysis, but that’s an entire subject just on its own!”

On Data Governance Frameworks are like Jigsaw Puzzles, Gabriel Marcan commented:

“I agree (and like) the jigsaw puzzles metaphor. I would like to make an observation though:

Can you really construct Data Governance one piece at a time?

I would argue you need to put together sets of pieces simultaneously, and to ensure early value, you might want to piece together the interesting / easy pieces first.

Hold on, that sounds like the typical jigsaw strategy anyway . . . :-)”

On Data Governance Frameworks are like Jigsaw Puzzles, Doug Newdick commented:

“I think that there are a number of more general lessons here.

In particular, the description of the issues with data governance sounds very like the issues with enterprise architecture. In general, there are very few eureka moments in solving the business and IT issues plaguing enterprises. These solutions are usually 10% inspiration, 90% perspiration in my experience. What looks like genius or a sudden breakthrough is usually the result of a lot of hard work.

I also think that there is a wider Myth of the Framework at play too.

The myth is that if we just select the right framework then everything else will fall into place. In reality, the selection of the framework is just the start of the real work that produces the results. Frameworks don’t solve your problems, people solve your problems by the application of brain-power and sweat.

All frameworks do is take care of some of the heavy-lifting, i.e., the mundane foundational research and thinking activity that is not specific to your situation.

Unfortunately the myth of the framework is why many organizations think that choosing TOGAF will immediately solve their IT issues and are then disappointed when this doesn’t happen, when a more sensible approach might have garnered better long-term success.”

On Data Quality: Quo Vadimus?, Richard Jarvis commented:

“I agree with everything you’ve said, but there’s a much uglier truth about data quality that should also be discussed — the business benefit of NOT having a data quality program.

The unfortunate reality is that in a tight market, the last thing many decision makers want to be made public (internally or externally) is the truth.

In a company with data quality principles ingrained in day-to-day processes, and reporting handled independently, it becomes much harder to hide or reinterpret your falling market share. Without these principles though, you’ll probably be able to pick your version of the truth from a stack of half a dozen, then spend your strategy meeting discussing which one is right instead of what you’re going to do about it.

What we’re talking about here is the difference between a Politician — who will smile at the camera and proudly announce 0.1% growth was a fantastic result given X, Y, and Z factors — and a Statistician who will endeavor to describe reality with minimal personal bias.

And the larger the organization, the more internal politics plays a part. I believe a lot of the reluctance in investing in data quality initiatives could be traced back to this fear of being held truly accountable, regardless of it being in the best interests of the organization. To build a data quality-centric culture, the change must be driven from the CEO down if it’s to succeed.”

On Data Quality: Quo Vadimus?, Peter Perera commented:

“The question: ‘Is Data Quality a Journey or a Destination?’ suggests that it is one or the other.

I agree with another comment that data quality is neither . . . or, I suppose, it could be both (the journey is the destination and the destination is the journey. They are one and the same.)

The quality of data (or anything for that matter) is something we experience.

Quality only radiates when someone is in the act of experiencing the data, and usually only when it is someone that matters. This radiation decays over time, ranging from seconds or less to years or more.

The only problem with viewing data quality as radiation is that radiation can be measured by an instrument, but there is no such instrument to measure data quality.

We tend to confuse data qualities (which can be measured) and data quality (which cannot).

In the words of someone whose name I cannot recall: ‘Quality is not job one. Being totally %@^#&$*% amazing is job one.’ The only thing I disagree with here is that being amazing is characterized as a ‘job.’

Data quality is not something we ‘do’ to data. It’s not a business initiative or project or job. It’s not a discipline. We need to distinguish between the pursuit (journey) of being amazing and actually being amazing (destination — but certainly not a final one). To be amazing requires someone to be amazed. We want data to be continuously amazing . . . to someone that matters, i.e., someone who uses and values the data a whole lot for an end that makes a material difference.

Come to think of it, the only prerequisite for data quality is being alive because that is the only way to experience it. If you come across some data and have an amazed reaction to it and can make a difference using it, you cannot help but experience great data quality. So if you are amazing people all the time with your data, then you are doing your data quality job very well.”

On Data Quality and Miracle Exceptions, Gordon Hamilton commented:

“Nicely delineated argument, Jim. Successfully starting a data quality program seems to be a balance between getting started somewhere and determining where best to start. The data quality problem is like a two-edged sword without a handle that is inflicting the ‘death of a thousand cuts’.

Data quality is indeed difficult to get ‘a handle on’.”

And since they generated so much great banter, please check out all of the commendable comments received by the blog posts There is No Such Thing as a Root Cause and You only get a Return from something you actually Invest in.

Thank You for Three Awesome Years

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last three years. Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between December 2011 and March 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog for the last three years. Your readership is deeply appreciated.

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

November 21, 2011

Commendable Comments (Part 11)

November 21, 2011/ Jim Harris

This Thursday is Thanksgiving Day, which in the United States is a holiday with a long, varied, and debated history. However, the most consistent themes remain family and friends gathering together to share a large meal and express their gratitude.

This is the eleventh entry in my ongoing series for expressing my gratitude to my readers for their commendable comments on my blog posts. Receiving comments is the most rewarding aspect of my blogging experience because not only do comments greatly improve the quality of my blog, comments also help me better appreciate the difference between what I know and what I only think I know. Which is why, although I am truly grateful to all of my readers, I am most grateful to my commenting readers.

Commendable Comments

On The Stakeholder’s Dilemma, Gwen Thomas commented:

“Recently got to listen in on a ‘cooperate or not’ discussion. (Not my clients.) What struck me was that the people advocating cooperation were big-picture people (from architecture and process) while those who just wanted what they wanted were more concerned about their own short-term gains than about system health. No surprise, right?

But what was interesting was that they were clearly looking after their own careers, and not their silos’ interests. I think we who help focus and frame the Stakeholder’s Dilemma situations need to be better prepared to address the individual people involved, and not just the organizational roles they represent.”

On Data, Information, and Knowledge Management, Frank Harland commented:

“As always, an intriguing post. Especially where you draw a parallel between Data Governance and Knowledge Management (wisdom management?) We sometimes portray data management (current term) as ‘well managed data administration’ (term from 70s-80s). As for the debate on ‘data’ and ‘information’ I prefer to see everything written, drawn and / or stored on paper or in digital format as data with various levels of informational value, depending on the amount and quality of metadata surrounding the data item and the accessibility, usefulness (quality) of that item.

For example, 12024561414 is a number with low informational value. I could add metadata, for instance: ‘Phone number’, that makes it potentially known as a phone number. Rather than let you find out whose number it is we could add more information value and add more metadata like: ‘White House Switchboard’. Accessibility could be enhanced by improving formatting like: (1) 202-456-1414.

What I am trying to say with this example is that data items should be placed on a rising scale of informational value rather than be put on steps or firm levels of informational value. So the Information Hierarchy provided by Professor Larson does not work very well for me. It could work only if for all data items the exact information value was determined for every probable context. This model is useful for communication purposes.”

On Plato’s Data, Peter Perera commented:

“‘erised stra ehru oyt ube cafru oyt on wohsi.’

To all Harry Potter fans this translates to: ‘I show not your face but your heart’s desire.’

It refers to The Mirror of Erised. It does not reflect reality but what you desire. (Erised is Desired spelled backwards.) Often data will cast a reflection of what people want to see.

‘Dumbledore cautions Harry that the mirror gives neither knowledge nor truth and that men have wasted away before it, entranced by what they see.’ How many systems are really Mirrors of Erised?”

On Plato’s Data, Larisa Bedgood commented:

“Because the prisoners in the cave are chained and unable to turn their heads to see what goes on behind them, they perceive the shadows as reality. They perceive imperfect reflections of truth and reality.

Bringing the allegory to modern times, this serves as a good reminder that companies MUST embrace data quality for an accurate and REAL view of customers, business initiatives, prospects, and so on. Continuing to view half-truths based on possibly faulty data and information means you are just lost in a dark cave!

I also like the comparison to the Mirror of Erised. One of my favorite movies is the Matrix, in which there are also a lot of parallelisms to Plato’s Cave Allegory. As Morpheus says to Neo: ‘That you are a slave, Neo. Like everyone else you were born into bondage. Into a prison that you cannot taste or see or touch. A prison for your mind.’ Once Neo escapes the Matrix, he discovers that his whole life was based on shadows of the truth.

Plato, Harry Potter, and Morpheus — I’d love to hear a discussion between the three of them in a cave!”

On Plato’s Data, John Owens commented:

“It is true that data is only a reflection of reality but that is also true of anything that we perceive with our senses. When the prisoners in the cave turn around, what they perceive with their eyes in the visible spectrum is only a very narrow slice of what is actually there. Even the ‘solid’ objects they see, and can indeed touch, are actually composed of 99% empty space.

The questions that need to be asked and answered about the essence of data quality are far less esoteric than many would have us believe. They can be very simple, without being simplistic. Indeed simplicity can be seen as a cornerstone of true data quality. If you cannot identify the underlying simplicity that lies at the heart of data quality you can never achieve it. Simple questions are the most powerful. Questions like, ‘In our world (i.e., the enterprise in question) what is it that we need to know about (for example) a Sale that will enable us to operate successfully and meet all of our goals and objectives?’ If the enterprise cannot answer such simple questions then it is in trouble. Making the questions more complicated will not take the enterprise any closer to where it needs to be. Rather it will completely obscure the goal.

Data quality is rather like a ‘magic trick’ done by a magician. Until you know how it is done it appears to an unfathomable mystery. Once you find out that is merely an illusion, the reality is absolutely simple and, in fact, rather mundane. But perhaps that is why so many practitioners perpetuate the illusion. It is not for self gain. They just don’t want to tell the world that, when it comes to data quality, there is no Tooth Fairy, no Easter Bunny, or no Santa Claus. It’s sad, but true. Data quality is boringly simple!”

On Plato’s Data, Peter Benson commented:

“Actually I would go substantially further, whereas data was originally no more than a representation of the real world and if validation was required the real world was the ‘authoritative source’ — but that is clearly no longer the case. Data is in fact the new reality!

Data is now used to track everything, if the data is wrong the real world item disappears. It may have really been destroyed or it may be simply lost, but it does not matter, if the data does not provide evidence of its existence then it does not exist. If you doubt this, just think of money, how much you have is not based on any physical object but on data.

By the way the theoretical definition I use for data is as follows:

Datum — a disruption in a continuum.

The practical definition I use for data is as follows:

Data — elements into which information is transformed so that it can be stored or moved.”

On Data Governance and the Adjacent Possible, Paul Erb commented:

“We can see that there’s a trench between those who think adjacent means out of scope and those who think it means opportunity. Great leaders know that good stories make for better governance for an organization that needs to adapt and evolve, but stay true to its mission. Built from, but not about, real facts, good fictions are broadly true without being specifically true, and therefore they carry well to adjacent business processes where their truths can be applied to making improvements.

On the other hand, if it weren’t for nonfiction — accounts of real markets and processes — there would be nothing for the POSSIBLE to be adjacent TO. Managers often have trouble with this because they feel called to manage the facts, and call anything else an airy-fairy waste of time.

So a data governance program needs to assert whether its purpose is to fix the status quo only, or to fix the status quo in order to create agility to move into new areas when needed. Each of these should have its own business case and related budgets and thresholds (tolerances) in the project plan. And it needs to choose its sponsorship and data quality players accordingly.”

On You Say Potato and I Say Tater Tot, John O’Gorman commented:

“I’ve been working on a definitive solution for the data / information / metadata / attributes / properties knot for a while now and I think I have it figured out.

I read your blog entitled The Semantic Future of MDM and we share the same philosophy even while we differ a bit on the details. Here goes. It’s all information. Good, bad, reliable or not, the argument whether data is information or vice versa is not helpful. The reason data seems different than information is because it has too much ambiguity when it is out of context. Data is like a quantum wave: it has many possibilities one of which is ‘collapsed’ into reality when you add context. Metadata is not a type of data, any more than attributes, properties or associations are a type of information. These are simply conventions to indicate the role that information is playing in a given circumstance.

Your Michelle Davis example is a good illustration: Without context, that string could be any number of individuals, so I consider it data. Give it a unique identifier and classify it as a digital representation in the class of Person, however and we have information. If I then have Michelle add attributes to her personal record — like sex, age, etc. — and assuming that these are likewise identified and classed — now Michelle is part of a set, or relation. Note that it is bad practice — and consequently the cause of many information management headaches — to use data instead of information. Ambiguity kills. Now, if I were to use Michelle’s name in a Subject Matter Expert field as proof of the validity of a digital asset; or in the Author field as an attribute, her information does not *become* metadata or an attribute: it is still information. It is merely being used differently.

In other words, in my world while the terms ‘data’ and ‘information’ are classified as concepts, the terms ‘metadata’, ‘attribute’ and ‘property’ are classified as roles to which instances of those concepts (well, one of them anyway) can be put, i.e., they are fit for purpose. This separation of the identity and class of the string from the purpose to which it is being assigned has produced very solid results for me.”

Thanks for giving your comments

Thank you very much for giving your comments and sharing your perspectives with our collablogaunity. This entry in the series highlighted commendable comments on OCDQ Blog posts published between July and November of 2011.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality (OCDQ) blog. Your readership is deeply appreciated.

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

July 16, 2011

Commendable Comments (Part 10)

July 16, 2011/ Jim Harris

Welcome to the 300th Obsessive-Compulsive Data Quality (OCDQ) blog post!

You might have been expecting a blog post inspired by the movie 300, but since I already did that with Spartan Data Quality, instead I decided to commemorate this milestone with the 10th entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts.

Commendable Comments

On DQ-BE: Single Version of the Time, Vish Agashe commented:

“This has been one of my pet peeves for a long time. Shared version of truth or the reference version of truth is so much better, friendly and non-dictative (if such a word exists) than single version of truth.

I truly believe that starting a discussion with Single Version of the Truth with business stakeholders is a nonstarter. There will always be a need for multifaceted view and possibly multiple aspects of the truth.

A very common term/example I have come across is the usage of the term revenue. Unfortunately, there is no single version of revenue across the organizations (and for valid reasons). From Sales Management prospective, they like to look at sales revenue (sales bookings) which is the business on which they are compensated on, financial folks want to look at financial revenue, which is the revenue they capture in the books and marketing possibly wants to look at marketing revenue (sales revenue before the discount) which is the revenue marketing uses to justify their budgets. So if you ever asked questions to a group of people about what revenue of the organization is, you will get three different perspectives. And these three answers will be accurate in the context of three different groups.”

On Data Confabulation in Business Intelligence, Henrik Liliendahl Sørensen commented:

“I think this is going to dominate the data management realm in the coming years. We are not only met with drastically increasing volumes of data, but also increasing velocity and variety of data.

The dilemma is between making good decisions and making fast decisions, whether the decisions based on business intelligence findings should wait for assuring the quality of the data upon which the decisions are made, thus risking the decision being too late. If data quality always could be optimal by being solved at the root we wouldn’t have that dilemma.

The challenge is if we are able to have optimal data all the time when dealing with extreme data, which is data of great variety moving in high velocity and coming in huge volumes.”

On The People Platform, Mark Allen commented:

“I definitely agree and think you are burrowing into the real core of what makes or breaks EDM and MDM type initiatives -- it's the people.

Business models, processes, data, and technology all provide fixed forms of enablement or constraint. And where in the past these dynamics have been very compartmentalized throughout a company's business model and systems architecture, with EDM and MDM involving more integrated functions and shared data, people become more of the x-factor in the equation. This demands the presence of data governance to be the facilitating process that drives the collaborative, cross-functional, and decision making dynamics needed for successful EDM and MDM. Of course, the dilemma is that in a governance model people can still make bad decisions that inhibit people from working effectively.

So in terms of the people platform and data governance, there needs to be the correct focus on what are the right roles and good decisions made that can enable people to interact effectively.”

On Beware the Data Governance Ides of March, Jill Wanless commented:

“Our organization has taken the Hybrid Approach (starting Bottom-Up) and it works well for two reasons: (1) the worker bee rock stars are all aligned and ready to hit the ground running, and (2) the ‘Top’ can sit back and let the ‘aligned’ worker bees get on with it.

Of course, this approach is sometimes (painfully) slow, but with the ground-level rock stars already aligned, there is less resistance implementing the policies, and the Top’s heavy hand is needed much less frequently, but I voted for Hybrid Approach (starting Top-Down) because I have less than stellar patience for the long and scenic route.”

On Data Governance and the Buttered Cat Paradox, Rob Drysdale commented:

“Too many companies get paralyzed thinking about how to do this and implement it. (Along with the overwhelmed feeling that it is too much time/effort/money to fix it.) But I think your poll needs another option to vote on, specifically: ‘Whatever works for the company/culture/organization’ since not all solutions will work for every organization.

In some where it is highly structured, rigid and controlled, there wouldn’t be the freedom at the grass-roots level to start something like this and it might be frowned upon by upper-level management. In other organizations that foster grass-roots things then it could work.

However, no matter which way you can get it started and working, you need to have buy-in and commitment at all levels to keep it going and make it effective.”

On The Data Quality Wager, Gordon Hamilton commented:

“Deming puts a lot of energy into his arguments in 'Out of the Crisis' that the short-term mindset of the executives, and by extension the directors, is a large part of the problem.

Jackanapes, a lovely under-used term, might be a bit strong when the executives are really just doing what they are paid for. In North America we get what the directors measure! In fact, one quandary is that a proactive executive, who invests in data quality is building the long-term value of their company but is also setting it up to be acquired by somebody who recognizes that the 'under the radar' improvements are making the prize valuable.

Deming says on p.100: 'Fear of unfriendly takeover may be the single most important obstacle to constancy of purpose. There is also, besides the unfriendly takeover, the equally devastating leveraged buyout. Either way, the conqueror demands dividends, with vicious consequences on the vanquished.'”

On Got Data Quality?, Graham Rhind commented:

“It always makes me smile when people attempt to put a percentage value on their data quality as though it were something as tangible and measurable as the fat content of your milk.

In order to make such a measurement one would need to know where 100% of the defects lie. If they knew that they would be able to resolve the defects and achieve 100% quality. In reality you cannot and do not know where each defect is and how many there are.

Even though tools such as profilers will tell you, for example, that 95% of your US address records have a valid state added, there is still no way to measure how many of these valid states are applicable to the real world entity on the ground. Mr Smith may be registered in the database to an existing and valid address in the database, but if he moved last week there's a data quality issue that won't be discovered until one attempts to contact him.

The same applies when people say they have removed 95% of duplicates from their data. If they can measure it then they know where the other 5% of duplicates are and they can remove them.

But back to the point: you may not achieve 100% quality. In fact, we know you never will. But aiming for that target means that you're aiming in the right direction. As long as your goal is to get close to perfection and not to achieve it, I don't see the problem.”

On Data Governance Star Wars: Balancing Bureaucracy and Agility, Rob “Darth” Karel commented:

“A curious question to my Rebellious friend OCDQ-Wan, while data governance agility is a wonderful goal, and maybe a great place to start your efforts, is it sustainable?

Your agile Rebellion is like any start-up: decisions must be made quickly, you must do a lot with limited resources, everyone plays multiple roles willingly, and your objective is very targeted and specific. For example, to fire a photon torpedo into a small thermal exhaust port - only 2 meters wide - connected directly to the main reactor of the Death Star. Let's say you 'win' that market objective. What next?

The Rebellion defeats the Galactic Empire, leaving a market leadership vacuum. The Rebellion begins to set up a new form of government to serve all (aka grow existing market and expand into new markets) and must grow larger, with more layers of management, in order to scale. (aka enterprise data governance supporting all LOBs, geographies, and business functions).

At some point this Rebellion becomes a new Bureaucracy - maybe with a different name and legacy, but with similar results. Don't forget, the Galactic Empire started as a mini-rebellion itself spearheaded by the agile Palpatine!”

You Are Awesome

Thank you very much for sharing your perspectives with our collablogaunity. This entry in the series highlighted the commendable comments received on OCDQ Blog posts published between January and June of 2011.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

By the way, even if you have never posted a comment on my blog, you are still awesome — feel free to tell everyone I said so.

Thank you for reading the Obsessive-Compulsive Data Quality (OCDQ) blog. Your readership is deeply appreciated.

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

February 14, 2011

Commendable Comments (Part 9)

February 14, 2011/ Jim Harris

Today is February 14 — Valentine’s Day — the annual celebration of enduring romance, where true love is publicly judged according to your willingness to purchase chocolate, roses, and extremely expensive jewelry, and privately judged in ways that nobody (and please, trust me when I say nobody) wants to see you post on Twitter, Facebook, Flickr, YouTube, or your blog.

This is the ninth entry in my ongoing series for expressing my true love to my readers for their truly commendable comments on my blog posts. Receiving comments is the most rewarding aspect of my blogging experience. Although I love all of my readers, I love my commenting readers most of all.

Commendable Comments

On Data Quality Industry: Problem Solvers or Enablers?, Henrik Liliendahl Sørensen commented:

“I sometimes compare our profession with that of dentists. Dentists are also believed to advocate for good habits around your teeth, but are making money when these good habits aren’t followed.

So when 4 out 5 dentists recommend a certain toothpaste, it is probably no good :-)

Seriously though, I take the amount of money spent on data quality tools as a sign that organizations believe there are issues best solved with technology. Of course these tools aren’t magic.

Data quality tools only solve a certain part of your data and information related challenges. On the other hand, the few problems they do solve may be solved very well and cannot be solved by any other line of products or in any practical way by humans in any quantity or quality.”

On Data Quality Industry: Problem Solvers or Enablers?, Jarrett Goldfedder commented:

“I think that the expectations of clients from their data quality vendors have grown tremendously over the past few years. This is, of course, in line with most everything in the Web 2.0 cloud world that has become point-and-click, on-demand response.

In the olden days of 2002, I remember clients asking for vendors to adjust data only to the point where dashboard statistics could be presented on a clean Java user interface. I have noticed that some clients today want the software to not just run customizable reports, but to extract any form of data from any type of database, to perform advanced ETL and calculations with minimal user effort, and to be easy to use. It’s almost like telling your dentist to fix your crooked teeth with no anesthesia, no braces, no pain, during a single office visit.

Of course, the reality today does not match the expectation, but data quality vendors and architects may need to step up their game to remain cutting edge.”

On Data Quality is not an Act, it is a Habit, Rob Paller commented:

“This immediately reminded me of the practice of Kaizen in the manufacturing industry. The idea being that continued small improvements yield large improvements in productivity when compounded.

For years now, many of the thought leaders have preached that projects from business intelligence to data quality to MDM to data governance, and so on, start small and that by starting small and focused, they will yield larger benefits when all of the small projects are compounded.

But the one thing that I have not seen it tied back to is the successes that were found in the leaders of the various industries that have adopted the Kaizen philosophy.

Data quality practitioners need to recognize that their success lies in the fundamentals of Kaizen: quality, effort, participation, willingness to change, and communication. The fundamentals put people and process before technology. In other words, technology may help eliminate the problem, but it is the people and process that allow that elimination to occur.”

On Data Quality is not an Act, it is a Habit, Dylan Jones commented:

“Subtle but immensely important because implementing a coordinated series of small, easily trained habits can add up to a comprehensive data quality program.

In my first data quality role we identified about ten core habits that everyone on the team should adopt and the results were astounding. No need for big programs, expensive technology, change management and endless communication, just simple, achievable habits that importantly were focused on the workers.

To make habits work they need the WIIFM (What’s In It For Me) factor.”

On Darth Data, Rob Drysdale commented:

“Interesting concept about using data for the wrong purpose. I think that data, if it is the ‘true’ data can be used for any business decision as long as it is interpreted the right way.

One problem is that data may have a margin of error associated with it and this must be understood in order to properly use it to make decisions. Another issue is that the underlying definitions may be different.

For example, an organization may use the term ‘customer’ when it means different things. The marketing department may have a list of ‘customers’ that includes leads and prospects, but the operational department may only call them ‘customers’ when they are generating revenue.

Each department’s data and interpretation of it is correct for their own purpose, but you cannot mix the data or use it in the ‘other’ department to make decisions.

If all the data is correct, the definitions and the rules around capturing it are fully understood, then you should be able to use it to make any business decision.

But when it gets misinterpreted and twisted to suit some business decision that it may not be suited for, then you are crossing over to the Dark Side.”

On Data Governance and the Social Enterprise, Jacqueline Roberts commented:

“My continuous struggle is the chaos of data electronically submitted by many, many sources, different levels of quality and many different formats while maintaining the history of classification, correction, language translation, where used, and a multitude of other ‘data transactions’ to translate this data into usable information for multi-business use and reporting. This is my definition of Master Data Management.

I chuckled at the description of the ‘rigid business processes’ and I added ‘software products’ to the concept, since the software industry must understand the fluidity of the change of data to address the challenges of Master Data Management, Data Governance, and Data Cleansing.”

On Data Governance and the Social Enterprise, Frank Harland commented:

“I read: ‘Collaboration is the key to business success. This essential collaboration has to be based on people, and not on rigid business processes . . .’

And I think: Collaboration is the key to any success. This must have been true since the time man hunted the Mammoth. When collaborating, it went a lot better to catch the bugger.

And I agree that the collaboration has to be based on people, and not on rigid business processes. That is as opposed to based on rigid people, and not on flexible business processes. All the truths are in the adjectives.

I don’t mean to bash, Jim, I think there is a lot of truth here and you point to the exact relationship between collaboration as a requirement and Data Governance as a prerequisite. It’s just me getting a little tired of Gartner saying things of the sort that ‘in order to achieve success, people should work together. . .’

I have a word in mind that starts with ‘du’ and ends with ‘h’ :-)”

On Quality and Governance are Beyond the Data, Milan Kučera commented:

“Quality is a result of people’s work, their responsibility, improvement initiatives, etc. I think it is more about the company culture and its possible regulation by government. It is the most complicated to set-up a ‘new’ (information quality) culture, because of its influence on every single employee. It is about well balanced information value chain and quality processes at every ‘gemba’ where information is created.

Confidence in the information is necessary because we make many decisions based on it. Sometimes we do better or worse then before. We should store/use as much accurate information as possible.

All stewardship or governance frameworks should help companies with the change of its culture, define quality measures (the most important is accuracy), cost of poor quality system (allowing them to monitor impacts of poor quality information), and other necessary things. Only at this moment would we be able to trust corporate information and make decisions.

A small remark on technology only. Data quality technology is a good tool for helping you to analyze ‘technical’ quality of data – patterns, business rules, frequencies, NULL or Not NULL values, etc. Many technology companies narrow information quality into an area of massive cleansing (scrap/rework) activities. They can correct some errors but everything in general leads to a higher validity, but not information accuracy. If cleansing is implemented as a regular part of the ETL processes then the company institutionalizes massive correction, which is only a cost adding activity and I am sure it is not the right place to change data contents – we increase data inconsistency within information systems.

Every quality management system (for example TQM, TIQM, Six Sigma, Kaizen) focuses on improvement at the place where errors occur – gemba. All those systems require: leaders, measures, trained people, and simply – adequate culture.

Technology can be a good assistant (helper), but a bad master.”

On Can Data Quality avoid the Dustbin of History?, Vish Agashe commented:

“In a sense, I would say that the current definitions and approaches of/towards data quality might very well not be able to avoid the Dustbin of History.

In the world of phones and PDAs, quality of information about environments, current fashions/trends, locations and current moods of the customer might be more important than a single view of customer or de-duped customers. The pace at which consumer’s habits are changing, it might be the quality of information about the environment in which the transaction is likely to happen that will be more important than the quality of the post transaction data itself . . . Just a thought.”

On Does your organization have a Calumet Culture?, Garnie Bolling commented:

“So true, so true, so true.

I see this a lot. Great projects or initiatives start off, collaboration is expected across organizations, and there is initial interest, big meetings / events to jump start the Calumet. Now what, when the events no longer happen, funding to fly everyone to the same city to bond, share, explore together dries up.

Here is what we have seen work. After the initial kick off, have small events, focus groups, and let the Calumet grow organically. Sometimes after a big powwow, folks assume others are taking care of the communication / collaboration, but with a small venue, it slowly grows.

Success breeds success and folks want to be part of that, so when the focus group achieves, the growth happens. This cycle is then repeated, hopefully.

While it is important for folks to come together at the kick off to see the big picture, it is the small rolling waves of success that will pick up momentum, and people will want to join the effort to collaborate versus waiting for others to pick up the ball and run.

Thanks for posting, good topic. Now where is my small focus group? :-)”

You Are Awesome

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

By the way, even if you have never posted a comment on my blog, you are still awesome — feel free to tell everyone I said so.

November 23, 2010

Commendable Comments (Part 8)

November 23, 2010/ Jim Harris

This Thursday is Thanksgiving Day, which is a United States holiday with a long and varied history. The most consistent themes remain family and friends gathering together to share a large meal and express their gratitude.

This is the eighth entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts. Receiving comments is the most rewarding aspect of my blogging experience. Although I am truly grateful to all of my readers, I am most grateful to my commenting readers.

Commendable Comments

On The Data-Decision Symphony, James Standen commented:

“Being a lover of both music and data, it struck all the right notes!

I think the analogy is a very good one—when I think about data as music, I think about a companies business intelligence architecture as being a bit like a very good concert hall, stage, and instruments. All very lovely to listen to music—but without the score itself (the data), there is nothing to play.

And while certainly a real live concert hall is fantastic for enjoying Bach, I’m enjoying some Bach right now on my laptop—and the MUSIC is really the key.

Companies very often focus on building fantastic concert halls (made with all the best and biggest data warehouse appliances, ETL servers, web servers, visualization tools, portals, etc.) but forget that the point was to make that decision—and base it on data from the real world. Focusing on the quality of your data, and on the decision at hand, can often let you make wonderful music—and if your budget or schedule doesn't allow for a concert hall, you might be able to get there regardless.”

On “Some is not a number and soon is not a time”, Dylan Jones commented:

“I used to get incredibly frustrated with the data denial aspect of our profession. Having delivered countless data quality assessments, I’ve never found an organization that did not have pockets of extremely poor data quality, but as you say, at the outset, no-one wants to believe this.

Like you, I’ve seen the natural defense mechanisms. Some managers do fear the fallout and I’ve even had quite senior directors bury our research and quickly cut any further activity when issues have been discovered, fortunately that was an isolated case.

In the majority of cases though I think that many senior figures are genuinely shocked when they see their data quality assessments for the first time. I think the big problem is that because they institutionalize so many scrap and rework processes and people that are common to every organization, the majority of issues are actually hidden.

This is one of the issues I have with the big shock announcements we often see in conference presentations (I’m as guilty as hell for these so call me a hypocrite) where one single error wipes millions off a share price or sends a space craft hurtling into Mars.

Most managers don’t experience this cataclysm, so it’s hard for them to relate to because it implies their data needs to be perfect, they believe that’s unattainable and lose interest.

Far better to use anecdotes like the one cited in this blog to demonstrate how simple improvements can change lives and the bottom line in a limited time span.”

On The Real Data Value is Business Insight, Winston Chen commented:

“Yes, quality is in the eye of the beholder. Data quality metrics must be calculated within the context of a data consumer. This context is missing in most software tools on the market.

Another important metric is what I call the Materiality Metric.

In your example, 50% of customer data is inaccurate. It’d be helpful if we know which 50%. Are they the customers that generate the most revenue and profits, or are they dormant customers? Are they test records that were never purged from the system? We can calculate the materiality metric by aggregating a relevant business metric for those bad records.

For example, 85% of the year-to-date revenue is associated with those 50% bad customer records.

Now we know this is serious!”

On The Real Data Value is Business Insight, James Taylor commented:

“I am constantly amazed at the number of folks I meet who are paralyzed about advanced analytics, saying that ‘we have to fix/clean/integrate all our data before we can do that.’

They don’t know if the data would even be relevant, haven’t considered getting the data from an external source and haven't checked to see if the analytic techniques being considered could handle the bad or incomplete data automatically! Lots of techniques used in data mining were invented when data was hard to come by and very ‘dirty’ so they are actually pretty good at coping. Unless someone thinks about the decision you want to improve, and the analytics they will need to do so, I don’t see how they can say their data is too dirty, too inconsistent to be used.”

On The Business versus IT—Tear down this wall!, Scott Andrews commented:

“Early in my career, I answered a typical job interview question ‘What are your strengths?’ with:

‘I can bring Business and IT together to deliver results.’

My interviewer wryly poo-poo’d my answer with ‘Business and IT work together well already,’ insinuating that such barriers may have existed in the past, but were now long gone. I didn’t get that particular job, but in the years since I have seen this barrier in action (I can attest that my interviewer was wrong).

What is required for Business Intelligence success is to have smart business people and smart IT people working together collaboratively. Too many times one side or the other says ‘that’s not my job’ and enormous potential is left unrealized.”

On The Business versus IT—Tear down this wall!, Jill Wanless commented:

“It amazes me (ok, not really...it makes me cynical and want to rant...) how often Business and IT SAY they are collaborating, but it’s obvious they have varying views and perspectives on what collaboration is and what the expected outcomes should be. Business may think collaboration means working together for a solution, IT may think it means IT does the dirty work so Business doesn’t have to.

Either way, why don’t they just start the whole process by having a (honest and open) chat about expectations and that INCLUDES what collaboration means and how they will work together.

And hopefully, (here’s where I start to rant because OMG it’s Collaboration 101) that includes agreement not to use language such as BUSINESS and IT, but rather start to use language like WE.”

On Delivering Data Happiness, Teresa Cottam commented:

“Just a couple of days ago I had this conversation about the curse of IT in general:

When it works no-one notices or gives credit; it’s only when it’s broken we hear about it.

A typical example is government IT over here in the UK. Some projects have worked well; others have been spectacular failures. Guess which we hear about? We review failure mercilessly but sometimes forget to do the same with success so we can document and repeat the good stuff too!

I find the best case studies are the balanced ones that say: this is what we wanted to do, this is how we did it, these are the benefits. Plus this is what I’d do differently next time (lessons learned).

Maybe in those lessons learned we should also make a big effort to document the positive learnings and not just take these for granted. Yes these do come out in ‘best practices’ but again, best practices never get the profile of disaster stories...

I wonder if much of the gloom is self-fulfilling almost, and therefore quite unhealthy. So we say it’s difficult, the failure rate is high, etc. – commonly known as covering your butt. Then when something goes wrong you can point back to the low expectations you created in the first place.

But maybe, the fact we have low expectations means we don’t go in with the right attitude?

The self-defeating outcome is that many large organizations are fearful of getting to grips with their data problems. So lots of projects we should be doing to improve things are put on hold because of the perceived risk, disruption, cost – things then just get worse making the problem harder to resolve.

Data quality professionals surely don’t want to be seen as effectively undertakers to the doomed project, necessary yes, but not surrounded by the unmistakable smell of death that makes others uncomfortable.

Sure the nature of your work is often to focus on the broken, but quite apart from anything else, isn’t it always better to be cheerful?”

On Why isn’t our data quality worse?, Gordon Hamilton commented:

“They say that sport coaches never teach the negative, or to double the double negative, they never say ‘don’t do that.’ I read somewhere, maybe Daniel Siegel’s stuff, that when the human brain processes the statement ‘don’t do that’ it drops the ‘don’t,’ which leaves it thinking ‘do that.’

Data quality is a complex and multi-splendiforous area with many variables intermingled, but our task as Data Quality Evangelists would be more pleasant if we were helping people rise to the level of the positive expectations, rather than our being codependent in their sinking to the level of the negative expectation.”

DQ-Tip: “There is no such thing as data accuracy...” sparked an excellent debate between Graham Rhind and Peter Benson, who is the Project Leader of ISO 8000, which is the international standards for data quality. Their debate included the differences and interdependencies that exist between data and information, as well as between data quality and information quality.

Thanks for giving your comments

Thank you very much for giving your comments and sharing your perspectives with our collablogaunity.

This entry in the series highlighted commendable comments on OCDQ Blog posts published in August and September of 2010.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

October 11, 2010

Commendable Comments (Part 7)

October 11, 2010/ Jim Harris

Blogging has made the digital version of my world much smaller and allowed my writing to reach a much larger audience than would otherwise be possible. Although I am truly grateful to all of my readers, I am most grateful to my commenting readers.

Since its inception over a year ago, this has been an ongoing series for expressing my gratitude to my readers for their truly commendable comments, which greatly improve the quality of my blog posts.

Commendable Comments

On Do you enjoy writing?, Corinna Martinez commented:

“To be literate, a person of letters, means one must occasionally write letters by hand.

The connection between brain and hand cannot be overlooked as a key component to learning. It is by the very fact that it is labor intensive and requires thought that we are able to learn concepts and care thought into action.

One key feels the same as another and if the keyboard is changed then even the positioning of fingers while typing will have no significance. My bread and butter is computers but all in the name of communications, understanding and resolution of problems plaguing people/organizations.

And yet, I will never be too far into a computer to neglect to write a note or letter to a loved one. While I don’t journal, and some say that writing a blog is like journaling online, I love mixing and matching even searching for the perfect word or turn of phrase.

Although a certain number of simians may recreate something legible on machines, Shakespeare or literature of the level to inspire and move it will not be.

The pen is mightier than the sword—from as earthshaking as the downfall of nations to as simple as my having gotten jobs after handwriting simple thank you notes.

Unfortunately, it may go the way of the sword and be kept in glass cases instead of employed in its noblest and most dangerous task—wielded by masters of mind and purpose.”

On The Prince of Data Governance, Jarrett Goldfedder commented:

“Politics and self-interest are rarely addressed factors in principles of data governance, yet are such a strong component during some high-profile implementations, that data governance truly does need to be treated as an art rather than a science.

Data teams should have principles and policies to follow, but these can be easily overshadowed by decisions made from a few executives promoting their own agendas. Somehow, built into the existing theories of data governance, we should consider how to handle these political influences using some measure of accountability that all team members—stakeholders included—need to have.”

On Jack Bauer and Enforcing Data Governance Policies, Jill Wanless commented:

“Data Governance enforcement is a combination of straightforward and logical activities that when implemented correctly will help you achieve compliance, and ensure the success of your program. I would emphasize that they ALL (Documentation, Communication, Metrics, Remediation, Refinement) need to be part of your overall program, as doing one or a few without the others will lead to increased risk of failure.

My favorite? Tough to choose. The metrics are key, as are the documentation, remediation and refinement. But to me they all depend upon good communications. If you don’t communicate your policies, metrics, risks, issues, challenges, work underway, etc., you will fail! I have seen instances where policies have been established, yet they weren’t followed for the simple fact that people were unaware they existed.”

On Is your data complete and accurate, but useless to your business?, Dylan Jones commented:

“This sparks an episode I had a few years ago with an engineering services company in the UK.

I ran a management workshop showing a lot of the issues we had uncovered. As we were walking through a dashboard of all the findings one of the directors shouted out that the 20% completeness stats for a piece of engineering installation data was wrong, she had received no reports of missing data.

I drilled into the raw data and sure enough we found that 80% of the data was incomplete.

She was furious and demanded that site visits be carried out and engineers should be incentivized (i.e., punished!) in order to maintain this information.

What was interesting is that the data went back many years so I posed the question:

‘Has your decision-making ability been impeded by this lack of information?’

What followed was a lengthy debate, but the outcome was NO, it had little effect on operations or strategic decision making.

The company could have invested considerable amounts of time and money in maintaining this information but the benefits would have been marginal.

One of the most important dimensions to add to any data quality assessment is USEFULNESS, I use that as a weight to reduce the impact of other dimensions. To extend your debate further, data may be hopelessly inaccurate and incomplete, but if it’s of no use, then let’s take it out of the equation.”

On Is your data complete and accurate, but useless to your business?, Gordon Hamilton commented:

“Data Quality dimensions that track a data set’s significance to the Business such as Relevance or Impact could help keep the care and feeding efforts for each data set in ratio to their importance to the Business.

I think you are suggesting that the Business’s strategic/tactical objectives should be used to self-assess and even prune data quality management efforts, in order to keep them aligned with the Business rather than letting them have an independent life of their own.

I wonder if all business activities could use a self-assessment metric built in to their processing so that they can realign to reality. In the low levels of biology this is sometimes referred to as a ‘suicide gene’ that lets a cell decide when it is no longer needed. Suicide is such a strong term though, maybe it could be called an: annual review to realign efforts to organizational goals gene.”

On Is your data complete and accurate, but useless to your business?, Winston Chen commented:

“A particularly nasty problem in data management is that data created for one purpose gets used for another. Often, the people who use the data don't have a choice. It’s the only data available!

And when the same piece of data is used for multiple purposes, it gets even tougher. As you said, completeness and accuracy has a context: the same piece of data could be good for one purpose and useless for another.

A major goal of data governance is to define and enforce policies that aligns how data is created with how data is used. And if conflicts arise—they surely will—there’s a mechanism for resolving them.”

On Data Quality and the Cupertino Effect, Marty Moseley commented:

“I usually separate those out by saying that validity is a binary measurement of whether or not a value is correct or incorrect within a certain context, whereas accuracy is a measurement of the valid value’s ‘correctness’ within the context of the other data surrounding it and/or the processes operating upon it.

So, validity answers the question: ‘Is ZW a valid country code?’ and the answer would (currently) be ‘Yes, on the African continent, or perhaps on planet Earth.’

Accuracy answers the question: ‘Is it 2.5 degrees Celsius today in Redding, California?’

To which the answer would measure several things: is 2.5 degrees Celsius a valid temperature for Redding, CA? (yes it is), is it probable this time of year? (no, it has never been nearly that cold on this date), and are there any weather anomalies noted that might recommend that 2.5C is valid for Redding today? (no, there are not). So even though 2.5C is a valid air temperature, Redding, CA is a valid city and state combination, and 2.5C is valid for Redding in some parts of the year, that temperature has never been seen in Redding on July 15th and therefore it is probably not accurate.

Another ‘accuracy’ use case is one I’ve run into before: Is it accurate that Customer A purchased $15,049.00 in <product> on order 123 on <this date>?

To answer this, you may look at the average order size for this product (in quantity and overall price), the average order sizes from Customer A (in quantity ordered and monetary value), any promotions that offer such pricing deals, etc.

Given that the normal credit card charges for this customer are in the $50.00 to $150.00 range, and that the products ordered are on average $10.00 to $30.00, and that even the best customers normally do not order more than $200, and that there has never been a single order from this type of customer for this amount, then it is highly unlikely that a purchase of this size is accurate.”

On Do you believe in Magic (Quadrants)?, Len Dubois commented:

“I believe Magic Quadrants (MQ) are a tool that clients of Gartner, and any one else that can get their hands on them, use as one data point in their decision making process.

Analytic reports, like any other data point, are as useful or dangerous as the user wants/needs it to be. From a buyer’s perspective, a MQ can be used for lots of things:

1. To validate a market
2. To identify vendors in the marketplace
3. To identify minimum qualifications in terms of features and functionality
4. To identify trends
5. To determine a company’s viability
6. To justify one’s choice of a vendor
7. To justify value of a purchase
8. Worse case scenario: defends one choice of a failed selection
9. To demonstrate business value of a technology

I also believe they use the analysts, Ted and Andy in this instance, as a sounding board to validate what they believe or learned from other data points, i.e. references, white papers, demos, friends, colleagues, etc.

In the final analysis though, I know that clients usually make their selection based on many things, the MQ included. One of the most important decision points is the relationship they have with a vendor or the one they believe they are going to be able to develop with a new vendor—and no MQ is going to tell you that.”

Thank You

Thank you all for your comments. Your feedback is greatly appreciated—and truly is the best part of my blogging experience.

This entry in the series highlighted commendable comments on OCDQ Blog posts published in May, June, and July of 2010.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

April 27, 2010

Commendable Comments (Part 6)

April 27, 2010/ Jim Harris

Last September, and on the exact day of the sixth mensiversary (yes, that’s a real word, look it up) of my blog, I started this series as an ongoing celebration of the truly commendable comments that I regularly receive from my heroes—my readers.

Commendable Comments

On The Circle of Quality, Kelly Lautt commented:

“One of the offerings I provide as a consultant is around data readiness specifically for BI. Sometimes, you have to sneak an initial data quality project into a company tightly connected to a project or initiative with a clear, already accepted (and budgeted) ROI. Once the client sees the value of data quality vis a vis the BI requirements, it is easier to then discuss overall data quality (from multiple perspectives).

And, I have to add, I do feel that massive, cumbersome enterprise DQ programs sometimes lose the plot by blindly ‘improving’ data without any value in sight. I think there has to be a balance between ignoring generalized DQ versus going overboard when there will be a diminishing return at some point.

Always drive effort and investment in any area (including DQ) from expected business value!”

On The Poor Data Quality Jar, Daragh O Brien commented:

“We actually tried to implement something like this with regard to billing data quality issues that created compliance problems. Our aim was to have the cost of fixing the problem borne by the business area which created the issue, with the ‘swear jar’ being the budget pool for remediation projects.

We ran into a few practical problems:

1) Many problems ultimately had multiple areas with responsibility (line-of-business workers bypassing processes, IT historically ‘right-sizing’ scope on projects, business processes and business requirements not necessarily being defined properly resulting in inevitable errors)

2) Politics often prevented us from pushing the evidence we did have too hard as to the weighting of contributions towards any issue.

3) More often than not it was not possible to get hard metrics on which to base a weighting of contribution, and people tended to object to being blamed for a problem that was obviously complex with multiple inputs.

That said, the attempt to do it did help us to:

1) Justify our ‘claims’ that these issues were often complex with multiple stakeholders involved.

2) Get stakeholders to think about the processes end-to-end, including the multiple IT systems that were involved in even the simplest process.

3) Ensure we had human resources assigned to projects because we had metrics to apply to a business case.

4) Start building a focus on prevention of defect rather than just error detection and fix.

We never got around to using electric shocks on anyone. But I’d be lying if I said it wasn’t a temptation.”

On The Poor Data Quality Jar, Julian Schwarzenbach commented:

“As data accuracy issues in some cases will be identified by front line staff, how likely are they going to be to report them? Whilst the electric chair would be a tempting solution for certain data quality transgressions, would it mean that more data quality problems are reported?

This presents a similar issue to that in large companies when they look at their accident reporting statistics and reports of near misses/near hits:

* Does a high number of reported accidents and near hits mean that the company is unsafe, or does it mean that there are high levels of reporting coupled with a supportive, learning culture?

* Does a low number of reported accidents and near hits mean that the company is safe, or does it mean that staff are too scared of repercussions to report anything?

If staff risk a large fine/electric shock for owning up to transgressions, they will not do it and will work hard to hide the evidence, if they can.

In organizational/industrial situations, there are often multiple contributing factors to accidents and data quality problems. To minimize the level of future problems, all contributory causes need to be identified and resolved. To achieve this, staff should not be victimized/blamed in any way and should be encouraged to report issues without fear.”

On The Scarlet DQ, Henrik Liliendahl Sørensen commented:

“When I think about the root causes of many of the data quality issues I have witnessed, the original data entry was actually made in good faith by people trying to make data fit for the immediate purpose of use. Honest, loyal, and hardworking employees striving to get the work done.

Who are the bad guys then? Either it is no one or everyone or probably both.

When I have witnessed data quality problems solved it is most often done by a superhero taking the lead in finding solutions. That superhero has been different kinds of people. Sometimes it is a CEO, sometimes a CFO, sometimes a CRM-manager, sometimes it is anyone else.”

On The Scarlet DQ, Jacqueline Roberts commented:

“I work with engineering data and I find that the users of the data are not the creators of data, so by the time that data quality is questioned the engineering project has been completed, the engineering teams have been disbanded and moved on to other projects for other facilities.

I am sure that if the engineers had to put the spare part components on purchasing contracts for plant maintenance, the engineers would start to understand some of the data quality issues such as incomplete part numbers or descriptions, missing information, etc.”

On The Scarlet DQ, Thorsten Radde commented:

“Is the question of ‘who is to blame’ really that important?

For me, it is more important to ask ‘what needs to be done to improve the situation.’

I don’t think that assigning blame helps much in improving the situation. It is very rare that people cooperate to ‘cover up their mistakes.’ I found it more helpful to point out why the current situation is ‘wrong’ and then brainstorm with people on what can be done about it - which additional conventions are required, what can be checked automatically, if new functionality is needed, etc.

Of course, to be able to do that, you’ve got to have the right people on board that trust each other - and the blame game doesn’t help at all. Maybe you need a ‘blame doll’ that everyone can beat in order to vent their frustrations and then move on to more constructive behavior?”

On Can Enterprise-Class Solutions Ever Deliver ROI?, James Standen commented:

“Fantastic question. I think the short answer of course as always is ‘it depends’.

However, what’s important is exactly WHAT does it depend on. And I think while the vendors of these solutions would like you to believe that it depends on the features and functionality of their various applications, that what it all depends on far more is the way they are installed, and to what degree the business actually uses them.

(Insert buzz words here like: ‘business process alignment’, ‘project ownership’, ‘Business/IT collaboration’)

But if you spend Gazillions on a new ERP, then customize it like crazy to ensure that none of your business processes have to change and none of your siloed departments have to talk to each other (which will cost another gazillion in development and consulting by the way), which will then ensure that ongoing maintenance and configuration is more expensive as well, and will eliminate any ability to use pre-built business intelligence solutions etc., etc. Your ROI is going to be a big, negative number.

Unfortunately, this is often how it’s done. So my first comment in this debate is - If enterprise systems enable real change and optimization in business processes, then they CAN have ROI. But it’s hard. And doesn't happen often enough.”

On Microwavable Data Quality, Dylan Jones commented:

“Totally agree with you that data cleansing has been by far the most polarizing topic featured on our site since the launch. Like you, I agree that data governance is a marathon not a sprint but I do object to a lot of the data cleansing bashing that goes on.

I think that sometimes we should give people who purchase cleansing software far more credit than many of the detractors would be willing to offer. In the vast majority of cases data cleansing does provide a positive ROI and whilst some could argue it creates a cost base within the organization it is still a step in the direction of data quality maturity.

I think this particular debate is going to run and run however so thanks for fanning the flames.”

On The Challenging Gift of Social Media, Crysta Anderson commented:

“This is the biggest mindshift for a lot of people. When we started Social Media, many wanted to build our program based only on the second circle - existing customers. We had to fight hard to prove that the third circle not only existed (we had a hunch it did), but that it was worth our time to pursue. Sure, we can't point to a direct sales ROI, but the value of building a ‘tribe’ that raises the conversation about data quality, MDM, data governance and other topics has been incredible and continues to grow.”

Thank You

Thank you all for your comments. Your feedback is greatly appreciated—and truly is the best part of my blogging experience.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

Follow OCDQ

For more blog posts and commendable comments, subscribe to OCDQ via my RSS feed, my E-mail updates, or Google Reader.

You can also follow OCDQ on Twitter, fan the Facebook page for OCDQ, and connect with me on LinkedIn.

February 04, 2010

Commendable Comments (Part 5)

February 04, 2010/ Jim Harris

Thank You

Photo via Flickr (Creative Commons License) by: toettoet

Welcome to the 100th Obsessive-Compulsive Data Quality (OCDQ) blog post!

Absolutely without question, there is no better way to commemorate this milestone other than to also make this the 5th entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts.

Commendable Comments

On Will people still read in the future?, Don Frederiksen commented:

“I had an opportunity to study and write about informal learning in the past year and one concept that resonated with me was the notion of Personal Learning Environments (PLE).

In the context of your post, I would regard reading as one element of my PLE, i.e., a method for processing content. One power of the PLE is that you can control your content, process, objectives, and tools.

Your PLE will also vary depending on where you are and even with the type of access you have.

For example, I have just spent the last two days without WiFi. As frustrated as I was, I adapted my PLE based on that scenario. This morning, I'm sitting in McDonald's drinking coffee but wasn't in a place to watch your video. (Thank goodness you posted text.)

Even without my current location as a factor, I don't always watch videos or listen to podcasts because I have less control of the content and/or pace.

In regards to your questions, I like books, I read e-books, online content, occasional video, audio books, and Kindle on the iPhone. Combine these items with TweetDeck, Google Reader, the paper version of the Minneapolis Star Tribune, Amazon, and the Public Library, and you have identified the regular components of my PLE. To me the tools and process will vary based on my situation.

I also recognize that other people will most likely employ different tools and processes. The richness of our environment may suggest a decline in reading, but in the end it all comes down to different strokes for different folks. Everyone motivated to learn can create their own PLE.”

On Shut Your Mouth, Augusto Albeghi (aka Stray__Cat) commented:

“In my opinion, this is a very slippery slope.

This post is true in a world of good-hearted people where everyone wants the best for the team.

In the real world, the consultant is someone to blame for every problem the project encounters, e.g., they shut their mouth, they'll never be able to stand the critique and will be fired soon.

The better situation is to have expressed a clear recommendation, and if the the customer chooses not to follow it, then the consultant is formally shielded from any form of critique.

The consultant is likely to be caught in no man's land between opposing factions of the project, and must be able to take the right side by a clear statement. Some customers ask the consultant what's the best thing to do, in order to blame the consultant instead of themselves if something goes wrong.

However, even given all of this, the advice to listen carefully to the customer is still absolutely the #1 lesson that a consultant must learn.”

On OOBE-DQ, Where Are You?, Jill Wanless (aka Sheezaredhead) commented:

“For us, the whole ‘ease of use’ vs. ‘powerful functionality’ debate was included in the business case for the purchase. We identified the business requirements, included pros and cons of ease of use vs. functionality and made vendor recommendations based on the results of the pros vs. cons vs. requirements.

Also important to note, and included in our business case, was to question if the ease of use requires an intensive effort or costly training program, especially if your goal is to engage business users.

So to answer your questions, I would say if you have your requirements identified, and you do your homework on the benefits/risks/costs of the software, you should have all the information you need to make a logical decision based on the present situation.

Which, of course, will change somewhere down the line as everything does.

And for goodness sake (did I say goodness?), when things do change, always start with identifying the requirements. Never assume the requirements are the same as they used to be.”

On OOBE-DQ, Where Are You?, Dylan Jones commented:

“I think the most important trend in recent years is where vendors are really starting to understand how data quality workflows should integrate with the knowledge workforce.

I'm seeing several products really get this and create simple user interfaces and functions based on the role of the knowledge worker involved. These tools have a great balance between usable interface for business specific roles but also a great deal of power features under the bonnet. That is the software I typically recommend but again it is also about budget, these solutions may be too expensive for some organizations.

There is a danger here though of adding powerful features to knowledge workers who don't fully understand the ramifications of committing those updates to that master customer list. I still think we'll see IT playing a major role in the data quality process for some time to come, despite the business-focused marketing we're seeing in vendor land.”

On The Dumb and Dumber Guide to Data Quality, Monis Iqbal commented:

“Pretty convincing post for those allergic to long term corrective measures.

And this spawns another question and that is directed towards software developers who come and work on a product/project involving data manipulation and maintaining the quality of the data but aren't that concerned because they did their job of developing the product and then move on to another assignment.

I know I may be repeating the same arguments as presented in your post (Business vs IT) but these developers did care that the project handles data correctly and yet they aren't concerned about quality in the long term, however the person running the business is.

My point is that although data quality can only be achieved when both parties join hands together, I think it is the stakeholder who has to enforce it during all stages of the project lifecycle.”

Thank You

In this brief OCDQ Video, I express my gratitude to all of my readers for helping me reach my 100th blog post.

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: OCDQ Video

Thanks for your comment

Blogging has made the digital version of my world much smaller and allowed my writing to reach parts of the world it wouldn’t otherwise have been able to reach. My native language is English, which is also the only language I am fluent in.

However, with lots of help from both my readers as well as Google Translate, I have been trying to at least learn how to write “Thanks for your comment” in as many languages as possible.

Here is the list (in alphabetical order by language) that I have compiled so far:

Afrikaans – Dankie vir jou kommentaar
Croatian – Hvala na komentaru
Danish – Tak for din kommentar
Dutch – Bedankt voor je opmerking
French – Merci pour votre commentaire
German – Danke für Deine Anmerkung
Italian – Grazie per il tuo commento
Norwegian – Takk for din kommentar
Portuguese – Obrigado pelo seu comentário
Spanish – Gracias por tu comentario
Swedish – Tack för din kommentar
Welsh – Diolch yn fawr am eich sylw chi

Please help continue my education by adding to (or correcting) the above list by posting a comment below.

November 26, 2009

Commendable Comments (Part 4)

November 26, 2009/ Jim Harris

Photo via Flickr (Creative Commons License) by: ella_marie

Today is Thanksgiving Day, which is a United States holiday with a long and varied history. The most consistent themes remain family and friends gathering together to share a large meal and express their gratitude.

This is the fourth entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts. Receiving comments is the most rewarding aspect of my blogging experience. Although I am truly grateful to all of my readers, I am most grateful to my commenting readers.

Commendable Comments

On Days Without A Data Quality Issue, Steve Sarsfield commented:

“Data quality issues probably occur on some scale in most companies every day. As long as you qualify what is and isn't a data quality issue, this gets back to what the company thinks is an acceptable level of data quality.

I've always advocated aggregating data quality scores to form business metrics. For example, what data quality metrics would you combine to ensure that customers can always be contacted in case of an upgrade, recall or new product offering? If you track the aggregation, it gives you more of a business feel.”

On Customer Incognita, Daragh O Brien commented:

“Back when I was with the phone company I was (by default) the guardian of the definition of a 'Customer'. Basically I think they asked for volunteers to step forward and I was busy tying my shoelace when the other 11,000 people in the company as one entity took a large step backwards.

I found that the best way to get a definition of a customer was to lock the relevant stakeholders in a room and keep asking 'What' and 'Why'.

My 'data modeling' methodology was simple. Find out what the things were that were important to the business operation, define each thing in English without a reference to itself, and then we played the 'Yes/No Game Show' to figure out how that entity linked to other things and what the attributes of that thing were.

Much to IT's confusion, I insisted that the definition needed to be a living thing, not carved in two stone tablets we'd lug down from on top of the mountain.

However, because of the approach that had been taken we found that when new requirements were raised (27 from one stakeholder), the model accommodated all of them either through an expansion of a description or the addition of a piece of reference data to part of the model.

Fast-forward a few months from the modeling exercise. I was asked by IT to demo the model to a newly acquired subsidiary. It was a significantly different business. I played the 'Yes/No Game Show' with them for a day. The model fitted their needs with just a minor tweak.

The IT team from the subsidiary wanted to know how had I gone about normalizing the data to come up with the model, which is kind of like cutting up a perfectly good apple pie to find out how what an apple is and how to make pastry.

What I found about the 'Yes/No Game Show' approach was that it made people open up their thinking a bit, but it took some discipline and perseverance on my part to keep asking what and why. Luckily, having spent most of the previous few years trying to get these people to think seriously about data quality they already thought I was a moron so they were accommodating to me.

A key learning for me out of the whole thing is that, even if you are doing a data management exercise for a part of a larger business, you need to approach it in a way that can be evolved and continuously improved to ensure quality across the entire organization.

Also, it highlighted the fallacy of assuming that a company can only have one kind of customer.”

On The Once and Future Data Quality Expert, Dylan Jones commented:

“I recently attended a conference and sat in on a panel that discussed some of the future trends, such as cloud computing. It was a great discussion, highly polarized, and as I came home I thought about how far we've come as a profession but more importantly, how much more there is to do.

The reality is that the world is changing, the volumes of data held by businesses are immense and growing exponentially, our desire for new forms of information delivery insatiable, and the opportunities for innovation boundless.

I really believe we're not innovating as an industry anything like we should be. The cloud, as an example, offers massive opportunities for a range of data quality services but I've certainly not read anything in the media or press that indicates someone is capitalizing on this.

There are a few recent data quality technology innovations which have caught my eye, but I also think there is so much more vendors should be doing.

On the personal side of the profession, I think online education is where we're headed. The concept of localized training is now being replaced by online learning. With the Internet you can now train people on every continent, so why aren't more people going down this route?

I find it incredibly ironic when I speak to data quality specialists who admit that 'they don't have the first clue about all this social media stuff.' This is the next generation of information management, it's here right now, they should be embracing it. I think if you're a 'guru' author, trainer or consultant you need to think of new ways to engage with your clients/trainees using the tools available.

What worries me is that the growth of information doesn't match the maturity and growth of our profession. For example, we really need more people who can articulate the value of what we can offer.

Ted Friedman made a great point on Twitter recently when he talked about how people should stop moaning about executives that 'don't get it' and instead focus on improving ways to demonstrate the value of data quality improvement.

Just because we've come a long way doesn't mean we know it all, there is still a hell of a long way to go.”

Thanks for giving your comments

Thank you very much for giving your comments and sharing your perspectives with our collablogaunity. Since there have been so many commendable comments, please don't be offended if your commendable comment hasn't been featured yet.

Please keep on commenting and stay tuned for future entries in the series.

October 16, 2009

Commendable Comments (Part 3)

October 16, 2009/ Jim Harris

In a July 2008 blog post on Men with Pens (one of the Top 10 Blogs for Writers 2009), James Chartrand explained:

“Comment sections are communities strengthened by people.”

“Building a blog community creates a festival of people” where everyone can, as Chartrand explained, “speak up with great care and attention, sharing thoughts and views while openly accepting differing opinions.”

I agree with James (and not just because of his cool first name) – my goal for this blog is to foster an environment in which a diversity of viewpoints is freely shared without bias. Everyone is invited to get involved in the discussion and have an opportunity to hear what others have to offer. This blog's comment section has become a community strengthened by your contributions.

This is the third entry in my ongoing series celebrating my heroes – my readers.

Commendable Comments

On The Fragility of Knowledge, Andy Lunn commented:

“In my field of Software Development, you simply cannot rest and rely on what you know. The technology you master today will almost certainly evolve over time and this can catch you out. There's no point being an expert in something no one wants any more! This is not always the case, but don't forget to come up for air and look around for what's changing.

I've lost count of the number of organizations I've seen who have stuck with a technology that was fresh 15 years ago and a huge stagnant pot of data, who are now scrambling to come up to speed with what their customers expect. Throwing endless piles of cash at the problem, hoping to catch up.

What am I getting at? The secret I've learned is to adapt. This doesn't mean jump on every new fad immediately, but be aware of it. Follow what's trending, where the collective thinking is heading and most importantly, what do your customers want?

I just wish more organizations would think like this and realize that the systems they create, the data they hold, and the customers they have are in a constant state of flux. They are all projects that need care and attention. All subject to change, there's no getting away from it, but small, well planned changes are a lot less painful, trust me.”

On DQ-Tip: “Data quality is primarily about context not accuracy...”, Stephen Simmonds commented:

“I have to agree with Rick about data quality being in the eye of the beholder – and with Henrik on the several dimensions of quality.

A theme I often return to is 'what does the business want/expect from data?' – and when you hear them talk about quality, it's not just an issue of accuracy. The business stakeholder cares – more than many seem to notice – about a number of other issues that are squarely BI concerns:

– Timeliness ('WHEN I want it')
– Format ('how I want to SEE it') – visualization, delivery channels
– Usability ('how I want to then make USE of it') – being able to extract information from a report (say) for other purposes
– Relevance ('I want HIGHLIGHTED the information that is meaningful to me')

And so on. Yes, accuracy is important, and it messes up your effectiveness when delivering inaccurate information. But that's not the only thing a business stakeholder can raise when discussing issues of quality. A report can be rejected as poor quality if it doesn't adequately meet business needs in a far more general sense. That is the constant challenge for a BI professional.”

On Mistake Driven Learning, Ken O'Connor commented:

“There is a Chinese proverb that says:

'Tell me and I'll forget; Show me and I may remember; Involve me and I'll understand.'

I have found the above to be very true, especially when seeking to brief a large team on a new policy or process. Interaction with the audience generates involvement and a better understanding.

The challenge facing books, whitepapers, blog posts etc. is that they usually 'Tell us,' they often 'Show us,' but they seldom 'Involve us.'

Hence, we struggle to remember, and struggle even more to understand. We learn best by 'doing' and by making mistakes.”

You Are Awesome

Thank you very much for your comments. For me, the best part of blogging is the dialogue and discussion provided by interactions with my readers. Since there have been so many commendable comments, please don't be offended if your commendable comment hasn't been featured yet. Please keep on commenting and stay tuned for future entries in the series.

By the way, even if you have never posted a comment on my blog, you are still awesome — feel free to tell everyone I said so.

September 19, 2009

Commendable Comments (Part 2)

September 19, 2009/ Jim Harris

In a recent guest post on ProBlogger, Josh Hanagarne “quoted” Jane Austen:

“It is a truth universally acknowledged, that a blogger in possession of a good domain must be in want of some worthwhile comments.”

“The most rewarding thing has been that comments,” explained Hanagarne, “led to me meeting some great people I possibly never would have known otherwise.” I wholeheartedly echo that sentiment.

This is the second entry in my ongoing series celebrating my heroes – my readers.

Commendable Comments

Proving that comments are the best part of blogging, on The Data-Information Continuum, Diane Neville commented:

“This article is intriguing. I would add more still.

A most significant quote: 'Data could be considered a constant while Information is a variable that redefines data for each specific use.'

This tells us that Information draws from a snapshot of a Data store. I would state further that the very Information [specification] is – in itself – a snapshot.

The earlier quote continues: 'Data is not truly a constant since it is constantly changing.'

Similarly, it is a business reality that 'Information is not truly a constant since it is constantly changing.'

The article points out that 'The Data-Information Continuum' implies a many-to-many relationship between the two. This is a sensible CONCEPTUAL model.

Enterprise Architecture is concerned as well with its responsibility for application quality in service to each Business Unit/Initiative.

For example, in the interest of quality design in Application Architecture, an additional LOGICAL model must be maintained between a then-current Information requirement and the particular Data (snapshots) from which it draws. [Snapshot: generally understood as captured and frozen – and uneditable – at a particular point in time.] Simply put, Information Snapshots have a PARENT RELATIONSHIP to the Data Snapshots from which they draw.

Analyzing this further, refer to this further piece of quoted wisdom (from section 'Subjective Information Quality'): '...business units and initiatives must begin defining their Information...by using...Data...as a foundation...necessary for the day-to-day operation of each business unit and initiative.'

From logically-related snapshots of Information to the Data from which it draws, we can see from this quote that yet another PARENT/CHILD relationship exists...that from Business Unit/Initiative Snapshots to the Information Snapshots that implement whatever goals are the order of the day. But days change.

If it is true that 'Data is not truly a constant since it is constantly changing,' and if we can agree that Information is not truly a constant either, then we can agree to take a rational and profitable leap to the truth that neither is a Business Unit/Initiative...since these undergo change as well, though they represent more slowly-changing dimensions.

Enterprises have an increasing responsibility for regulatory/compliance/archival systems that will qualitatively reproduce the ENTIRE snapshot of a particular operational transaction at any given point in time.

Thus, the Enterprise Architecture function has before it a daunting task: to devise a holistic process that can SEAMLESSLY model the correct relationship of snapshots between Data (grandchild), Information (parent) and Business Unit/Initiative (grandparent).

There need be no conversion programs or redundant, throw-away data structures contrived to bridge the present gap. The ability to capture the activities resulting from the undeniable point-in-time hierarchy among these entities is where tremendous opportunities lie.”

On Missed It By That Much, Vish Agashe commented:

“My favorite quote is 'Instead of focusing on the exceptions – focus on the improvements.'

I think that it is really important to define incremental goals for data quality projects and track the progress through percentage improvement over a period of time.

I think it is also important to manage the expectations that the goal is not necessarily to reach 100% (which will be extremely difficult if not impossible) clean data but the goal is to make progress to a point where the purpose for cleaning the data can be achieved in much better way than had the original data been used.

For example, if marketing wanted to use the contact data to create a campaign for those contacts which have a certain ERP system installed on-site. But if the ERP information on the contact database is not clean (it is free text, in some cases it is absent etc...) then any campaign run on this data will reach only X% contacts at best (assuming only X% of contacts have ERP which is clean)...if the data quality project is undertaken to clean this data, one needs to look at progress in terms of % improvement. How many contacts now have their ERP field cleaned and legible compared to when we started etc...and a reasonable goal needs to be set based on how much marketing and IT is willing to invest in these issues (which in turn could be based on ROI of the campaign based on increased outreach).”

Proving that my readers are way smarter than I am, on The General Theory of Data Quality, John O'Gorman commented:

“My theory of the data, information, knowledge continuum is more closely related to the element, compound, protein, structure arc.

In my world, there is no such thing as 'bad' data, just as there is no 'bad' elements. Data is either useful or not: the larger the audience that agrees that a string is representative of something they can use, the more that string will be of value to me.

By dint of its existence in the world of human communication and in keeping with my theory, I can assign every piece of data to one of a fixed number of classes, each with characteristics of their own, just like elements in the periodic table. And, just like the periodic table, those characteristics do not change. The same 109 usable elements in the periodic table are found and are consistent throughout the universe, and our ability to understand that universe is based on that stability.

Information is simply data in a given context, like a molecule of carbon in flour. The carbon retains all of its characteristics but the combination with other elements allows it to partake in a whole class of organic behavior. This is similar to the word 'practical' occurring in a sentence: Jim is a practical person or the letter 'p' in the last two words.

Where the analogue bends a bit is a cause of a lot of information management pain, but can be rectified with a slight change in perspective. Computers (and almost all indexes) have a hard time with homographs: strings that are identical but that mean different things. By creating fixed and persistent categories of data, my model suffers no such pain.

Take the word 'flies' in the following: 'Time flies like an arrow.' and 'Fruit flies like a pear.' The data 'flies' can be permanently assigned to two different places, and their use determines which instance is relevant in the context of the sentence. One instance is a verb, the other a plural noun.

Knowledge, in my opinion, is the ability to recognize, predict and synthesize patterns of information for past, present and future use, and more importantly to effectively communicate those patterns in one or more contexts to one or more audiences.

On one level, the model for information management that I use makes no apparent distinction between the data: we all use nouns, adjectives, verbs and sometimes scalar objects to communicate. We may compress those into extremely compact concepts but they can all be unraveled to get at elemental components. At another level every distinction is made to insure precision.

The difference between information and knowledge is experiential and since experience is an accumulative construct, knowledge can be layered to appeal to common knowledge, special knowledge and unique knowledge.

Common being the most easily taught and widely applied; Special being related to one or more disciplines and/or special functions; and, Unique to individuals who have their own elevated understanding of the world and so have a need for compact and purpose-built semantic structures.

Going back to the analogue, knowledge is equivalent to the creation by certain proteins of cartilage, the use to which that cartilage is put throughout a body, and the specific shape of the cartilage that forms my nose as unique from the one on my wife's face.

To me, the most important part of the model is at the element level. If I can convince a group of people to use a fixed set of elemental categories and to reference those categories when they create information, it's amazing how much tension disappears in the design, creation and deployment of knowledge.”

Tá mé buíoch díot

Daragh O Brien recently taught me the Irish Gaelic phrase Tá mé buíoch díot, which translates as I am grateful to you.

I am very grateful to all of my readers. Since there have been so many commendable comments, please don't be offended if your commendable comment hasn't been featured yet. Please keep on commenting and stay tuned for future entries in the series.

September 13, 2009

Commendable Comments (Part 1)

September 13, 2009/ Jim Harris

Six month ago today, I launched this blog by asking: Do you have obsessive-compulsive data quality (OCDQ)?

As of September 10, here are the monthly traffic statistics provided by my blog platform:

It Takes a Village (Idiot)

In my recent Data Quality Pro article Blogging about Data Quality, I explained why I started this blog. Blogging provides me a way to demonstrate my expertise. It is one thing for me to describe myself as an expert and another to back up that claim by allowing you to read my thoughts and decide for yourself.

In general, I have always enjoyed sharing my experiences and insights. A great aspect to doing this via a blog (as opposed to only via whitepapers and presentations) is the dialogue and discussion provided via comments from my readers.

This two-way conversation not only greatly improves the quality of the blog content, but much more importantly, it helps me better appreciate the difference between what I know and what I only think I know.

Even an expert's opinions are biased by the practical limits of their personal experience. Having spent most of my career working with what is now mostly IBM technology, I sometimes have to pause and consider if some of that yummy Big Blue Kool-Aid is still swirling around in my head (since I “think with my gut,” I have to “drink with my head”).

Don't get me wrong – “You're my boy, Blue!” – but there are many other vendors and all of them also offer viable solutions driven by impressive technologies and proven methodologies.

Data quality isn't exactly the most exciting subject for a blog. Data quality is not just a niche – if technology blogging was a Matryoshka (a.k.a. Russian nested) doll, then data quality would be the last, innermost doll.

This doesn't mean that data quality isn't an important subject – it just means that you will not see a blog post about data quality hitting the front page of Digg anytime soon.

All blogging is more art than science. My personal blogging style can perhaps best be described as mullet blogging – not “business in the front, party in the back” but “take your subject seriously, but still have a sense of humor about it.”

My blog uses a lot of metaphors and analogies (and sometimes just plain silliness) to try to make an important (but dull) subject more interesting. Sometimes it works and sometimes it sucks. However, I have never been afraid to look like an idiot. After all, idiots are important members of society – they make everyone else look smart by comparison.

Therefore, I view my blog as a Data Quality Village. And as the Blogger-in-Chief, I am the Village Idiot.

The Rich Stuff of Comments

Earlier this year in an excellent IT Business Edge article by Ann All, David Churbuck of Lenovo explained:

“You can host focus groups at great expense, you can run online surveys, you can do a lot of polling, but you won’t get the kind of rich stuff (you will get from blog comments).”

How very true. But before we get to the rich stuff of our village, let's first take a look at a few more numbers:

Not counting this one, I have published 44 posts on this blog
Those blog posts have collectively received a total of 185 comments
Only 5 blog posts received no comments
30 comments were actually me responding to my readers
45 comments were from LinkedIn groups (23), SmartData Collective re-posts (17), or Twitter re-tweets (5)

The ten blog posts receiving the most comments:

The Two Headed Monster of Data Matching – 11 Comments
Adventures in Data Profiling (Part 4) – 9 Comments
Adventures in Data Profiling (Part 2) – 9 Comments
You're So Vain, You Probably Think Data Quality Is About You – 8 Comments
There are no Magic Beans for Data Quality – 8 Comments
The General Theory of Data Quality – 8 Comments
Adventures in Data Profiling (Part 1) – 8 Comments
To Parse or Not To Parse – 7 Comments
The Wisdom of Failure – 7 Comments
The Nine Circles of Data Quality Hell – 7 Comments

Commendable Comments

This post will be the first in an ongoing series celebrating my heroes – my readers.

As Darren Rowse and Chris Garrett explained in their highly recommended ProBlogger book: “even the most popular blogs tend to attract only about a 1 percent commenting rate.”

Therefore, I am completely in awe of my blog's current 88 percent commenting rate. Sure, I get my fair share of the simple and straightforward comments like “Great post!” or “You're an idiot!” – but I decided to start this series because I am consistently amazed by the truly commendable comments that I regularly receive.

On The Data Quality Goldilocks Zone, Daragh O Brien commented:

“To take (or stretch) your analogy a little further, it is also important to remember that quality is ultimately defined by the consumers of the information. For example, if you were working on a customer data set (or 'porridge' in Goldilocks terms) you might get it to a point where Marketing thinks it is 'just right' but your Compliance and Risk management people might think it is too hot and your Field Sales people might think it is too cold. Declaring 'Mission Accomplished' when you have addressed the needs of just one stakeholder in the information can often be premature.

Also, one of the key learnings that we've captured in the IAIDQ over the past 5 years from meeting with practitioners and hosting our webinars is that, just like any Change Management effort, information quality change requires you to break the challenge into smaller deliverables so that you get regular delivery of 'just right' porridge to the various stakeholders rather than boiling the whole thing up together and leaving everyone with a bad taste in their mouths. It also means you can more quickly see when you've reached the Goldilocks zone.”

On Data Quality Whitepapers are Worthless, Henrik Liliendahl Sørensen commented:

“Bashing in blogging must be carefully balanced.

As we all tend to find many things from gurus to tools in our own country, I have also found one of my favourite sayings from Søren Kirkegaard:

If One Is Truly to Succeed in Leading a Person to a Specific Place, One Must First and Foremost Take Care to Find Him Where He is and Begin There.

This is the secret in the entire art of helping.

Anyone who cannot do this is himself under a delusion if he thinks he is able to help someone else. In order truly to help someone else, I must understand more than he–but certainly first and foremost understand what he understands.

If I do not do that, then my greater understanding does not help him at all. If I nevertheless want to assert my greater understanding, then it is because I am vain or proud, then basically instead of benefiting him I really want to be admired by him.

But all true helping begins with a humbling.

The helper must first humble himself under the person he wants to help and thereby understand that to help is not to dominate but to serve, that to help is not to be the most dominating but the most patient, that to help is a willingness for the time being to put up with being in the wrong and not understanding what the other understands.”

On All I Really Need To Know About Data Quality I Learned In Kindergarten, Daniel Gent commented:

“In kindergarten we played 'Simon Says...'

I compare it as a way of following the requirements or business rules.

Simon says raise your hands.

Simon says touch your nose.

Touch your feet.

With that final statement you learned very quickly in kindergarten that you can be out of the game if you are not paying attention to what is being said.

Just like in data quality, to have good accurate data and to keep the business functioning properly you need to pay attention to what is being said, what the business rules are.

So when Simon says touch your nose, don't be touching your toes, and you'll stay in the game.”

Since there have been so many commendable comments, I could only list a few of them in the series debut. Therefore, please don't be offended if your commendable comment didn't get featured in this post. Please keep on commenting and stay tuned for future entries in the series.

Because of You

As Brian Clark of Copyblogger explains, The Two Most Important Words in Blogging are “You” and “Because.”

I wholeheartedly agree, but prefer to paraphrase it as: Blogging is “because of you.”

Not you meaning me, the blogger – you meaning you, the reader.

Thank You.