Solvency II and Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Ken O’Connor and I discuss the Solvency II standards for data quality, and how its European insurance regulatory requirement of “complete, appropriate, and accurate” data represents common sense standards for all businesses.

Ken O’Connor is an independent data consultant with over 30 years of hands-on experience in the field, specializing in helping organizations meet the data quality management challenges presented by data-intensive programs such as data conversions, data migrations, data population, and regulatory compliance such as Solvency II, Basel II / III, Anti-Money Laundering, the Foreign Account Tax Compliance Act (FATCA), and the Dodd–Frank Wall Street Reform and Consumer Protection Act.

Ken O’Connor also provides practical data quality and data governance advice on his popular blog at: kenoconnordata.com

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Pitching Perfect Data Quality

In my previous post, I used a baseball metaphor to explain why we should strive for a quality start to our business activities by starting them off with good data quality, thereby giving our organization a better chance to succeed.

Since it’s a beautiful week for baseball metaphors, let’s post two!  (My apologies to Ernie Banks.)

If good data quality gives our organization a better chance to succeed, then it seems logical to assume that perfect data quality would give our organization the best chance to succeed.  However, as Yogi Berra said: “If the world were perfect, it wouldn’t be.”

My previous baseball metaphor was based on a statistic that measures how well a starting pitcher performs during a game.  The best possible performance by a starting pitcher is a perfect game, in which nine innings are completed by retiring the minimum of 27 opposing batters without allowing any hits, walks, hit batsmen, or batters reaching base due to a fielding error.

Although a lot of buzz is generated when a pitcher gets close to pitching a perfect game (e.g., after five perfect innings, it’s usually all the game’s announcers will talk about), in the 143 years of Major League Baseball history, spanning approximately 200,000 games, there have been only 20 perfect games, making the perfect game one of the rarest statistical events in baseball.

When a pitcher loses the chance of pitching a perfect game, does his team forfeit the game?  No, of course not.  Because the pitcher’s goal is not pitching perfectly.  The pitcher’s (and every other player’s) goal is helping the team win the game.

This is why I have never been a fan of anyone who is pitching perfect data quality, i.e., anyone advocating data perfection as the organization’s goal.  The organization’s goal is business success.  Data quality has a role to play, but claiming business success is impossible without having perfect data quality is like claiming winning in baseball is impossible without pitching a perfect game.

 

Related Posts

DQ-View: Baseball and Data Quality

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and The Middle Way

There is No Such Thing as a Root Cause

OCDQ Radio - The Johari Window of Data Quality

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Quality Starts and Data Quality

This past week was the beginning of the 2012 Major League Baseball (MLB) season.  Since its data is mostly transaction data describing the statistical events of games played, baseball has long been a sport obsessed with statistics.  Baseball statisticians slice and dice every aspect of past games attempting to discover trends that could predict what is likely to happen in future games.

There are too many variables involved in determining which team will win a particular game to be able to choose a single variable that predicts game results.  But a few key statistics are cited by baseball analysts as general guidelines of a team’s potential to win.

One such statistic is a quality start, which is defined as a game in which a team’s starting pitcher completes at least six innings and permits no more than three earned runs.  Of course, a so-called quality start is no guarantee that the starting pitcher’s team will win the game.  But the relative reliability of the statistic to predict a game’s result causes some baseball analysts to refer to a loss suffered by a pitcher in a quality start as a tough loss and a win earned by a pitcher in a non-quality start as a cheap win.
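
As an aside, the quality start rule is simple enough to express in a few lines of code.  The following Python sketch (purely illustrative, and not part of the original post) encodes the definition above, along with the “tough loss” and “cheap win” labels:

    def is_quality_start(innings_pitched, earned_runs):
        # A quality start: the starting pitcher completes at least six innings
        # and permits no more than three earned runs.
        return innings_pitched >= 6.0 and earned_runs <= 3

    def describe_decision(innings_pitched, earned_runs, pitcher_won):
        # Label the pitcher's decision the way the analysts quoted above would.
        quality = is_quality_start(innings_pitched, earned_runs)
        if quality and not pitcher_won:
            return "tough loss"   # a loss suffered despite a quality start
        if not quality and pitcher_won:
            return "cheap win"    # a win earned despite a non-quality start
        return "win" if pitcher_won else "loss"

    # Example: 7 innings pitched, 2 earned runs, but the pitcher was charged with the loss
    print(describe_decision(7.0, 2, pitcher_won=False))  # prints: tough loss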

There are too many variables involved in determining if a particular business activity will succeed to be able to choose a single variable that predicts business results.  But data quality is one of the general guidelines of an organization’s potential to succeed.

As Henrik Liliendahl Sørensen blogged, organizations are capable of achieving success with their business activities despite bad data quality, which we could call the business equivalent of cheap wins.  And organizations are also capable of suffering failure with their business activities despite good data quality, which we could call the business equivalent of tough losses.

So just like a quality start is no guarantee of a win in baseball, good data quality is no guarantee of success in business.

But perhaps the relative reliability of data quality to predict business results should influence us to at least strive for a quality start to our business activities by starting them off with good data quality, thereby giving our organization a better chance to succeed.

 

Related Posts

DQ-View: Baseball and Data Quality

Poor Quality Data Sucks

Fantasy League Data Quality

There is No Such Thing as a Root Cause

Data Quality: Quo Vadimus?

OCDQ Radio - The Johari Window of Data Quality

OCDQ Radio - Redefining Data Quality

OCDQ Radio - The Blue Box of Information Quality

OCDQ Radio - Studying Data Quality

OCDQ Radio - Organizing for Data Quality

The Data Governance Imperative

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Steve Sarsfield and I discuss how data governance is about changing the hearts and minds of your company to see the value of data quality, the characteristics of a data champion, and creating effective data quality scorecards.

Steve Sarsfield is a leading author and expert in data quality and data governance.  His book The Data Governance Imperative is a comprehensive exploration of data governance focusing on the business perspectives that are important to data champions, front-office employees, and executives.  He runs the Data Governance and Data Quality Insider, which is an award-winning and world-recognized blog.  Steve Sarsfield is the Product Marketing Manager for Data Governance and Data Quality at Talend.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

What is Weighing Down your Data?

On July 21, 1969, Neil Armstrong spoke the instantly famous words “that’s one small step for man, one giant leap for mankind” as he stepped off the ladder of the Apollo Lunar Module and became the first human being to walk on the surface of the Moon.

In addition to its many other, and more significant, scientific milestones, the Moon landing provided an excellent demonstration of three related, and often misunderstood, scientific concepts: mass, weight, and gravity.

Mass is an intrinsic property of matter, based on the atomic composition of a given object, such as your body, which means your mass would remain the same regardless of whether you were walking on the surface of the Moon or the Earth.

Weight is not an intrinsic property of matter, but is instead a gravitational force acting on matter.  Because the gravitational force of the Moon is less than the gravitational force of the Earth, you would weigh less on the Moon than you weigh on the Earth.  So, just like Neil Armstrong, your one small step on the surface of the Moon could quite literally become a giant leap.
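
To make the physics concrete, here is a brief illustrative sketch in Python (not from the original post; the 80-kilogram mass is simply an assumed example) of the weight calculation, using the standard surface gravity values for the Earth and the Moon:

    # Weight is mass times the local gravitational acceleration: W = m * g
    EARTH_GRAVITY = 9.81  # meters per second squared
    MOON_GRAVITY = 1.62   # meters per second squared, roughly one-sixth of Earth's

    def weight_in_newtons(mass_kg, gravity):
        # Mass is intrinsic; weight depends on the gravitational force acting on that mass.
        return mass_kg * gravity

    mass = 80.0  # an assumed mass in kilograms, the same on the Moon as on the Earth
    print(f"Weight on Earth: {weight_in_newtons(mass, EARTH_GRAVITY):.0f} N")
    print(f"Weight on the Moon: {weight_in_newtons(mass, MOON_GRAVITY):.0f} N")
    # Same mass, about one-sixth the weight, which is why a small step can become a giant leap.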

Using these concepts metaphorically, mass is an intrinsic property of data, and perhaps a way to represent objective data quality, whereas weight is a gravitational force acting on data, and perhaps a way to represent subjective data quality.

Since most data cannot escape the gravity of its application, most of what we refer to as data silos are actually application silos, because data and applications become tightly coupled due to the strong gravitational force that an application exerts on its data.

Now, of course, an application can exert a strong gravitational force for a strong business reason (e.g., protecting sensitive data), and not, as we often assume by default, for a weak business reason (e.g., protecting corporate political power).

You probably don’t view your applications as something that is weighing down your data, and you probably also resist the feeling of weightlessness that openly sharing your data can cause.  But whether your data truly enables your organization to take giant leaps, and not just small steps, depends on the gravitational forces acting on your data.

What is weighing down your data could also be weighing down your organization.

 

Related Posts

Data Myopia and Business Relativity

Are Applications the La Brea Tar Pits for Data?

Hell is other people’s data

My Own Private Data

No Datum is an Island of Serendip

Turning Data Silos into Glass Houses

Sharing Data

The Data Outhouse

The Good Data

Beyond a “Single Version of the Truth”

Our Increasingly Data-Constructed World

Last week, I joined fellow Information Management bloggers Art Petty, Mark Smith, Bruce Guptill, and co-hosts Eric Kavanagh and Jim Ericson for a DM Radio discussion about the latest trends and innovations in the information management industry.

For my contribution to the discussion, I talked about the long-running macro trend underlying many trends and innovations, namely that our world is becoming, not just more data-driven, but increasingly data-constructed.

Physicist John Archibald Wheeler contemplated how the bit is a fundamental particle, which, although insubstantial, could be considered more fundamental than matter itself.  He summarized this viewpoint in his pithy phrase “It from Bit” explaining how: “every it — every particle, every field of force, even the space-time continuum itself — derives its function, its meaning, its very existence entirely — even if in some contexts indirectly — from the answers to yes-or-no questions, binary choices, bits.”

In other words, we could say that the physical world is conceived of in, and derived from, the non-physical world of data.

Although bringing data into the real world has historically also required constructing other physical things to deliver data to us, more of the things in the physical world are becoming directly digitized.  As just a few examples, consider how we’re progressing:

  • From audio delivered via vinyl records, audio tapes, CDs, and MP3 files (and other file formats) to Web-streaming audio
  • From video delivered via movie reels, video tapes, DVDs, and MP4 files (and other file formats) to Web-streaming video
  • From text delivered via printed newspapers, magazines, and books to websites, blogs, e-books, and other electronic texts

Furthermore, we continue to see more physical tools (e.g., calculators, alarm clocks, calendars, dictionaries) transforming into apps and data on our smartphones, tablets, and other mobile devices.  Essentially, in a world increasingly constructed of an invisible and intangible substance called data (perhaps the datum should be added to the periodic table of elements?), one of the few things we still see and touch is the screen of a mobile device, which makes the invisible visible and the intangible tangible.

 

Bitrate, Lossy Audio, and Quantity over Quality

If our world is becoming increasingly data-constructed, does that mean people are becoming more concerned about data quality?

In a bit, 0.  In a word, no.  And that’s because, much to the dismay of those working in the data quality profession, most people do not care about the quality of their data unless it becomes bad enough for them to pay attention to — and complain about.

An excellent example is bitrate, which refers to the number of bits — or the amount of data — that are processed over a certain amount of time.  In his article Does Bitrate Really Make a Difference In My Music?, Whitson Gordon examined the common debate about lossless and lossy audio formats.

Using the example of ripping a track from a CD to a hard drive, a lossless format means that the track is not compressed to the point where any of its data is lost, retaining, for all intents and purposes, the same audio data quality as the original CD track.

By contrast, a lossy format compresses the track so that it takes up less space by intentionally removing some of its data, thereby reducing audio data quality.  Audiophiles often claim anything other than vinyl records sounds lousy because it is so lossy.
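
To put some rough numbers behind the bitrate discussion, here is a small illustrative sketch in Python (not part of the original article; the four-minute track length is simply an assumed example) comparing how much data a track consumes at the uncompressed CD bitrate versus two common lossy MP3 bitrates:

    def track_size_megabytes(bitrate_kbps, duration_seconds):
        # Approximate size: bitrate (kilobits per second) times duration, converted to megabytes.
        bits = bitrate_kbps * 1000 * duration_seconds
        return bits / 8 / 1_000_000

    duration = 4 * 60  # a four-minute track, in seconds

    formats = {
        "Uncompressed CD audio (~1411 kbps)": 1411,  # 44.1 kHz x 16 bits x 2 channels
        "Lossy MP3 at 320 kbps": 320,
        "Lossy MP3 at 128 kbps": 128,
    }

    for name, bitrate in formats.items():
        print(f"{name}: about {track_size_megabytes(bitrate, duration):.1f} MB")
    # The lossy files are a fraction of the size; the data that was removed is the loss in lossy.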

However, like truth, beauty, and art, data quality can be said to be in the eyes — or the ears — of the beholder.  So, if your favorite music sounds good enough to you in MP3 file format, then not only do you not need those physical vinyl records, audio tapes, and CDs anymore, but since you consider MP3 files good enough, you will not pay any further attention to audio data quality.

Another, and less recent, example is the videotape format war waged during the 1970s and 1980s between Betamax and VHS, when Betamax was widely believed to provide superior video data quality.

But a blank Betamax tape allowed users to record up to two hours of high-quality video, whereas a VHS tape allowed users to record up to four hours of slightly lower quality video.  Consumers consistently chose quantity over quality, especially since lower quality also meant a lower price.  Betamax tapes and machines remained more expensive based on the assumption that consumers would pay a premium for higher quality video.

The VHS victory demonstrated how people often choose quantity over quality, so it doesn’t always pay to have better data quality.

 

Redefining Structure in a Data-Constructed World

Another side effect of our increasingly data-constructed world is that it is challenging the traditional data management notion that data has to be structured before it can be used — especially within many traditional notions of business intelligence.

Physicist Niels Bohr suggested that understanding the structure of the atom requires changing our definition of understanding.

Since a lot of the recent Big Data craze consists of unstructured or semi-structured data, perhaps understanding how much structure data truly requires for business applications (e.g., sentiment analysis of social networking data) requires changing our definition of structuring.  At the very least, we have to accept the fact that the relational data model is no longer our only option.

Although I often blog about how data and the real world are not the same thing, as more physical things, as well as more aspects of our everyday lives, become directly digitized, it is becoming more difficult to differentiate physical reality from digital reality.

 

Related Posts

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Big Data el Memorioso

The Big Data Collider

Information Overload Revisited

Dot Collectors and Dot Connectors

WYSIWYG and WYSIATI

Plato’s Data

The Data Cold War

A Farscape Analogy for Data Quality

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • A Brave New Data World — A discussion about how data, data quality, data-driven decision making, and metadata quality no longer reside exclusively within the esoteric realm of data management — basically, everyone is a data geek now.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

Data Myopia and Business Relativity

Since how data quality is defined has a significant impact on how data quality is perceived, measured, and managed, in this post I examine the two most prevalent perspectives on defining data quality, real-world alignment and fitness for the purpose of use, which respectively represent what I refer to as the danger of data myopia and the challenge of business relativity.

Real-World Alignment: The Danger of Data Myopia

Whether it’s an abstract description of real-world entities (i.e., master data) or an abstract description of real-world interactions (i.e., transaction data) among entities, data is an abstract description of reality.  The creation and maintenance of these abstract descriptions shapes the organization’s perception of the real world, which I philosophically pondered in my post Plato’s Data.

The inconvenient truth is that the real world is not the same thing as the digital worlds captured within our databases.

And, of course, creating and maintaining these digital worlds is no easy task, which is exactly the danger inherent in the real-world alignment definition of data quality — when the organization’s data quality efforts are focused on minimizing the digital distance between data and the constantly changing real world that data attempts to describe, they can lead to a hyper-focus on the data in isolation, otherwise known as data myopia.

Even if we create and maintain perfect real-world alignment, what value does high-quality data possess independent of its use?

Real-world alignment reflects the perspective of the data provider, and its advocates argue that providing the organization with a trusted source of data will satisfy any and all business requirements, i.e., high-quality data should be fit to serve as the basis for every possible use.  Therefore, in theory, real-world alignment provides an objective data foundation independent of the subjective uses defined by the organization’s many data consumers.

However, providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the rallying cry and siren song of enterprise data warehousing (EDW), and more recently, of master data management (MDM).  Although these initiatives can provide significant business value, it is usually poor data quality that undermines the long-term success and sustainability of EDW and MDM implementations.

Perhaps the enterprise needs a Ulysses pact to protect it from believing in EDW or MDM as a miracle exception for data quality?

A significant challenge for the data provider perspective on data quality is that it is difficult to make a compelling business case on the basis of trusted data without direct connections to the specific business needs of data consumers, whose business, data, and technical requirements are often in conflict with one another.

In other words, real-world alignment does not necessarily guarantee business-world alignment.

So, if using real-world alignment as the definition of data quality has inherent dangers, we might be tempted to conclude that the fitness for the purpose of use definition of data quality is the better choice.  Unfortunately, that is not necessarily the case.

Fitness for the Purpose of Use: The Challenge of Business Relativity


In M. C. Escher’s famous 1953 lithograph Relativity, although we observe multiple, and conflicting, perspectives of reality, from the individual perspective of each person, everything must appear normal, since they are all casually going about their daily activities.

I have always thought this is an apt analogy for the multiple business perspectives on data quality that exist within every organization.

Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder, or when data quality is defined as fitness for the purpose of use — the eyes of the user.

Most data has both multiple uses and users.  Data of sufficient quality for one use or user may not be of sufficient quality for other uses and users.  These multiple, and often conflicting, perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data to support their own business activities.

Therefore, the user (i.e., data consumer) perspective establishes a relative business context for data quality.

Whereas the real-world alignment definition of data quality can cause a data-myopic focus, the business-world alignment goal of the fitness for the purpose of use definition must contend with the daunting challenge of business relativity.  Most data has multiple data consumers, each with their own relative business context for data quality, making it difficult to balance the diverse data needs and divergent data quality perspectives within the conflicting, and rather Escher-like, reality of the organization.

The data consumer perspective on data quality is often the root cause of the data silo problem, the bane of successful enterprise data management prevalent in most organizations, where each data consumer maintains their own data silo, customized to be fit for the purpose of their own use.  Organizational culture and politics also play significant roles since data consumers legitimately fear that losing their data silos would revert the organization to a one-size-fits-all data provider perspective on data quality.

So, clearly the fitness for the purpose of use definition of data quality is not without its own considerable challenges to overcome.

How does your organization define data quality?

As I stated at the beginning of this post, how data quality is defined has a significant impact on how data quality is perceived, measured, and managed.  I have witnessed the data quality efforts of an organization struggle with, and at times fail because of, either the danger of data myopia or the challenge of business relativity — or, more often than not, some combination of both.

Although some would define real-world alignment as data quality and fitness for the purpose of use as information quality, I have found adding the nuance of data versus information only further complicates an organization’s data quality discussions.

But for now, I will just conclude a rather long (sorry about that) post by asking for reader feedback on this perennial debate.

How does your organization define data quality?  Please share your thoughts and experiences by posting a comment below.

Commendable Comments (Part 12)

Since I officially launched this blog on March 13, 2009, that makes today the Third Blogiversary of OCDQ Blog!

So, absolutely without question, there is no better way to commemorate this milestone than to also make this the 12th entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts.

 

Commendable Comments

On Big Data el Memorioso, Mark Troester commented:

“I think this helps illustrate that one size does not fit all.

You can’t take a singular approach to how you design for big data.  It’s all about identifying relevance and understanding that relevance can change over time.

There are certain situations where it makes sense to leverage all of the data, and now with high performance computing capabilities that include in-memory, in-DB and grid, it's possible to build and deploy rich models using all data in a short amount of time. Not only can you leverage rich models, but you can deploy a large number of models that leverage many variables so that you get optimal results.

On the other hand, there are situations where you need to filter out the extraneous information and the more intelligent you can be about identifying the relevant information the better.

The traditional approach is to grab the data, cleanse it, and land it somewhere before processing or analyzing the data.  We suggest that you leverage analytics up front to determine what data is relevant as it streams in, with relevance based on your organizational knowledge or context.  That helps you determine what data should be acted upon immediately, where it should be stored, etc.

And, of course, there are considerations about using visual analytic techniques to help you determine relevance and guide your analysis, but that’s an entire subject just on its own!”

On Data Governance Frameworks are like Jigsaw Puzzles, Gabriel Marcan commented:

“I agree (and like) the jigsaw puzzles metaphor.  I would like to make an observation though:

Can you really construct Data Governance one piece at a time?

I would argue you need to put together sets of pieces simultaneously, and to ensure early value, you might want to piece together the interesting / easy pieces first.

Hold on, that sounds like the typical jigsaw strategy anyway . . . :-)”

On Data Governance Frameworks are like Jigsaw Puzzles, Doug Newdick commented:

“I think that there are a number of more general lessons here.

In particular, the description of the issues with data governance sounds very like the issues with enterprise architecture.  In general, there are very few eureka moments in solving the business and IT issues plaguing enterprises.  These solutions are usually 10% inspiration, 90% perspiration in my experience.  What looks like genius or a sudden breakthrough is usually the result of a lot of hard work.

I also think that there is a wider Myth of the Framework at play too.

The myth is that if we just select the right framework then everything else will fall into place.  In reality, the selection of the framework is just the start of the real work that produces the results.  Frameworks don’t solve your problems, people solve your problems by the application of brain-power and sweat.

All frameworks do is take care of some of the heavy-lifting, i.e., the mundane foundational research and thinking activity that is not specific to your situation.

Unfortunately the myth of the framework is why many organizations think that choosing TOGAF will immediately solve their IT issues and are then disappointed when this doesn’t happen, when a more sensible approach might have garnered better long-term success.”

On Data Quality: Quo Vadimus?, Richard Jarvis commented:

“I agree with everything you’ve said, but there’s a much uglier truth about data quality that should also be discussed — the business benefit of NOT having a data quality program.

The unfortunate reality is that in a tight market, the last thing many decision makers want to be made public (internally or externally) is the truth.

In a company with data quality principles ingrained in day-to-day processes, and reporting handled independently, it becomes much harder to hide or reinterpret your falling market share.  Without these principles though, you’ll probably be able to pick your version of the truth from a stack of half a dozen, then spend your strategy meeting discussing which one is right instead of what you’re going to do about it.

What we’re talking about here is the difference between a Politician — who will smile at the camera and proudly announce 0.1% growth was a fantastic result given X, Y, and Z factors — and a Statistician who will endeavor to describe reality with minimal personal bias.

And the larger the organization, the more internal politics plays a part.  I believe a lot of the reluctance in investing in data quality initiatives could be traced back to this fear of being held truly accountable, regardless of it being in the best interests of the organization.  To build a data quality-centric culture, the change must be driven from the CEO down if it’s to succeed.”

On Data Quality: Quo Vadimus?, Peter Perera commented:

“The question: ‘Is Data Quality a Journey or a Destination?’ suggests that it is one or the other.

I agree with another comment that data quality is neither . . . or, I suppose, it could be both (the journey is the destination and the destination is the journey. They are one and the same.)

The quality of data (or anything for that matter) is something we experience.

Quality only radiates when someone is in the act of experiencing the data, and usually only when it is someone that matters.  This radiation decays over time, ranging from seconds or less to years or more.

The only problem with viewing data quality as radiation is that radiation can be measured by an instrument, but there is no such instrument to measure data quality.

We tend to confuse data qualities (which can be measured) and data quality (which cannot).

In the words of someone whose name I cannot recall: ‘Quality is not job one.  Being totally %@^#&$*% amazing is job one.’  The only thing I disagree with here is that being amazing is characterized as a job.

Data quality is not something we do to data.  It’s not a business initiative or project or job.  It’s not a discipline.  We need to distinguish between the pursuit (journey) of being amazing and actually being amazing (destination — but certainly not a final one).  To be amazing requires someone to be amazed.  We want data to be continuously amazing . . . to someone that matters, i.e., someone who uses and values the data a whole lot for an end that makes a material difference.

Come to think of it, the only prerequisite for data quality is being alive because that is the only way to experience it.  If you come across some data and have an amazed reaction to it and can make a difference using it, you cannot help but experience great data quality.  So if you are amazing people all the time with your data, then you are doing your data quality job very well.”

On Data Quality and Miracle Exceptions, Gordon Hamilton commented:

“Nicely delineated argument, Jim.  Successfully starting a data quality program seems to be a balance between getting started somewhere and determining where best to start.  The data quality problem is like a two-edged sword without a handle that is inflicting the death of a thousand cuts.

Data quality is indeed difficult to get a handle on.”

And since they generated so much great banter, please check out all of the commendable comments received by the blog posts There is No Such Thing as a Root Cause and You only get a Return from something you actually Invest in.

 

Thank You for Three Awesome Years

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last three years.  Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between December 2011 and March 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog for the last three years. Your readership is deeply appreciated.

 

Related Posts

Commendable Comments (Part 11)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

Data Quality and Big Data

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This is Part 2 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss data quality and big data, including if data quality matters less in larger data sets, if statistical outliers represent business insights or data quality issues, statistical sampling errors versus measurement calibration errors, mistaking signal for noise (i.e., good data for bad data), and whether or not the principles and practices of true “data scientists” will truly be embraced by an organization’s business leaders.

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information in the late 1980s.  Since then he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.  The most recent, Data Driven: Profiting from Your Most Important Business Asset (Harvard Business Press, 2008), was a Library Journal best buy of 2008.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995.  Tom holds a Ph.D. in statistics from Florida State University. He holds two patents.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Driven

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This is Part 1 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss concepts from one of my favorite data quality books, which is his most recent book: Data Driven: Profiting from Your Most Important Business Asset.

Our discussion includes viewing data as an asset, an organization’s hierarchy of data needs, a simple model for culture change, and attempting to achieve the “single version of the truth” being marketed as a goal of master data management (MDM).

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information in the late 1980s.  Since then he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995. Tom holds a Ph.D. in statistics from Florida State University.  He holds two patents.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Quality and Miracle Exceptions

“Reading superhero comic books with the benefit of a Ph.D. in physics,” James Kakalios explained in The Physics of Superheroes, “I have found many examples of the correct description and application of physics concepts.  Of course, the use of superpowers themselves involves direct violations of the known laws of physics, requiring a deliberate and willful suspension of disbelief.”

“However, many comics need only a single miracle exception — one extraordinary thing you have to buy into — and the rest that follows as the hero and the villain square off would be consistent with the principles of science.”

“Data Quality is all about . . .”

It is essential to foster a marketplace of ideas about data quality in which a diversity of viewpoints is freely shared without bias, where everyone is invited to get involved in discussions and debates and have an opportunity to hear what others have to offer.

However, one of my biggest pet peeves about the data quality industry is when I listen to analysts, vendors, consultants, and other practitioners discuss data quality challenges, I am often required to make a miracle exception for data quality.  In other words, I am given one extraordinary thing I have to buy into in order to be willing to buy their solution to all of my data quality problems.

These superhero comic book style stories usually open with a miracle exception telling me that “data quality is all about . . .”

Sometimes, the miracle exception is purchasing technology from the right magic quadrant.  Other times, the miracle exception is either following a comprehensive framework, or following the right methodology from the right expert within the right discipline (e.g., data modeling, business process management, information quality management, agile development, data governance, etc.).

But I am especially irritated by individuals who bash vendors for selling allegedly only reactive data cleansing tools, while selling their allegedly only proactive defect prevention methodology, as if we could avoid cleaning up the existing data quality issues, or we could shut down and restart our organizations, so that before another single datum is created or business activity is executed, everyone could learn how to “do things the right way” so that “the data will always be entered right, the first time, every time.”

Although these and other miracle exceptions do correctly describe the application of data quality concepts in isolation, by doing so, they also oversimplify the multifaceted complexity of data quality, requiring a deliberate and willful suspension of disbelief.

Miracle exceptions certainly make for more entertaining stories and more effective sales pitches, but oversimplifying complexity for the purposes of explaining your approach, or, even worse and sadly more common, preaching at people that your approach definitively solves their data quality problems, is nothing less than applying the principle of deus ex machina to data quality.

Data Quality and deus ex machina

Deus ex machina is a plot device whereby a seemingly unsolvable problem is suddenly and abruptly solved with the contrived and unexpected intervention of some new event, character, ability, or object.

This technique is often used in the marketing of data quality software and services, where the problem of poor data quality can seemingly be solved by a new event (e.g., creating a data governance council), a new character (e.g., hiring an expert consultant), a new ability (e.g., aligning data quality metrics with business insight), or a new object (e.g., purchasing a new data quality tool).

Now, don’t get me wrong.  I do believe various technologies and methodologies from numerous disciplines, as well as several core principles (e.g., communication, collaboration, and change management) are all important variables in the data quality equation, but I don’t believe that any particular variable can be taken in isolation and deified as the God Particle of data quality physics.

Data Quality is Not about One Extraordinary Thing

Data quality isn’t all about technology, nor is it all about methodology.  And data quality isn’t all about data cleansing, nor is it all about defect prevention.  Data quality is not about only one thing — no matter how extraordinary any one of its things may seem.

Battling the dark forces of poor data quality doesn’t require any superpowers, but it does require doing the hard daily work of continuously improving your data quality.  Data quality does not have a miracle exception, so please stop believing in one.

And for the love of high-quality data everywhere, please stop trying to sell us one.

Data Quality: Quo Vadimus?

Over the past week, an excellent meme has been making its way around the data quality blogosphere.  It all started, as many of the best data quality blogging memes do, with a post written by Henrik Liliendahl Sørensen.

In Turning a Blind Eye to Data Quality, Henrik blogged about how, as data quality practitioners, we are often amazed by the inconvenient truth that our organizations are capable of growing into successful businesses even though they often turn a blind eye to data quality, ignoring data quality issues and not following the data quality best practices that we advocate.

“The evidence about how poor data quality is costing enterprises huge sums of money has been out there for a long time,” Henrik explained.  “But business successes are made over and over again despite bad data.  There may be casualties, but the business goals are met anyway.  So, poor data quality is just something that makes the fight harder, not impossible.”

As data quality practitioners, we often don’t effectively sell the business benefits of data quality, but instead we often only talk about the negative aspects of not investing in data quality, which, as Henrik explained, is usually why business leaders turn a blind eye to data quality challenges.  Henrik concluded with the recommendation that when we are talking with business leaders, we need to focus on “smaller, but tangible, wins where data quality improvement and business efficiency goes hand in hand.”

 

Is Data Quality a Journey or a Destination?

Henrik’s blog post received excellent comments, which included a debate about whether data quality is a journey or a destination.

Garry Ure responded with his blog post Destination Unknown, in which he explained how “historically the quest for data quality was likened to a journey to convey the concept that you need to continue to work in order to maintain quality.”  But Garry also noted that sometimes when an organization does successfully ingrain data quality practices into day-to-day business operations, it can make it seem like data quality is a destination that the organization has finally reached.

Garry concluded data quality is “just one destination of many on a long and somewhat recursive journey.  I think the point is that there is no final destination, instead the journey becomes smoother, quicker, and more pleasant for those traveling.”

Bryan Larkin responded to Garry with the blog post Data Quality: Destinations Known, in which Bryan explained, “data quality should be a series of destinations where short journeys occur on the way to those destinations.  The reason is simple.  If we make it about one big destination or one big journey, we are not aligning our efforts with business goals.”

In order to do this, Bryan recommends that “we must identify specific projects that have tangible business benefits (directly to the bottom line — at least to begin with) that are quickly realized.  This means we are looking at less of a smooth journey and more of a sprint to a destination — to tackle a specific problem and show results in a short amount of time.  Most likely we’ll have a series of these sprints to destinations with little time to enjoy the journey.”

“While comprehensive data quality initiatives,” Bryan concluded, “are things we as practitioners want to see — in fact we build our world view around such — most enterprises (not all, mind you) are less interested in big initiatives and more interested in finite, specific, short projects that show results.  If we can get a series of these lined up, we can think of them more in terms of an overall comprehensive plan if we like — even a journey.  But most functional business staff will think of them in terms of the specific projects that affect them.”

The Latin phrase Quo Vadimus? translates into English as “Where are we going?”  When I ponder where data quality is going, and whether data quality is a journey or a destination, I am reminded of the words of T.S. Eliot:

“We shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time.”

We must not cease from exploring new ways to continuously improve our data quality and continuously put into practice our data governance principles, policies, and procedures, and the end of all our exploring will be to arrive where we began and to know, perhaps for the first time, the value of high-quality data to our enterprise’s continuing journey toward business success.

The Algebra of Collaboration

Most organizations have a vertical orientation, which creates a division of labor between functional areas where daily operations are carried out by people who have been trained in a specific type of business activity (e.g., Product Manufacturing, Marketing, Sales, Finance, Customer Service).  However, according to the most basic enterprise arithmetic, the sum of all vertical functions is one horizontal organization.  For example, in an organization with five vertical functions, 1 + 1 + 1 + 1 + 1 = 1 (and not 5).

Other times, it seems like division is the only mathematics the enterprise understands, creating perceived organizational divides based on geography (e.g., the Boston office versus the London office), or hierarchy (e.g., management versus front-line workers), or the Great Rift known as the Business versus IT.

However, enterprise-wide initiatives, such as data quality and data governance, require a cross-functional alignment reaching horizontally across the organization’s vertical functions, fostering a culture of collaboration that combines collective ownership with shared responsibility and individual accountability, which calls for a branch of mathematics I call the Algebra of Collaboration.

For starters, as James Kakalios explained in his super book The Physics of Superheroes, “there is a trick to algebra: If one has an equation describing a true statement, such as 1 = 1, then one can add, subtract, multiply, or divide (excepting division by zero) the equation by any number we wish, and as long as we do it to both the left and right sides of the equation, the correctness of the equation is unchanged.  So if we add 2 to both sides of 1 = 1, we obtain 1 + 2 = 1 + 2 or 3 = 3, which is still a true statement.”

So, in the Algebra of Collaboration, we first establish one of the organization’s base equations, its true statements, for example, using the higher order collaborative equation that attempts to close the Great Rift otherwise known as the IT-Business Chasm:

Business = IT

Then we keep this base equation balanced by performing the same operation on both the left and right sides, for example:

Business + Data Quality + Data Governance = IT + Data Quality + Data Governance

The point is that everyone, regardless of their primary role or vertical function, must accept a shared responsibility for preventing data quality lapses and for responding appropriately to mitigate the associated business risks when issues occur.

Now, of course, as I blogged about in The Stakeholder’s Dilemma, this equation does not remain perfectly balanced at all times.  The realities of the fiscal calendar effect, conflicting interests, and changing business priorities mean that the amount of resources (money, time, people) added to the equation by a particular stakeholder, vertical function, or group will vary.

But it’s important to remember the true statement that the base equation represents.  The trick of algebra is just one of the tricks of the collaboration trade.  Organizations that are successful with data quality and data governance view collaboration not just as a guiding principle, but also as a call to action in their daily practices.

Is your organization practicing the Algebra of Collaboration?

 

Related Posts

The Business versus IT—Tear down this wall!

The Road of Collaboration

The Collaborative Culture of Data Governance

Collaboration isn’t Brain Surgery

Finding Data Quality

Being Horizontally Vertical

The Year of the Datechnibus

Dot Collectors and Dot Connectors

No Datum is an Island of Serendip

The Three Most Important Letters in Data Governance

The Stakeholder’s Dilemma

Are you Building Bridges or Digging Moats?

Has Data Become a Four-Letter Word?

The Data Governance Oratorio

Video: Declaration of Data Governance

Data Love Song Mashup

Today is February 14 — Valentine’s Day — the annual celebration of enduring romance, where true love is publicly judged according to your willingness to purchase chocolate, roses, and extremely expensive jewelry, and privately judged in ways that nobody (and please, trust me when I say nobody) wants to see you post on Twitter, Facebook, YouTube, or your blog.

Valentine’s Day is for people in love to celebrate their love privately in whatever way works best for them.

But since your data needs love too, this blog post provides a mashup of love songs for your data.

Data Love Song Mashup

I’ve got sunshine on a cloud computing day
When it’s cold outside, I’ve got backups from the month of May
I guess you’d say, what can make me feel this way?
My data, my data, my data
Singing about my data
My data

My data’s so beautiful 
And I tell it every day
When I see your user interface
There’s not a thing that I would change
Because my data, you’re amazing
Just the way you are
You’re amazing data
Just the way you are

They say we’re young and we don’t know
We won’t find data quality issues until we grow
Well I don’t know if that is true
Because you got me, data
And data, I got you
I got you, data

Look into my eyes, and you will see
What my data means to me
Don’t tell me data quality is not worth trying for
Don’t tell me it’s not worth fighting for
You know it’s true
Everything I do, I do data quality for you

I can’t make you love data if you don’t
I can’t make your heart feel something it won’t

But there’s nothing you can do that can’t be done
Nothing you can sing that can’t be sung
Nothing you can make that can’t be made
All you need is love . . . for data
Love for data is all you need

Business people working hard all day and through the night
Their database queries searching for business insight
Some will win, some will lose
Some were born to sing the data quality blues
Oh, the need for business insight never ends
It goes on and on and on and on
Don’t stop believing
Hold on to that data loving feeling

Look at your data, I know its poor quality is showing
Look at your organization, you don’t know where it’s going
I don’t know much, but I know your data needs love too
And that may be all I need to know

Nothing compares to data quality, no worries or cares
Business regrets and decision mistakes, they’re memories made
But if you don’t continuously improve, how bittersweet that will taste
I wish nothing but the best for you
I wish nothing but the best for your data too
Don’t forget data quality, I beg, please remember I said
Sometimes quality lasts in data, but sometimes it hurts instead

 

Happy Valentine’s Day to you and yours

Happy Data Quality to you and your data

Decision Management Systems

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I discuss decision management with James Taylor, author of the new book Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics.

James Taylor is the CEO of Decision Management Solutions, and the leading expert in Decision Management Systems, which are active participants in improving business results by applying business rules, predictive analytics, and optimization technologies to address the toughest issues facing businesses today, and changing the way organizations are doing business.

James Taylor has led Decision Management efforts for leading companies in insurance, banking, health management, and telecommunications.  Decision Management Solutions works with clients to improve their business by applying analytics and business rules technology to automate and improve decisions.  Clients range from start-ups and software companies to major North American insurers, a travel company, the health management division of a major healthcare company, one of Europe’s largest banks, and several major decision management technology vendors.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.