Open MIKE Podcast — Episode 01

Method for an Integrated Knowledge Environment (MIKE2.0) is an open source delivery framework for Enterprise Information Management that provides a comprehensive methodology applicable across a wide range of projects within the Information Management space.  For more information, click on this link: openmethodology.org/wiki/What_is_MIKE2.0

The Open MIKE Podcast is a video podcast, hosted by Jim Harris, that discusses aspects of the MIKE2.0 framework and features content contributed to MIKE2.0 wiki articles, blog posts, and discussion forums.

 

Episode 01: Information Management Principles

If you’re having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Open MIKE Podcast on Vimeo

 

MIKE2.0 Content Featured in or Related to this Podcast

Information Management Principles: openmethodology.org/wiki/Economic_Value_of_Information

Information Economics: openmethodology.org/wiki/Information_Economics

You can also find the videos and blog post summaries for every episode of the Open MIKE Podcast at: ocdqblog.com/MIKE

Exercise Better Data Management

Recently on Twitter, Daragh O Brien and I discussed his proposed concept.  “After Big Data,” Daragh tweeted, “we will inevitably begin to see the rise of MOData as organizations seek to grab larger chunks of data and digest it.  What is MOData?  It’s MO’Data, as in MOre Data. Or Morbidly Obese Data.  Only good data quality and data governance will determine which.”

Daragh asked if MO’Data will be the Big Data Killer.  I said only if MO’Data doesn’t include MO’BusinessInsight, MO’DataQuality, and MO’DataPrivacy (i.e., more business insight, more data quality, and more data privacy).

“But MO’Data is about more than just More Data,” Daragh replied.  “It’s about avoiding Morbidly Obese Data that clogs data insight and data quality, etc.”

I responded that More Data becomes Morbidly Obese Data only if we don’t exercise better data management practices.

Agreeing with that point, Daragh replied, “Bring on MOData and the Pilates of Data Quality and Data Governance.”

To slightly paraphrase lines from one of my favorite movies — Airplane! — the Cloud is getting thicker and the Data is getting laaaaarrrrrger.  Surely I know well that growing data volumes are a serious issue — but don’t call me Shirley.

Whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bites, the truth is we were consuming way more than our recommended daily allowance of data long before the data management industry took a tip from McDonald’s and put the word “big” in front of its signature sandwich.  (Oh great . . . now I’m actually hungry for a Big Mac.)

But nowadays, with silos replicating data, and with new data and new types of data being created and stored on a daily basis, our data is starting to resemble Bob Parr in retirement, making it seem like not even Mr. Incredible in his prime possessed the super strength needed to manage all of our data.  Those were references to the movie The Incredibles, in which Mr. Incredible was a superhero who, after retiring into civilian life under the alias of Bob Parr, elicited this observation from his superhero costume tailor: “My God, you’ve gotten fat.”  Yes, I admit not even Helen Parr (aka Elastigirl) could stretch that far for a big data joke.

A Healthier Approach to Big Data

Although Daragh’s concerns about morbidly obese data are valid, no superpowers (or other miracle exceptions) are needed to manage all of our data.  In fact, it’s precisely when we are so busy trying to manage all of our data that we hoard countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival.  It’s like we are trying to lose weight by eating more and exercising less, i.e., consuming more data and exercising less data quality and data governance.  As Daragh said, only good data quality and data governance will determine whether we get more data or morbidly obese data.

Losing weight requires a healthy approach to both diet and exercise.  A healthy approach to diet includes carefully choosing the food you consume and carefully controlling your portion size.  A healthy approach to exercise includes a commitment to exercise on a regular basis at a sufficient intensity level without going overboard by spending several hours a day, every day, at the gym.

Swimming is a great form of exercise, but swimming in big data without having a clear business objective before you jump into the pool is like telling your boss that you didn’t get any work done because you decided to spend all day working out at the gym.

Carefully choosing the data you consume and carefully controlling your data portion size are becoming increasingly important since big data is forcing us to revisit information overload.  However, the main reason that traditional data management practices often become overwhelmed by big data is that they are not always the right approach.

We need to acknowledge that some big data use cases differ considerably from traditional ones.  Data modeling is still important and data quality still matters, but how much data modeling and data quality is needed before big data can be effectively used for business purposes will vary.  In order to move the big data discussion forward, we have to stop fiercely defending our traditional perspectives about structure and quality.  We also have to stop fiercely defending our traditional perspectives about analytics, since there will be some big data use cases where depth and detailed analysis may not be necessary to provide business insight.

Better than Big or More

Jim Ericson explained that your data is big enough.  Rich Murnane explained that bigger isn’t better, better is better.  Although big data may indeed be followed by more data, that doesn’t necessarily mean we require more data management in order to prevent more data from becoming morbidly obese data.  I think we just need to exercise better data management.

 

Related Posts

Demystifying Master Data Management

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, special guest John Owens and I attempt to demystify master data management (MDM) by explaining the three types of data (Transaction, Domain, Master) and the four master data entities (Party, Product, Location, Asset), as well as perhaps the most important concept of all, the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
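
To make the Party-Role Relationship a little more concrete, here is a minimal sketch in Python (my own illustration with hypothetical names, not John’s Integrated Modelling Method or any MDM product) of how a single Party master record can play multiple roles without being duplicated:

    from dataclasses import dataclass, field

    @dataclass
    class Party:
        """A master data entity: one real-world person or organization."""
        party_id: str
        name: str
        roles: set = field(default_factory=set)  # e.g., "Customer", "Supplier", "Employee"

        def add_role(self, role: str) -> None:
            """Roles describe how the Party relates to us; the Party itself is mastered once."""
            self.roles.add(role)

    # The same organization is both a customer and a supplier,
    # but there is only one Party master record for it.
    acme = Party(party_id="P-001", name="Acme Corporation")
    acme.add_role("Customer")
    acme.add_role("Supplier")
    print(sorted(acme.roles))  # ['Customer', 'Supplier']

Terms like Customer, Supplier, and Employee then describe the role relationship rather than separate master entities, which is where those familiar Party terms belong.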

John Owens is a thought leader, consultant, mentor, and writer in the worlds of business and data modelling, data quality, and master data management (MDM).  He has built an international reputation as a highly innovative specialist in these areas and has worked in and led multi-million dollar projects in a wide range of industries around the world.

John Owens has a gift for identifying the underlying simplicity in any enterprise, even when shrouded in complexity, and bringing it to the surface.  He is the creator of the Integrated Modelling Method (IMM), which is used by business and data analysts around the world.  Later this year, John Owens will be formally launching the IMM Academy, which will provide high quality resources, training, and mentoring for business and data analysts at all levels.

You can also follow John Owens on Twitter and connect with John Owens on LinkedIn.  And if you’re looking for an MDM course, consider the online course from John Owens, which you can find by clicking on this link: MDM Online Course (Affiliate Link)

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Commendable Comments (Part 13)

Welcome to the 400th Obsessive-Compulsive Data Quality (OCDQ) blog post!  I am commemorating this milestone with the 13th entry in my ongoing series for expressing gratitude to my readers for their truly commendable comments on my blog posts.

 

Commendable Comments

On Will Big Data be Blinded by Data Science?, Meta Brown commented:

“Your concern is well-founded. Knowing how few businesses make really good use of the small data they’ve had around all along, it’s easy to imagine that they won’t do any better with bigger data sets.

I wrote some hints for those wallowing into the big data mire in my post, Better than Brute Force: Big Data Analytics Tips. But the truth is that many organizations won’t take advantage of the ideas that you are presenting, or my tips, especially as the datasets grow larger. That’s partly because they have no history in scientific methods, and partly because the data science movement is driving employers to search for individuals with heroically large skill sets.

Since few, if any, people truly meet these expectations, those hired will have real human limitations, and most often they will be people who know much more about data storage and manipulation than data analysis and applications.”

On Will Big Data be Blinded by Data Science?, Mike Urbonas commented:

“The comparison between scientific inquiry and business decision making is a very interesting and important one. Successfully serving a customer and boosting competitiveness and revenue does require some (hopefully unique) insights into customer needs. Where do those insights come from?

Additionally, scientists also never stop questioning and improving upon fundamental truths, which I also interpret as not accepting conventional wisdom — obviously an important trait of business managers.

I recently read commentary that gave high praise to the manager utilizing the scientific method in his or her decision-making process. The author was not a technologist, but rather none other than Peter Drucker, in writings from decades ago.

I blogged about Drucker’s commentary, data science, the scientific method vs. business decision making, and I’d value your and others’ input: Business Managers Can Learn a Lot from Data Scientists.”

On Word of Mouth has become Word of Data, Vish Agashe commented:

“I would argue that listening to not only customers but also business partners is very important (and not only in retail but in any business). I always say that, even if as an organization you are not active in the social world, assume that your customers, suppliers, employees, competitors are active in the social world and they will talk about you (as a company), your people, products, etc.

So it is extremely important to tune in to those conversations and evaluate its impact on your business. A dear friend of mine ventured into the restaurant business a few years back. He experienced a little bit of a slowdown in his business after a great start. He started surveying his customers, brought in food critiques to evaluate if the food was a problem, but he could not figure out what was going on. I accidentally stumbled upon Yelp.com and noticed that his restaurant’s rating had dropped and there were some complaints recently about services and cleanliness (nothing major though).

This happened because he had turnover in his front desk staff. He was able to address those issues and was able to reach out to customers who had bad experience (some of them were frequent visitors). They were able to go back and comment and give newer ratings to his business. This helped him with turning the corner and helped with the situation.

This was a big learning moment for me about the power of social media and the need for monitoring it.”

On Data Quality and the Bystander Effect, Jill Wanless commented:

“Our organization is starting to develop data governance processes and one of the processes we have deliberately designed is to get to the root cause of data quality issues.

We’ve designed it so that the errors that are reported also include the userid and the system where the data was generated. Errors are then filtered by function and the business steward responsible for that function is the one who is responsible for determining and addressing the root cause (which of course may require escalation to solve).

The business steward for the functional area has the most at stake in the data and is typically the most knowledgeable as to the process or system that may be triggering the error. We have yet to test this as we are currently in the process of deploying a pilot stewardship program.

However, we are very confident that it will help us uncover many of the causes of the data quality problems and with lots of PLAN, DO, CHECK, and ACT, our goal is to continuously improve so that our need for stewardship eventually (many years away no doubt) is reduced.”
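
Jill’s comment describes a workable routing design, so here is a minimal sketch of it in Python (the systems, functions, and steward names are hypothetical, purely for illustration): reported errors carry the userid and source system, and are grouped by business function so the responsible steward can investigate the root cause.

    from collections import defaultdict

    # Hypothetical reported errors, each carrying the userid and the system
    # where the data was generated, as Jill describes.
    errors = [
        {"error_id": 1, "userid": "u123", "system": "ERP", "function": "Finance"},
        {"error_id": 2, "userid": "u456", "system": "CRM", "function": "Sales"},
        {"error_id": 3, "userid": "u123", "system": "ERP", "function": "Finance"},
    ]

    # The business steward responsible for each function owns root cause analysis.
    stewards = {"Finance": "finance.steward", "Sales": "sales.steward"}

    queues = defaultdict(list)
    for error in errors:
        queues[stewards[error["function"]]].append(error)

    for steward, assigned in sorted(queues.items()):
        print(steward, [e["error_id"] for e in assigned])
    # finance.steward [1, 3]
    # sales.steward [2]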

On The Return of the Dumb Terminal, Prashanta Chandramohan commented:

“I can’t even imagine what it’s like to use this iPad I own now if I am out of network for an hour. Supposedly the coolest thing to own and a breakthrough innovation of this decade as some put it, it’s nothing but a dumb terminal if I do not have 3G or Wi-Fi connectivity.

Putting most of my documents, notes, to-do’s, and bookmarked blogs for reading later (e.g., Instapaper) in the cloud, I am sure to avoid duplicating data and eliminate installing redundant applications.

(Oops! I mean the apps! :) )

With cloud-based MDM and Data Quality tools starting to linger, I can’t wait to explore and utilize the advantages these return of dumb terminals bring to our enterprise information management field.”

On Big Data Lessons from Orbitz, Dylan Jones commented:

“The fact is that companies have always done predictive marketing, they’re just getting smarter at it.

I remember living as a student in a fairly downtrodden area that because of post code analytics meant I was bombarded with letterbox mail advertising crisis loans to consolidate debts and so on. When I got my first job and moved to a new area all of a sudden I was getting loans to buy a bigger car. The companies were clearly analyzing my wealth based on post code lifestyle data.

Fast forward and companies can do way more as you say.

Teresa Cottam (Global Telecoms Analyst) has cited the big telcos as a major driver in all this, they now consider themselves data companies so will start to offer more services to vendors to track our engagement across the entire communications infrastructure (Read more here: http://bit.ly/xKkuX6).

I’ve just picked up a shiny new Mac this weekend after retiring my long suffering relationship with Windows so it will be interesting to see what ads I get served!”

And please check out all of the commendable comments received on the blog post: Data Quality and Chicken Little Syndrome.

 

Thank You for Your Comments and Your Readership

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last 400 posts.  Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between April 2012 and June 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog.  Your readership is deeply appreciated.

 

Related Posts

Commendable Comments (Part 12) – The Third Blogiversary of OCDQ Blog

Commendable Comments (Part 11)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

DQ-View: The Five Stages of Data Quality

Data Quality (DQ) View is a regular OCDQ segment.  Each DQ-View is a brief video discussion of a key data quality concept.

In my experience, all organizations cycle through five stages while coming to terms with the daunting challenges of data quality, which are somewhat similar to The Five Stages of Grief.  So, in this short video, I explain The Five Stages of Data Quality:

  1. Denial — Our organization is well-managed and highly profitable.  We consistently meet, or exceed, our business goals.  We obviously understand the importance of high-quality data.  Data quality issues can’t possibly be happening to us.
  2. Anger — We’re now in the midst of a financial reporting scandal, and facing considerable fines in the wake of a regulatory compliance failure.  How can this be happening to us?  Why do we have data quality issues?  Who is to blame for this?
  3. Bargaining — Okay, we may have just overreacted a little bit.  We’ll purchase a data quality tool, approve a data cleansing project, implement defect prevention, and initiate data governance.  That will fix all of our data quality issues — right?
  4. Depression — Why, oh why, do we keep having data quality issues?  Why does this keep happening to us?  Maybe we should just give up, accept our doomed fate, and not bother doing anything at all about data quality and data governance.
  5. Acceptance — We can’t fight the truth anymore.  We accept that we have to do the hard daily work of continuously improving our data quality and continuously implementing our data governance principles, policies, and procedures.

Quality is the Higgs Field of Data

Recently on Twitter, Daragh O Brien replied to my David Weinberger quote “The atoms of data hook together only because they share metadata,” by asking “So, is Quality Data the Higgs Boson of Information Management?”

I responded that Quality is the Higgs Boson of Data and Information since Quality gives Data and Information their Mass (i.e., their Usefulness).

“Now that is profound,” Daragh replied.

“That’s cute and all,” Brian Panulla interjected, “but you can’t measure Quality.  Mass is objective.  It’s more like Weight — a mass in context.”

I agreed with Brian’s great point since in a previous post I explained the often misunderstood difference between mass, an intrinsic property of matter based on atomic composition, and weight, a gravitational force acting on matter.

Using these concepts metaphorically, mass is an intrinsic property of data, representing objective data quality, whereas weight is a gravitational force acting on data, thereby representing subjective data quality.

But my previous post didn’t explain where matter theoretically gets its mass, and since this scientific mystery was radiating in the cosmic background of my Twitter banter with Daragh and Brian, I decided to use this post to attempt a brief explanation along the way to yet another data quality analogy.

As you have probably heard by now, big scientific news was recently reported about the discovery of the Higgs Boson, which, since the 1960s, the Standard Model of particle physics has theorized to be the fundamental particle associated with a ubiquitous quantum field (referred to as the Higgs Field) that gives all matter its mass by interacting with the particles that make up atoms and weighing them down.  This is foundational to our understanding of the universe because without something to give mass to the basic building blocks of matter, everything would behave the same way as the intrinsically mass-less photons of light behave, floating freely and not combining with other particles.  Therefore, without mass, ordinary matter, as we know it, would not exist.

 

Ping-Pong Balls and Maple Syrup

I like the Higgs Field explanation provided by Brian Cox and Jeff Forshaw.  “Imagine you are blindfolded, holding a ping-pong ball by a thread.  Jerk the string and you will conclude that something with not much mass is on the end of it.  Now suppose that instead of bobbing freely, the ping-pong ball is immersed in thick maple syrup.  This time if you jerk the thread you will encounter more resistance, and you might reasonably presume that the thing on the end of the thread is much heavier than a ping-pong ball.  It is as if the ball is heavier because it gets dragged back by the syrup.”

“Now imagine a sort of cosmic maple syrup that pervades the whole of space.  Every nook and cranny is filled with it, and it is so pervasive that we do not even know it is there.  In a sense, it provides the backdrop to everything that happens.”

Mass is therefore generated as a result of an interaction between the ping-pong balls (i.e., atomic particles) and the maple syrup (i.e., the Higgs Field).  However, although the Higgs Field is pervasive, it is also variable and selective, since some particles are affected by the Higgs Field more than others, and photons pass through it unimpeded, thereby remaining mass-less particles.

 

Quality — Data Gets Higgy with It

Now that I have vastly oversimplified the Higgs Field, let me Get Higgy with It by attempting an analogy for data quality based on the Higgs Field.  As I do, please remember the wise words of Karen Lopez: “All analogies are perfectly imperfect.”

Quality provides the backdrop to everything that happens when we use data.  Data in the wild, independent from use, is as carefree as the mass-less photon whizzing around at the speed of light, like a ping-pong ball bouncing along without a trace of maple syrup on it.  But once we interact with data using our sticky-maple-syrup-covered fingers, data begins to slow down, begins to feel the effects of our use.  We give data mass so that it can become the basic building blocks of what matters to us.

Some data is affected more by our use than others.  The more subjective our use, the more we weigh data down.  The more objective our use, the less we weigh data down.  Sometimes, we drag data down deep into the maple syrup, covering data up with an application layer, or bottling data into silos.  Other times, we keep data in the shallow end of the molasses swimming pool.

Quality is the Higgs Field of Data.  As users of data, we are the Higgs Bosons — we are the fundamental particles associated with a ubiquitous data quality field.  By using data, we give data its quality.  The quality of data cannot be separated from its use any more than the particles of the universe can be separated from the Higgs Field.

The closest data equivalent of a photon, a ping-pong ball particle that doesn’t get stuck in the maple syrup of the Higgs Field, is Open Data, which doesn’t get stuck within silos, but is instead data freely shared without the sticky quality residue of our use.

 

Related Posts

Our Increasingly Data-Constructed World

What is Weighing Down your Data?

Data Myopia and Business Relativity

Redefining Data Quality

Are Applications the La Brea Tar Pits for Data?

Swimming in Big Data

Sometimes it’s Okay to be Shallow

Data Quality and Big Data

Data Quality and the Q Test

My Own Private Data

No Datum is an Island of Serendip

Sharing Data

Shining a Social Light on Data Quality

Last week, when I published my blog post Lightning Strikes the Cloud, I unintentionally demonstrated three important things about data quality.

The first thing I demonstrated was that even an obsessive-compulsive data quality geek is capable of data defects, since I initially published the post with the title Lightening Strikes the Cloud, which is an excellent example of the difference between validity and accuracy caused by the Cupertino Effect: although lightening is valid (i.e., a correctly spelled word), it isn’t contextually accurate.
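
To make that distinction concrete, here is a minimal sketch in Python (a hypothetical word list and checks, not any particular data quality tool) of how a value can pass a validity check yet still fail an accuracy check that considers context:

    # "Lightening" passes validation because it is a correctly spelled word,
    # but it is not accurate in the context of a title about lightning.
    DICTIONARY = {"lightning", "lightening", "strikes", "the", "cloud"}

    def is_valid(word: str) -> bool:
        """Validity: the value conforms to its domain (here, a real word)."""
        return word.lower() in DICTIONARY

    def is_accurate(word: str, intended: str) -> bool:
        """Accuracy: the value is also correct for its real-world context."""
        return word.lower() == intended.lower()

    title_word = "Lightening"
    print(is_valid(title_word))                  # True  -> passes validation
    print(is_accurate(title_word, "lightning"))  # False -> still a data defect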

The second thing I demonstrated was the value of shining a social light on data quality — the value of using collaborative tools like social media to crowd-source data quality improvements.  Thankfully, Julian Schwarzenbach quickly noticed my error on Twitter.  “Did you mean lightning?  The concept of lightening clouds could be worth exploring further,” Julian humorously tweeted.  “Might be interesting to consider what happens if the cloud gets so light that it floats away.”  To which I replied that if the cloud gets so light that it floats away, it could become Interstellar Computing or, as Julian suggested, the start of the Intergalactic Net, which I suppose is where we will eventually have to store all of that big data we keep hearing so much about these days.

The third thing I demonstrated was the potential dark side of data cleansing, since the only remaining trace of my data defect is a broken URL.  This is an example of not providing a well-documented audit trail, which is necessary within an organization to communicate data quality issues and resolutions.

Communication and collaboration are essential to finding our way with data quality.  And social media can help us by providing more immediate and expanded access to our collective knowledge, experience, and wisdom, and by shining a social light that illuminates the shadows cast upon data quality issues when a perception filter or bystander effect gets the better of our individual attention or undermines our collective best intentions — which, as I recently demonstrated, occasionally happens to all of us.

 

Related Posts

Data Quality and the Cupertino Effect

Are you turning Ugly Data into Cute Information?

The Importance of Envelopes

The Algebra of Collaboration

Finding Data Quality

The Wisdom of the Social Media Crowd

Perception Filters and Data Quality

Data Quality and the Bystander Effect

The Family Circus and Data Quality

Data Quality and the Q Test

Metadata, Data Quality, and the Stroop Test

The Three Most Important Letters in Data Governance

Saving Private Data

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This episode is an edited rebroadcast of a segment from the OCDQ Radio 2011 Year in Review, during which Daragh O Brien and I discuss the data privacy and data protection implications of social media, cloud computing, and big data.

Daragh O Brien is one of Ireland’s leading Information Quality and Governance practitioners.  After being born at a young age, Daragh has amassed a wealth of experience in quality information driven business change, from CRM Single View of Customer to Regulatory Compliance, to Governance and the taming of information assets to benefit the bottom line, manage risk, and ensure customer satisfaction.  Daragh O Brien is the Managing Director of Castlebridge Associates, one of Ireland’s leading consulting and training companies in the information quality and information governance space.

Daragh O Brien is a founding member and former Director of Publicity for the IAIDQ, which he is still actively involved with.  He was a member of the team that helped develop the Information Quality Certified Professional (IQCP) certification and he recently became the first person in Ireland to achieve this prestigious certification.

In 2008, Daragh O Brien was awarded a Fellowship of the Irish Computer Society for his work in developing and promoting standards of professionalism in Information Management and Governance.

Daragh O Brien is a regular conference presenter, trainer, blogger, and author with two industry reports published by Ark Group, the most recent of which is The Data Strategy and Governance Toolkit.

You can also follow Daragh O Brien on Twitter and connect with Daragh O Brien on LinkedIn.

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.

  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.

  • Social Media Strategy — Guest Crysta Anderson of IBM Initiate explains social media strategy and content marketing, including three recommended practices: (1) Listen intently, (2) Communicate succinctly, and (3) Have fun.

  • The Fall Back Recap Show — A look back at the Best of OCDQ Radio, including discussions about Data, Information, Business-IT Collaboration, Change Management, Big Analytics, Data Governance, and the Data Revolution.

The Family Circus and Data Quality


Like many young intellectuals, the only part of the Sunday newspaper I read growing up was the color comics section, and one of my favorite comic strips was The Family Circus created by cartoonist Bil Keane.  One of the recurring themes of the comic strip was a set of invisible gremlins that the children used to shift blame for any misdeeds, including Ida Know, Not Me, and Nobody.

Although I no longer read any section of the newspaper on any day of the week, this Sunday morning I have been contemplating how this same set of invisible gremlins is used by many people throughout most organizations to shift blame for any incident in which poor data quality negatively impacts business activities, especially since, when investigating the root cause, you often find that Ida Know owns the data, Not Me is accountable for data governance, and Nobody takes responsibility for data quality.

Data Quality and the Bystander Effect

In his recent Harvard Business Review blog post Break the Bad Data Habit, Tom Redman cautioned against correcting data quality issues without providing feedback to where the data originated.  “At a minimum,” Redman explained, “others using the erred data may not spot the error.  There is no telling where it might turn up or who might be victimized.”  And correcting bad data without providing feedback to its source also denies the organization an opportunity to get to the bottom of the problem.

“And failure to provide feedback,” Redman continued, “is but the proximate cause.  The deeper root issue is misplaced accountability — or failure to recognize that accountability for data is needed at all.  People and departments must continue to seek out and correct errors.  They must also provide feedback and communicate requirements to their data sources.”

In his blog post The Secret to an Effective Data Quality Feedback Loop, Dylan Jones responded to Redman’s blog post with some excellent insights regarding data quality feedback loops and how they can help improve your data quality initiatives.
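
As a minimal sketch of what such a feedback loop might capture (my own illustration with hypothetical field names, not drawn from Redman’s or Jones’s posts), the essential step is that every corrected defect is reported back to the system where the data originated:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class DataQualityIssue:
        """A hypothetical feedback record: the correction travels back to the source."""
        record_id: str
        source_system: str       # where the erred data originated
        reported_by: str
        description: str
        corrected_value: Optional[str] = None
        fed_back_to_source: bool = False

    def close_issue(issue: DataQualityIssue, corrected_value: str) -> DataQualityIssue:
        """Correcting the data is not enough; notify the source so the
        root cause can be addressed, not just the symptom."""
        issue.corrected_value = corrected_value
        issue.fed_back_to_source = True  # the step that is most often skipped
        return issue

    issue = close_issue(
        DataQualityIssue("CUST-1042", "CRM", "jim", "Postal code fails address validation"),
        corrected_value="02139",
    )
    print(issue.fed_back_to_source)  # True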

I definitely agree with Redman and Jones about the need for feedback loops, but I have found, more often than not, that no feedback at all is provided on data quality issues because of the assumption that data quality is someone else’s responsibility.

This general lack of accountability for data quality issues is similar to what is known in psychology as the Bystander Effect, which refers to people often not offering assistance to the victim in an emergency situation when other people are present.  Apparently, the mere presence of other bystanders greatly decreases intervention, and the greater the number of bystanders, the less likely it is that any one of them will help.  Psychologists believe that the reason this happens is that as the number of bystanders increases, any given bystander is less likely to interpret the incident as a problem, and less likely to assume responsibility for taking action.

In my experience, the most common reason that data quality issues are often neither reported nor corrected is that most people throughout the enterprise act like data quality bystanders, making them less likely to interpret bad data as a problem or, at the very least, not their responsibility.  But the enterprise’s data quality is perhaps most negatively affected by this bystander effect, which may make it the worst bad data habit that the enterprise needs to break.

The Data Quality Placebo

Inspired by a recent Boing Boing blog post

Are you suffering from persistent and annoying data quality issues?  Or are you suffering from the persistence of data quality tool vendors and consultants annoying you with sales pitches about how you must be suffering from persistent data quality issues?

Either way, the Data Division of Prescott Pharmaceuticals (trusted makers of gastroflux, datamine, selectium, and qualitol) is proud to present the perfect solution to all of your real and/or imaginary data quality issues — The Data Quality Placebo.

Simply take two capsules (made with an easy-to-swallow coating) every morning and you will be guaranteed to experience:

“Zero Defects with Zero Side Effects” TM

(Legal Disclaimer: Zero Defects with Zero Side Effects may be the result of Zero Testing, which itself is probably just a side effect of The Prescott Promise: “We can promise you that we will never test any of our products on animals because . . . we never test any of our products.”)

How Data Cleansing Saves Lives

When it comes to data quality best practices, it’s often argued, and sometimes quite vehemently, that proactive defect prevention is far superior to reactive data cleansing.  Advocates of defect prevention sometimes admit that data cleansing is a necessary evil.  However, at least in my experience, most of the time they conveniently, and ironically, cleanse (i.e., drop) the word necessary.

Therefore, I thought I would share a story about how data cleansing saves lives, which I read about in the highly recommended book Space Chronicles: Facing the Ultimate Frontier by Neil deGrasse Tyson.  “Soon after the Hubble Space Telescope was launched in April 1990, NASA engineers realized that the telescope’s primary mirror—which gathers and reflects the light from celestial objects into its cameras and spectrographs—had been ground to an incorrect shape.  In other words, the two-billion dollar telescope was producing fuzzy images.  That was bad.  As if to make lemonade out of lemons, though, computer algorithms came to the rescue.  Investigators at the Space Telescope Science Institute in Baltimore, Maryland, developed a range of clever and innovative image-processing techniques to compensate for some of Hubble’s shortcomings.”

In other words, since it would be three years before Hubble’s faulty optics could be repaired during a 1993 space shuttle mission, data cleansing allowed astrophysicists to make good use of Hubble despite the bad data quality of its early images.

So, data cleansing algorithms saved Hubble’s fuzzy images — but how did this data cleansing actually save lives?

“Turns out,” Tyson explained, “maximizing the amount of information that could be extracted from a blurry astronomical image is technically identical to maximizing the amount of information that can be extracted from a mammogram.  Soon the new techniques came into common use for detecting early signs of breast cancer.”

“But that’s only part of the story.  In 1997, for Hubble’s second servicing mission, shuttle astronauts swapped in a brand-new, high-resolution digital detector—designed to the demanding specifications of astrophysicists whose careers are based on being able to see small, dim things in the cosmos.  That technology is now incorporated in a minimally invasive, low-cost system for doing breast biopsies, the next stage after mammograms in the early diagnosis of cancer.”

Even though defect prevention was eventually implemented to prevent data quality issues in Hubble’s images of outer space, those interim data cleansing algorithms are still being used today to help save countless human lives here on Earth.

So, at least in this particular instance, we have to admit that data cleansing is a necessary good.

Data Quality and the Q Test

In psychology, there’s something known as the Q Test, which asks you to use one of your fingers to trace an upper case letter Q on your forehead.  Before reading this blog post any further, please stop and perform the Q Test on your forehead right now.

 

Essentially, there are only two ways you can complete the Q Test, which are differentiated by how you trace the tail of the Q.  Most people start by tracing a letter O, and then complete the Q by tracing its tail either toward their right eye or toward their left eye.

If you trace the tail of the Q toward your right eye, you’re imagining what a letter Q would look like from your perspective.  But if you trace the tail of the Q toward your left eye, you’re imagining what it would look like from the perspective of another person.

Basically, the point of the Q Test is to determine whether or not you have a natural tendency to consider the perspective of others.

Although considering the perspective of others is a positive in other circumstances, if you traced the letter Q with its tail toward your left eye, psychologists say that you failed the Q Test since it reveals a negative — you’re a good liar.  The reason is that you have to be good at considering the perspective of others in order to be good at deceiving them with a believable lie.

So, as I now consider your perspective, dear reader, I bet you’re wondering: What does the Q Test have to do with data quality?

Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder, or when data quality is defined, as it most often is, as fitness for the purpose of use — the eyes of the user.  But since most data has both multiple uses and users, data fit for the purpose of one use or user may not be fit for the purpose of other uses and users.  However, these multiple perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data fit for the purpose of their own use.
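
Here is a minimal sketch in Python of that idea (the record and the fitness-for-use rules are hypothetical, not a standard): the same record can be quality data for one use and defective data for another.

    # The same customer record, evaluated against two different uses.
    record = {"name": "Jim Harris", "email": "jim@example.com", "postal_code": None}

    def fit_for_email_marketing(r: dict) -> bool:
        """This use only needs a deliverable email address."""
        return bool(r.get("email"))

    def fit_for_billing(r: dict) -> bool:
        """This use also needs a complete postal address."""
        return bool(r.get("email")) and bool(r.get("postal_code"))

    print(fit_for_email_marketing(record))  # True  -> fit for this purpose of use
    print(fit_for_billing(record))          # False -> unfit for this purpose of use

Each user, looking only through the lens of their own rules, can honestly report that the data is fine.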

The good news is that when it comes to data quality, most of us pass the Q Test, which means we’re not good liars.  The bad news is that since most of us pass the Q Test, we’re often only concerned about our own perspective about data quality, which is why so many organizations struggle to define data quality standards.

At the next discussion about your organization’s data quality standards, try inviting the participants to perform the Q Test.

 

Related Posts

The Point of View Paradox

You Say Potato and I Say Tater Tot

Data Myopia and Business Relativity

Beyond a “Single Version of the Truth”

DQ-BE: Single Version of the Time

Data and the Liar’s Paradox

The Fourth Law of Data Quality

Plato’s Data

Once Upon a Time in the Data

The Idea of Order in Data

Hell is other people’s data

Song of My Data

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.