Availability Bias and Data Quality Improvement

The availability heuristic is a mental shortcut that occurs when people make judgments based on the ease with which examples come to mind.  Although this heuristic can be beneficial, such as when it helps us recall examples of a dangerous activity to avoid, sometimes it leads to availability bias, where we’re affected more strongly by the ease of retrieval than by the content retrieved.

In his thought-provoking book Thinking, Fast and Slow, Daniel Kahneman explained how availability bias works by recounting an experiment in which different groups of college students were asked to rate a course they had taken the previous semester by listing ways to improve it, with the required number of improvements varying from group to group.

Counterintuitively, students in the group required to list more necessary improvements gave the course a higher rating, whereas students in the group required to list fewer necessary improvements gave the course a lower rating.

According to Kahneman, the extra cognitive effort expended by the students required to list more improvements biased them into believing it was difficult to list necessary improvements, leading them to conclude that the course didn’t need much improvement.  Conversely, the little cognitive effort expended by the students required to list few improvements biased them into concluding that, since it was so easy to list necessary improvements, the course obviously needed improvement.

This is counterintuitive because you’d think that the students would rate the course based on an assessment of the information retrieved from their memory, regardless of how easy that information was to retrieve.  It would have made more sense for the course to be rated higher for needing fewer improvements, but availability bias led the students to the opposite conclusion.

Availability bias can also affect an organization’s discussions about the need for data quality improvement.

If you asked stakeholders to rate the organization’s data quality by listing business-impacting incidents of poor data quality, would they reach a different conclusion if you asked them to list one incident versus asking them to list at least ten incidents?

In my experience, an event where poor data quality negatively impacted the organization, such as a regulatory compliance failure, is often easily dismissed by stakeholders as an isolated incident to be corrected by a one-time data cleansing project.

But would forcing stakeholders to list ten business-impacting incidents of poor data quality make them concede that data quality improvement should be supported by an ongoing program?  Or would the extra cognitive effort bias them into concluding, since it was so difficult to list ten incidents, that the organization’s data quality doesn’t really need much improvement?

I think that the availability heuristic helps explain why most organizations easily approve reactive data cleansing projects, and availability bias helps explain why most organizations usually resist proactively initiating a data quality improvement program.

