How Data Cleansing Saves Lives
When it comes to data quality best practices, it’s often argued, and sometimes quite vehemently, that proactive defect prevention is far superior to reactive data cleansing. Advocates of defect prevention sometimes admit that data cleansing is a “necessary evil.” However, at least in my experience, most of the time they conveniently, and ironically, cleanse (i.e., drop) the word “necessary.”
Therefore, I thought I would share a story about how data cleansing saves lives, which I read about in the highly recommended book Space Chronicles: Facing the Ultimate Frontier by Neil deGrasse Tyson. “Soon after the Hubble Space Telescope was launched in April 1990, NASA engineers realized that the telescope’s primary mirror—which gathers and reflects the light from celestial objects into its cameras and spectrographs—had been ground to an incorrect shape. In other words, the two-billion dollar telescope was producing fuzzy images. That was bad. As if to make lemonade out of lemons, though, computer algorithms came to the rescue. Investigators at the Space Telescope Science Institute in Baltimore, Maryland, developed a range of clever and innovative image-processing techniques to compensate for some of Hubble’s shortcomings.”
In other words, since it would be three years before Hubble’s faulty optics could be repaired during a 1993 space shuttle mission, data cleansing allowed astrophysicists to make good use of Hubble despite the bad data quality of its early images.
So, data cleansing algorithms saved Hubble’s fuzzy images — but how did this data cleansing actually save lives?
“Turns out,” Tyson explained, “maximizing the amount of information that could be extracted from a blurry astronomical image is technically identical to maximizing the amount of information that can be extracted from a mammogram. Soon the new techniques came into common use for detecting early signs of breast cancer.”
“But that’s only part of the story. In 1997, for Hubble’s second servicing mission, shuttle astronauts swapped in a brand-new, high-resolution digital detector—designed to the demanding specifications of astrophysicists whose careers are based on being able to see small, dim things in the cosmos. That technology is now incorporated in a minimally invasive, low-cost system for doing breast biopsies, the next stage after mammograms in the early diagnosis of cancer.”
Even though defect prevention was eventually implemented to prevent data quality issues in Hubble’s images of outer space, those interim data cleansing algorithms are still being used today to help save countless human lives here on Earth.
So, at least in this particular instance, we have to admit that data cleansing is a necessary good.
Related Posts
Hyperactive Data Quality (Second Edition)
What going to the dentist taught me about data quality
Paleolithic Rhythm and Data Quality
The Dichotomy Paradox, Data Quality and Zero Defects
Data Quality and The Middle Way
There is No Such Thing as a Root Cause



Jim Harris
Reader Comments (6)
I followed Hubble's troubles way back when - and have really enjoyed its pictures. Now I know I have it to thank - in part - for my mom having her cancer detected early and beating it.
Thanks, Jim.
I love stories like that, and there are many of them. A multi-million dollar space project is, in part, justified because it has produced real benefits in the treatment of cancer. Wouldn't it be nice, though, if there were a multi-billion dollar project in the research of cancer that had benefits for space exploration? Better still - a multi-billion dollar Data Quality project that had benefits in space exploration and the treatment of cancer. If you think about it, the latter has more chance of success.
Got to go now, I've just spotted a postcode that needs rectifying :-(
Thanks for your comments, Bryan and Phil.
@Bryan: Thanks for sharing the cancer story about your mom. I am very happy to hear that she beat it! :-)
@Phil: You raise, as usual, a very good point. Neil deGrasse Tyson provided his somewhat related perspective in the same section of his book that I quoted in my blog post:
“So why not ask investigators to take direct aim at the challenge of detecting breast cancer? Why should innovations in medicine have to wait for a Hubble-size blunder in space? My answer may not be politically correct, but it’s the truth: when you organize extraordinary missions, you attract people of extraordinary talent who might not have been inspired by or attracted to the goal of saving the world from cancer or hunger or pestilence.”
Your mission, should you choose to accept it, is to find and recruit "people of extraordinary talent" who are inspired by or attracted to the goal of Data Quality.
Easy. Go down your mailing list. ;-)
Oh, and they must have access to a Hubble-sized budget.
Whilst we didn't save lives, we certainly made a difference to over 4,000 victims and their loved ones in but a few hours.
In the aftermath of Hurricane Katrina, Microsoft put out a call for help. We (DQ Global) volunteered our match software, which was used within hours to reunite victims separated from their loved ones by the hurricane.
It was used to cross-match victims recorded in the databases of each victim shelter against a register of missing persons populated by their friends and relatives.
Overnight, over 4,000 victims were reunited with their loved ones. We were, of course, happy and very proud, though I suspect not as happy as those who were relieved of their anguish, stress, and fear.
A happy ending in another human story relating to data quality and advanced data matching.
Thanks for sharing this story!
In the data management field, handling a "Customer" list may look a little "unsexy and petty".
Replace the word "Customer" with "Patient" or "Victim", and it adds a lot of meaning and human value to your daily work!