This is a screen capture of the results of last month’s unscientific poll about proactive versus reactive data quality, shown alongside one of my favorite graphics (this is the third post I’ve used it in): the Wonder Twins (Zan and Jayna) with Gleek.
Although reactive (15 combined votes) easily defeated proactive (6 combined votes) in the poll, proactive versus reactive is one debate that will likely never end. However, the debate makes it seem as if we are forced to choose one approach over the other.
Generally speaking, most recommended data quality practices advocate implementing proactive defect prevention and avoiding reactive data cleansing. But as Graham Rhind commented, data quality is neither exclusively proactive nor exclusively reactive.
“And if you need proof, start looking at the data,” Graham explained. “For example, gender. To produce quality data, a gender must be collected and assigned proactively, i.e., at the data collection stage. Gender coding reactively on the basis of, for example, name, only works correctly and with certainty in a certain percentage of cases (that percentage always being less than 100). Reactive data quality in that case can never be the best practice because it can never produce the best data quality, and, depending on what you do with your data, can be very damaging.”
“On the other hand,” Graham continued, “the real world to which the data is referring changes. People move, change names, grow old, die. Postal code systems and telephone number systems change. Place names change, countries come and go. In all of those cases, a reactive process is the one that will improve data quality.”
“Data quality is a continuous process,” Graham concluded. From his perspective, a realistic data quality practice advocates being “proactive as much as possible, and reactive to keep up with a dynamic world. Works for me, and has done well for decades.”
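To make Graham's two halves concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not anything Graham described: the `Contact` record, the `ALLOWED_GENDERS` set, and the one-year staleness threshold are all hypothetical. The proactive half refuses to guess a gender at the point of collection, and the reactive half flags records that the real world may have overtaken since they were last verified.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical set of accepted values; a real system would define its own.
ALLOWED_GENDERS = {"female", "male", "other", "undisclosed"}


@dataclass
class Contact:
    name: str
    gender: str
    postal_code: str
    last_verified: date


def collect_contact(name: str, gender: str, postal_code: str, today: date) -> Contact:
    """Proactive: validate at the data collection stage instead of guessing later."""
    if gender not in ALLOWED_GENDERS:
        raise ValueError(f"gender must be one of {sorted(ALLOWED_GENDERS)}")
    return Contact(name, gender, postal_code, last_verified=today)


def needs_reverification(contact: Contact, today: date, max_age_days: int = 365) -> bool:
    """Reactive: flag records not verified recently, since people move and codes change."""
    return (today - contact.last_verified) > timedelta(days=max_age_days)
```

Neither function replaces the other. The first keeps avoidable defects out, and the second tells you which records to go back and check as the world changes.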
I agree with Graham because, just like any complex problem, data quality has no fast and easy solution. In my experience, a hybrid discipline is always required, combining proactive and reactive approaches into one continuous data quality practice.
Or as Zan (representing Proactive) and Jayna (representing Reactive) would say: “Data Quality Practices—Activate!”
And as Gleek would remind us: “The best data quality practices remain continuously active.”