International Data Quality
Posted by Jim Harris on Thursday, August 11, 2011 at 3:00AM in Data Quality, OCDQ Radio, Podcasts; tagged Philosophy.
OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.
On this episode of OCDQ Radio, I discuss the sometimes mysterious world of international name and address data quality, which is why I am pleased to be joined by, not an international man of mystery, but instead, an international man of data quality.
Graham Rhind is an acknowledged expert in the field of data quality. Graham runs GRC Database Information, a consultancy company based in The Netherlands, where he researches postal code and addressing systems, collates international data, runs a busy postal link website, writes data management software, and maintains an online Data Quality Glossary.
Graham Rhind speaks regularly on the subject and is the author of four books on the topic of international data management, including The Global Source Book for Name and Address Data Management, which has been an invaluable resource for me.
On this episode of OCDQ Radio, Graham Rhind and I discuss the international challenges of postal address and person name data quality, including their implications for web forms and other data entry interfaces.

Related Posts
Adventures in Data Profiling (Part 4) - State Abbreviation, Zip Code, Country Code
Adventures in Data Profiling (Part 5) - Postal Address
Adventures in Data Profiling (Part 7) - Customer Name
Data Quality and the Cupertino Effect
DQ-Tip: “There is no such thing as data accuracy...”
Data Quality Practices—Activate!



Reader Comments (2)
From the LinkedIn Group for DAMA International, Merry Law commented:
“Graham is absolutely correct: knowledge and understanding of addresses and addressing issues are extremely important to good data quality. This applies to both gathering and storing the information.
There are approximately 30 basic templates for addresses worldwide. The basic U.S. address template is shared by only 6 other countries, not counting those such as the Marshall Islands that receive mail via the U.S. Postal Service. One programmer tells me that he counts more than 100 variations in addressing formats among the countries where his company regularly mails when he considers spacing and punctuation variations.
In one study of international databases maintained in the U.S., the average number of lines in a valid address was 5.9 and the average number of characters per line was 14.8. However, the average for the maximum number of lines was 14.8 and the average for the maximum number of characters was 54.
My two favorite examples for lengthy lines:
(1) Escherheimerlandstrasse in Frankfurt, Germany is abbreviated Escherheimerlandstr
(2) The city of Thiruvananthapuram in India
Knowledge and information will improve data -- and reduce costs.”
And Henrik Liliendahl Sørensen responded:
“It’s kind of funny that for a guy like me used to looking at German address data, the example is straightforward. The long name is an example of compound words in many Germanic languages. Escherheimerlandstrasse can be broken down to something like land-street-of-Escher-home and abbreviating strasse (like street) to str. is very common in Germany.”
Thanks, Jim, for including my LinkedIn response here. Indeed, we all have a domestic view of the world. There is a famous poster called The New Yorker. This poster perfectly illustrates the centricity we often have about the town, region or country we live in. I made a blog post based on it (after I corrected an English spelling mistake) called Foreign Affairs.
P.S. Thanks for correcting my term “straight forward” on “Linked In” to straightforward.