March 25, 2010

Enterprise Data World 2010

March 25, 2010/ Jim Harris

Enterprise Data World 2010 was held March 14-18 in San Francisco, California at the Hilton San Francisco Union Square.

Congratulations and thanks to Tony Shaw, Maya Stosskopf, the entire Wilshire Conferences staff, as well as Cathy Nolan and everyone with DAMA International, for their outstanding efforts on delivering yet another wonderful conference experience.

I wish I could have attended every session on the agenda, but this blog post provides some quotes from a few of my favorites.

Applying Agile Software Engineering Principles to Data Governance

Conference session by Marty Moseley, CTO of Initiate Systems, an IBM company.

Quotes from the session:

“Data governance is 80% people and only 20% technology”
“Data governance is an ongoing, evolutionary practice”
“There are some organizational problems that are directly caused by poor data quality”
“Build iterative 'good enough' solutions – not 'solve world hunger' efforts”
“Traditional approaches to data governance try to 'boil the ocean' and solve every data problem”
“Agile approaches to data governance laser focus on iteratively solving one problem at a time”
“Quality is everything, don't sacrifice accuracy for performance, you can definitely have both”

Seven iterative steps of Agile Data Governance:

“Form the Data Governance Board – Small guidance team of executives who can think cross-organizationally”
“Define the Problem and the Team – Root cause analysis, build the business case, appoint necessary resources”
“Nail Down Size and Scope – Prioritize the scope in order to implement the current iteration in less than 9 months”
“Validate Your Assumptions – Challenge all estimates, perform data profiling, list data quality issues to resolve”
“Establishing Data Policies – Measurable statements of 'what must be achieved' for which kinds of data”
“Implement the data quality solution for the current iteration”
“Evaluate the overall progress and plan for the next iteration”

Monitor the Quality of your Master Data

Conference session by Thomas Ravn, MDM Practice Director at Platon.

Quotes from the session:

“Ensure master data is taken into account each and every time a business process or IT system is changed”
“Web forms requiring master data attributes can NOT be based on a single country's specific standards”
“There is no point in monitoring data quality if no one within the business feels responsible for it”
“The greater the business impact of a data quality dimension, the more difficult it is to measure”
“Data quality key performance indicators (KPI) should be tied directly to business processes”
“Implement a data input validation rule rather than allow bad data to be entered”
“Sometimes the business logic is too ambiguous to be enforced by a single data input validation rule”
“Data is not always clean or dirty in itself – it depends on the viewpoint or defined standard”
“Data quality is in the eye of the beholder”
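The quotes above about input validation and single-country standards can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and was not presented in the session: the function name, the three countries, and the deliberately simplified postal code patterns.

```python
import re

# Hypothetical, simplified postal code patterns keyed by country code.
# Real postal standards have more exceptions than these patterns capture.
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "CA": re.compile(r"[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d"),
    "DK": re.compile(r"\d{4}"),
}

def validate_postal_code(country, postal_code):
    """Return True if the postal code fits the pattern for its country.

    Countries without a known pattern are accepted when the value is
    non-empty, so the form does not force one country's standard on everyone.
    """
    value = postal_code.strip()
    pattern = POSTAL_PATTERNS.get(country.upper())
    if pattern is None:
        return bool(value)
    return pattern.fullmatch(value) is not None

print(validate_postal_code("US", "02134"))    # valid US ZIP code
print(validate_postal_code("US", "K1A 0B1"))  # Canadian format rejected for US
print(validate_postal_code("CA", "K1A 0B1"))  # same value accepted for Canada
```

Keying the rule by country is one way to reject bad data at entry without assuming that every customer lives in the same country.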

Measuring the Business Impact of Data Governance

Conference session by Tony Fisher, CEO of DataFlux, and Dr. Walid el Abed, CEO of Global Data Excellence.

Quotes from the session:

“The goal of data governance is to position the business to improve”
“Revenue optimization, cost control, and risk mitigation are the business drivers of data management”
“You don't manage data to manage data, you manage data to improve your business”
“Business rules are rules that data should comply with in order to have the process execute properly”
“For every business rule, define the main impact (cost of failure) and the business value (result of success)”
“Power Shift – Before: Having information is power – Now: Sharing information is power”
“You must translate technical details into business language, such as cost, revenue, risk”
“Combine near-term fast to value with long-term alignment with business strategy”
“Data excellence must be a business value added driven program”
“Communication is key to data excellence, make it visible and understood by all levels of the organization”

The Effect of the Financial Meltdown on Data Management

Conference session by April Reeve, Consultant at EMC Consulting.

Quotes from the session:

“The recent financial crisis has greatly increased the interest in both data governance and data transparency”
“Data Governance is a symbiotic relationship of Business Governance and Technology Governance”
“Risk management is a data problem in the forefront of corporate concern – now viewing data as a corporate asset”
“Data transparency increases the criticality of data quality – especially regarding the accuracy of financial reporting”

What the Business Wants

Closing Keynote Address by Graeme Simsion, Principal at Simsion & Associates.

Quotes from the keynote:

“You can get a lot done if you don't care who gets the credit”
“People will work incredibly hard to implement their own ideas”
“What if we trust the business to know what's best for the business?”
“Let's tell the business what we (as data professionals) do – and then ask the business what they want”

Social Karma

I presented this session about the art of effectively using social media in business.

An effective social media strategy is essential for organizations as well as individual professionals. Using social media effectively can definitely help promote you, your expertise, your company, and its products and services. However, too many businesses and professionals have a selfish social media strategy. You should not use social media exclusively to promote yourself or your business.

You need to view social media as Social Karma.

For free related content with no registration required, click on this link: Social Karma

Live-Tweeting at Enterprise Data World 2010

The term “live-tweeting” describes using Twitter to provide near real-time reporting from an event. When a conference schedule has multiple simultaneous sessions, Twitter is great for sharing insights from the sessions you are in with other conference attendees at other sessions, as well as with the on-line community not attending the conference.

Enterprise Data World 2010 had a great group of tweeps (i.e., people using Twitter) and I want to thank all of them, especially the following Super-Tweeps:

Karen Lopez – @datachick

April Reeve – @Datagrrl

Corinna Martinez – @Futureratti

Eva Smith – @datadeva

Alec Sharp – @alecsharp

Ted Louie – @tedlouie

Rob Drysdale – @projmgr

Loretta Mahon Smith – @silverdata

Additional Resources

Official Website for DAMA International

LinkedIn Group for DAMA International

Twitter Account for DAMA International

Facebook Group for DAMA International

Official Website for Enterprise Data World 2010

LinkedIn Group for Enterprise Data World

Twitter Account for Enterprise Data World

Facebook Group for Enterprise Data World

Enterprise Data World 2011 will take place in Chicago, Illinois at the Chicago Sheraton and Towers on April 3-7, 2011.

Enterprise Data World 2009

TDWI World Conference Chicago 2009

DataFlux IDEAS 2009

January 11, 2010

Social Karma (Part 1)

January 11, 2010/ Jim Harris

An effective social media strategy is essential for organizations as well as individual professionals.

Using social media effectively, including blogging and social networking sites (e.g., Twitter, Facebook, LinkedIn), can definitely help promote you, your expertise, your company, and its products and services.

However, it is sad—but true—that too many people and companies have a selfish social media strategy.

You should not use social media exclusively to promote yourself or your business.

You need to view social media as Social Karma.

If you can focus your social media and social networking efforts on helping others, then you will get much more back than just a blog reader, a LinkedIn connection, a Facebook friend, a Twitter follower, or even a potential customer.

I am not a Social Media Expert—but I play one on the Internet

I am not a social media “expert.” In fact, until late 2008, I wasn't even interested enough to ask people what they meant when I heard them talking about “social media.” I started blogging, tweeting, and using other social media in early 2009.

Please let me do the complex math for you—I still have less than one year of actual experience with social media.

I don't know how you define expertise—and I do acknowledge the inherent difficulty in vetting expertise in such a new and rapidly evolving field—but less than one year of experience with anything does not an expert make, in my humble opinion.

However, I have spent over 15 years in computer science and information technology related disciplines, as a software engineer, consultant, and instructor. I have considerable experience and expertise applying technology in a business context in order to implement solutions for Global 500 companies in a wide variety of industries.

Therefore, I am not a complete moron—but I will leave it to you to determine the actual percentage.

I am currently a full-time writer making all of my income from social media—mainly from blogging and mostly from ghostwriting for corporate blogs.

I am not trying to sell you anything.

I am going to freely share what I have learned so far, including what I have learned from people with far more experience using social media. As I stated previously, I hesitate to call anyone an expert in such a rapidly evolving discipline, but I will mention several resources I have found helpful.

I have absolutely no affiliation or any paid relationship with any person, website, event, product, or book that I recommend.

About This Series

The primary reason that I am organizing my thoughts about social media involves my preparation for an upcoming conference presentation about using social media effectively for business purposes (more details in the next section).

I am publishing this content as a series on my blog, not only to provide supporting material for the small group of people that actually attend my conference session, but also because I have learned firsthand how the two-way conversation that blogging provides via comments from my readers greatly improves the quality of my material.

Throughout this series, I will combine traditional blog posts with presentation slides, podcasts, and videos, in order to build a multimedia library of supporting material—all freely available, no registration required.

Enterprise Data World 2010

Enterprise Data World is the business world’s most comprehensive vendor-neutral educational event about data and information management. This year’s program will be bigger than ever before, with more sessions, more case studies, and more can’t-miss content, providing over 200 hours of in-depth tutorials, hands-on workshops, practical sessions and insightful keynotes to take you to the forefront of your industry.

Enterprise Data World 2010 will be held March 14-18 in San Francisco, California at the Hilton San Francisco Union Square.

The full conference agenda can be viewed by clicking on this link: Enterprise Data World 2010 Conference Agenda.

The registration options can be viewed by clicking on this link: Enterprise Data World 2010 Conference Registration.

Use the discount code of EDW10SPKR for a $100 discount off your registration fees. (Discount code expires on February 26.)

On Monday, March 15 from 5:00 PM – 6:00 PM, I will be presenting (30 minutes of material and 30 minutes of Q&A):

Social Karma: The Art of Effectively Using Social Media in Business

In Part 2 of this series: We will discuss leveraging social media for “listening purposes only” as a passive (and safe) way to determine what (if any) type of active involvement with social media makes sense for you and/or your company.

Social Karma (Part 2) – Social Media Preparation

Social Karma (Part 3) – Listening Stations, Home Base, and Outposts

Social Karma (Part 4) – Blogging Best Practices

Social Karma (Part 5) – Connection, Engagement, and ROI Basics

Social Karma (Part 6) – Social Media Books

Social Karma (Part 7) – Twitter

November 07, 2009

The Once and Future Data Quality Expert

November 07, 2009/ Jim Harris

Wednesday, November 11 is World Quality Day 2009.

World Quality Day was established by the United Nations in 1990 as a focal point for the quality management profession and as a celebration of the contribution that quality makes to the growth and prosperity of nations and organizations. The goal of World Quality Day is to raise awareness of how quality approaches (including data quality best practices) can have a tangible effect on business success, as well as contribute towards world-wide economic prosperity.

IAIDQ

The International Association for Information and Data Quality (IAIDQ) was chartered in January 2004 and is a not-for-profit, vendor-neutral professional association whose purpose is to create a world-wide community of people who desire to reduce the high costs of low quality information and data by applying sound quality management principles to the processes that create, maintain and deliver data and information.

Since 2007 the IAIDQ has celebrated World Quality Day as a springboard for improvement and a celebration of successes. Please join us to celebrate World Quality Day by participating in our interactive webinar in which the Board of Directors of the IAIDQ will share with you stories and experiences to promote data quality improvements within your organization.

In my recent Data Quality Pro article The Future of Information and Data Quality, I reported on the IAIDQ Ask The Expert Webinar with co-founders Larry English and Tom Redman, two of the industry pioneers for data quality and two of the most well-known data quality experts.

Data Quality Expert

As World Quality Day 2009 approaches, my personal reflections are focused on what the title data quality expert has meant in the past, what it means today, and most important, what it will mean in the future.

With over 15 years of professional services and application development experience, I consider myself to be a data quality expert. However, my experience is paltry by comparison to English, Redman, and other industry luminaries such as David Loshin, to use one additional example from many.

Experience is popularly believed to be the path that separates knowledge from wisdom, which is usually accepted as another way of defining expertise.

Oscar Wilde once wrote that “experience is simply the name we give our mistakes.” I agree. I have found that the sooner I can recognize my mistakes, the sooner I can learn from the lessons they provide, and hopefully prevent myself from making the same mistakes again.

The key is early detection. As I gain experience, I gain an improved ability to more quickly recognize my mistakes and thereby expedite the learning process.

James Joyce wrote that “mistakes are the portals of discovery” and T.S. Eliot wrote that “we shall not cease from exploration and the end of all our exploring will be to arrive where we started and know the place for the first time.”

What I find in the wisdom of these sages is the need to acknowledge the favor our faults do for us. Therefore, although experience is the path that separates knowledge from wisdom, the true wisdom of experience is the wisdom of failure.

As Jonah Lehrer explained: “Becoming an expert just takes time and practice. Once you have developed expertise in a particular area, you have made the requisite mistakes.”

But expertise in any discipline is more than simply an accumulation of mistakes and birthdays. And expertise is not a static state that, once achieved, allows you to simply rest on your laurels.

In addition to my real-world experience working on data quality initiatives for my clients, I also read all of the latest books, articles, whitepapers, and blogs, as well as attend as many conferences as possible.

The Times They Are a-Changin'

Much of the discussion that I have heard regarding the future of the data quality profession has been focused on the need for the increased maturity of both practitioners and organizations. Although I do not dispute this need, I am concerned about the apparent lack of attention being paid to how fast the world around us is changing.

Rapid advancements in technology, coupled with the meteoric rise of the Internet and social media (blogs, wikis, Twitter, Facebook, LinkedIn, etc.), have created an amazing medium that is enabling people separated by vast distances and disparate cultures to come together, communicate, and collaborate in ways few would have thought possible just a few decades ago.

I don't believe that it is an exaggeration to state that we are now living in an age where the contrast between the recent past and the near future is greater than perhaps it has ever been in human history. This brave new world has such people and technology in it, that practically every new day brings the possibility of another quantum leap forward.

Although it has been argued by some that the core principles of data quality management are timeless, I must express my doubt. The daunting challenges of dramatically increasing data volumes and the unrelenting progress of cloud computing, software as a service (SaaS), and mobile computing architectures would appear to be racing toward a high-speed collision with our time-tested (but time-consuming to implement properly) data quality management principles.

The times they are indeed changing and I believe we must stop using terms like Six Sigma and Kaizen as if they were shibboleths. If these or any other disciplines are to remain relevant, then we must honestly assess them in the harsh and unforgiving light of our brave new world that is seemingly changing faster than the speed of light.

Expertise is not static. Wisdom is not timeless. The only constant is change. For the data quality profession to truly mature, our guiding principles must change with the times, or be relegated to a past that is all too quickly becoming distant.

Share Your Perspectives

In celebration of World Quality Day, please share your perspectives regarding the past, present, and most important, the future of the data quality profession. With apologies to T. H. White, I declare this debate to be about the difference between:

The Once and Future Data Quality Expert

Mistake Driven Learning

The Fragility of Knowledge

The Wisdom of Failure

A Portrait of the Data Quality Expert as a Young Idiot

The Nine Circles of Data Quality Hell

Additional IAIDQ Links

IAIDQ Ask The Expert Webinar: World Quality Day 2009

IAIDQ Ask The Expert Webinar with Larry English and Tom Redman

INTERVIEW: Larry English - IAIDQ Co-Founder

INTERVIEW: Tom Redman - IAIDQ Co-Founder

IAIDQ Publications Portal

November 03, 2009

Customer Incognita

November 03, 2009/ Jim Harris

Many enterprise information initiatives are launched in order to unravel that riddle, wrapped in a mystery, inside an enigma, that great unknown, also known as...Customer.

Centuries ago, cartographers used the Latin phrase terra incognita (meaning “unknown land”) to mark regions on a map not yet fully explored. In this century, companies simply cannot afford to use the phrase customer incognita to indicate what information about their existing (and prospective) customers they don't currently have or don't properly understand.

What is a Customer?

First things first, what exactly is a customer? Those happy people who give you money? Those angry people who yell at you on the phone or say really mean things about your company on Twitter and Facebook? Why do they have to be so mean?

Mean people suck. However, companies who don't understand their customers also suck. And surely you don't want to be one of those companies, do you? I didn't think so.

Getting back to the question, here are some insights from the Data Quality Pro discussion forum topic What is a customer?:

Someone who purchases products or services from you. The word “someone” is key because it’s not the role of a “customer” that forms the real problem, but the precision of the term “someone” that causes challenges when we try to link other and more specific roles to that “someone.” These other roles could be contract partner, payer, receiver, user, owner, etc.
Customer is a role assigned to a legal entity in a complete and precise picture of the real world. The role is established when the first purchase is accepted from this real-world entity. Of course, the main challenge is whether or not the company can establish and maintain a complete and precise picture of the real world.

These working definitions were provided by fellow blogger and data quality expert Henrik Liliendahl Sørensen, who recently posted 360° Business Partner View, which further examines the many different ways a real-world entity can be represented, including when, instead of a customer, the real-world entity represents a citizen, patient, member, etc.

A critical first step for your company is to develop your definition of a customer. Don't underestimate either the importance or the difficulty of this process. And don't assume it is simply a matter of semantics.

Some of my consulting clients have indignantly told me: “We don't need to define it, everyone in our company knows exactly what a customer is.” I usually respond: “I have no doubt that everyone in your company uses the word customer; however, I will work for free if everyone defines the word customer in exactly the same way.” So far, I haven't had to work for free.

How Many Customers Do You Have?

You have done the due diligence and developed your definition of a customer. Excellent! Nice work. Your next challenge is determining how many customers you have. Hopefully, you are not going to try using any of these techniques:

SELECT COUNT(*) AS "We have this many customers" FROM Customers
SELECT COUNT(DISTINCT Name) AS "No wait, we really have this many customers" FROM Customers
Middle-Square or Blum Blum Shub methods (i.e. random number generation)
Magic 8-Ball says: “Ask again later”

One of the most common and challenging data quality problems is the identification of duplicate records, especially redundant representations of the same customer information within and across systems throughout the enterprise. The need for a solution to this specific problem is one of the primary reasons that companies invest in data quality software and services.
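To see why the two counting queries above mislead, here is a minimal Python sketch. The sample records and the match_key helper are hypothetical, and the normalization shown (case folding and whitespace collapsing) is only a stand-in for the far richer matching that data quality software performs.

```python
import re

# Hypothetical customer records: three rows describing two real-world customers.
customers = [
    {"name": "John Smith", "postal_code": "02134"},
    {"name": "JOHN  SMITH", "postal_code": "02134"},
    {"name": "Mary Jones", "postal_code": "60601"},
]

def match_key(record):
    """Build a crude match key: lowercase the name and collapse whitespace."""
    name = re.sub(r"\s+", " ", record["name"]).strip().lower()
    return (name, record["postal_code"])

row_count = len(customers)                               # like COUNT(*)
distinct_names = len({c["name"] for c in customers})     # like COUNT(DISTINCT Name)
distinct_keys = len({match_key(c) for c in customers})   # after normalization

print(row_count, distinct_names, distinct_keys)  # prints: 3 3 2
```

Both counts report three customers where there are two. Counting distinct names can also err in the other direction, since two different people named John Smith would be counted as one.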

Earlier this year on Data Quality Pro, I published a five part series of articles on identifying duplicate customers, which focused on the methodology for defining your business rules and illustrated some of the common data matching challenges.

Topics covered in the series:

Why a symbiosis of technology and methodology is necessary when approaching this challenge
How performing a preliminary analysis on a representative sample of real data prepares effective examples for discussion
Why using a detailed, interrogative analysis of those examples is imperative for defining your business rules
How both false negatives and false positives illustrate the highly subjective nature of this problem
How to document your business rules for identifying duplicate customers
How to set realistic expectations about application development
How to foster a collaboration of the business and technical teams throughout the entire project
How to consolidate identified duplicates by creating a “best of breed” representative record
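The last topic in the list, consolidating duplicates into a “best of breed” record, can be sketched in Python as follows. The survivorship rule used here (keep the longest non-empty value per field, preferring the most recently updated record on ties) is an assumption chosen for illustration and is not the rule from the series. The field names and sample records are also hypothetical.

```python
def consolidate(duplicates):
    """Merge duplicate records into one representative record.

    Assumed survivorship rule: for each field, keep the longest non-empty
    value, breaking ties in favor of the most recently updated record.
    """
    # Most recently updated first (ISO date strings sort chronologically),
    # so that ties on length favor the newest value.
    ordered = sorted(duplicates, key=lambda r: r["updated"], reverse=True)
    fields = [f for f in ordered[0] if f != "updated"]
    best = {}
    for field in fields:
        values = [r.get(field) or "" for r in ordered]
        # max() returns the first of several equal-length values.
        best[field] = max(values, key=len)
    return best

duplicates = [
    {"name": "J. Smith", "email": "", "phone": "555-0100", "updated": "2009-01-15"},
    {"name": "John Smith", "email": "jsmith@example.com", "phone": "", "updated": "2009-03-02"},
]

print(consolidate(duplicates))
```

In practice the survivorship rules are business rules in their own right and need the same collaborative definition as the matching rules.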

To read the series, please follow these links:

To download the associated presentation (no registration required), please follow this link: OCDQ Downloads

Conclusion

“Knowing the characteristics of your customers,” stated Jill Dyché and Evan Levy in the opening chapter of their excellent book, Customer Data Integration: Reaching a Single Version of the Truth, “who they are, where they are, how they interact with your company, and how to support them, can shape every aspect of your company's strategy and operations. In the information age, there are fewer excuses for ignorance.”

For companies of every size and within every industry, customer incognita is a crippling condition that must be replaced with customer cognizance in order for the company to remain competitive in a rapidly changing marketplace.

Do you know your customers? If not, then they likely aren't your customers anymore.

August 26, 2009

The Only Thing Necessary for Poor Data Quality

August 26, 2009/ Jim Harris

“Demonstrate projected defects and business impacts if the business fails to act,” explains Dylan Jones of Data Quality Pro in his recent and remarkable post How To Deliver A Compelling Data Quality Business Case:

“Presenting a future without data quality management...leaves a simple take-away message – do nothing and the situation will deteriorate.”

I cannot help but be reminded of the famous quote often attributed to the 18th century philosopher Edmund Burke:

“The only thing necessary for the triumph of evil, is for good men to do nothing.”

Or the even more famous quote often attributed to the long time ago Jedi Master Yoda:

“Poor data quality is the path to the dark side. Poor data quality leads to bad business decisions.

Bad business decisions lead to lost revenue. Lost revenue leads to suffering.”

When you present the business case for your data quality initiative to executive management and other corporate stakeholders, demonstrate that poor data quality is not a theoretical problem – it is a real business problem that negatively impacts the quality of decision-critical enterprise information.

Preventing poor data quality is mission-critical. Poor data quality will undermine the tactical and strategic initiatives essential to the enterprise's mission to survive and thrive in today's highly competitive and rapidly evolving marketplace.

“The only thing necessary for Poor Data Quality – is for good businesses to Do Nothing.”

Hyperactive Data Quality (Second Edition)

Data Quality: The Reality Show?

Data Governance and Data Quality

July 14, 2009

Data Quality Blogging All-Stars

July 14, 2009/ Jim Harris

The 2009 Major League Baseball (MLB) All-Star Game is being held tonight at Busch Stadium in St. Louis, Missouri.

For those readers who are not baseball fans, the All-Star Game is an annual exhibition held in mid-July that showcases the players with the best statistical performances from the first half of the MLB season.

As I watch the 80th Midsummer Classic, I offer this exhibition that showcases the bloggers with the posts I have most enjoyed reading from the first half of the 2009 data quality blogging season.

Dylan Jones

From Data Quality Pro:

How to transform your ETL tool into a data quality toolkit
DEBATE: How should data governance and data quality work together?
Selecting Data Quality Software (Two Part Series): Part 1, Part 2
Creating An Internal Data Quality Community (Four Part Series): Part 1, Part 2, Part 3, Part 4
15 Tips for transforming knowledge-workers into a data quality task force
10 Tips to help data quality professionals boost their career prospects in the downturn

Daragh O Brien

From The DOBlog:

Steve Sarsfield

From Data Governance and Data Quality Insider:

Daniel Gent

From Data Quality Edge:

Sun Tzu and the Art of Data Quality (Two Part Series): Part 1, Part 2
DQ is 1/3 Process Knowledge + 1/3 Business Knowledge + 1/3 Intuition
When Bad Data Becomes Acceptable Data
DQ Problems? Start a Data Quality Recognition Program!
Five Attributes for the Data Quality Analyst

Henrik Liliendahl Sørensen

From Liliendahl on Data Quality:

Stefanos Damianakis

From Netrics HD:

TSA False Negatives and the URoSD
TSA “Secure Flight” will require more demographic information
Narrative Fallacy and Data Matching
What’s in a Name? (Three Part Series): Part 1, Part 2, Part 3

Vish Agashe

From Business Intelligence: Process, People and Products:

Mark Goloboy

From Boston Data, Technology & Analytics:

Additional Resources

Over on Data Quality Pro, read the data quality blog roundups from the first half of 2009:

From the IAIDQ, read the 2009 issues of the IAIDQ Blog Carnival:

June 03, 2009

The Two Headed Monster of Data Matching

June 03, 2009/ Jim Harris

Data matching is commonly defined as the comparison of two or more records in order to evaluate if they correspond to the same real-world entity (i.e. are duplicates) or represent some other data relationship (e.g. a family household).

Data matching is commonly plagued by what I refer to as The Two Headed Monster:

False Negatives - records that did not match, but should have been matched
False Positives - records that matched, but should not have been matched
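The two definitions above can be made concrete with a minimal Python sketch. It assumes you already have a set of known true matches to compare against, which is rarely available outside of a test sample. The record ids and function names are hypothetical.

```python
def pair(a, b):
    """Represent a match between two record ids, ignoring order."""
    return frozenset((a, b))

def matching_errors(predicted_pairs, true_pairs):
    """Compare predicted matches against known true matches."""
    false_negatives = true_pairs - predicted_pairs   # should have matched, did not
    false_positives = predicted_pairs - true_pairs   # matched, should not have
    return false_negatives, false_positives

true_pairs = {pair(1, 2), pair(3, 4)}
predicted_pairs = {pair(2, 1), pair(5, 6)}

fn, fp = matching_errors(predicted_pairs, true_pairs)
print("false negatives:", [sorted(p) for p in fn])  # records 3 and 4 were missed
print("false positives:", [sorted(p) for p in fp])  # records 5 and 6 were wrongly matched
```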

I Fought The Two Headed Monster...

On a recent (mostly) business trip to Las Vegas, I scheduled a face-to-face meeting with a potential business partner that I had previously communicated with via phone and email only. We agreed to a dinner meeting at a restaurant in the hotel/casino where I was staying.

I would be meeting with the President/CEO and the Vice President of Business Development, a man and a woman respectively.

I was facing a real-world data matching problem.

I knew their names, but I had no idea what they looked like. Checking their company website and LinkedIn profiles didn't help - no photos. I neglected to get their mobile phone numbers; however, they had mine.

The restaurant was inside the casino and the only entrance was adjacent to a Starbucks that had tables and chairs facing the casino floor. I decided to arrive at the restaurant 15 minutes early and camp out at Starbucks since anyone going near the restaurant would have to walk right past me.

I was more concerned about avoiding false positives. I didn't want to walk up to every potential match and introduce myself since casino security would soon intervene (and I have seen enough movies to know that scene always ends badly).

I decided to apply some probabilistic data matching principles to evaluate the mass of humanity flowing past me.

If some of my matching criteria seem odd, please remember I was in a Las Vegas casino.

I excluded from consideration all:

Individuals wearing a uniform or a costume
Groups consisting of more than two people
Groups consisting of two men or two women
Couples carrying shopping bags or souvenirs
Couples demonstrating a public display of affection
Couples where one or both were noticeably intoxicated
Couples where one or both were scantily clad
Couples where one or both seemed too young or too old

I carefully considered any:

Couples dressed in business attire or business casual attire
Couples pausing to wait at the restaurant entrance
Couples arriving close to the scheduled meeting time

I was quite pleased with myself for applying probabilistic data matching principles to a real-world situation.

However, the scheduled meeting time passed. At first, I simply assumed they might be running a little late or were delayed by traffic. As the minutes continued to pass, I started questioning my matching criteria.

...And The Two Headed Monster Won

When the clock reached 30 minutes past the scheduled meeting time, my mobile phone rang. My dinner companions were calling to ask if I was running late. They had arrived on time, were inside the restaurant, and had already ordered.

Confused, I entered the restaurant. Sure enough, there sat a man and a woman who had walked right past me. I had excluded them from consideration because of how they were dressed. The Vice President of Business Development was dressed in jeans, sneakers and a casual shirt. The President/CEO was wearing shorts, sneakers and a casual shirt.

I had dismissed them as a vacationing couple.

I had been defeated by a false negative.

The Harsh Reality is that Monsters are Real

My data quality expertise could not guarantee victory in this particular battle with The Two Headed Monster.

Monsters are real and the hero of the story doesn't always win.

And it doesn’t matter if the match algorithms I use are deterministic, probabilistic, or even supercalifragilistic.

The harsh reality is that false negatives and false positives can be reduced, but never eliminated.
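
The trade-off can be made concrete with a small Python sketch. The similarity scores and true/false labels below are invented toy data, and `count_errors` is a hypothetical helper; the point is only that when true matches and non-matches overlap in score, moving the threshold trades one kind of error for the other.

```python
# (similarity score, is truly the same customer) - invented toy data
scored_pairs = [
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, False), (0.55, True), (0.30, False), (0.10, False),
]


def count_errors(pairs, threshold):
    """Return (false positives, false negatives) for a given match threshold."""
    false_pos = sum(1 for score, same in pairs if score >= threshold and not same)
    false_neg = sum(1 for score, same in pairs if score < threshold and same)
    return false_pos, false_neg


for threshold in (0.5, 0.7, 0.85):
    fp, fn = count_errors(scored_pairs, threshold)
    print(f"threshold={threshold:.2f} false positives={fp} false negatives={fn}")
```

With this toy data, a low threshold yields two false positives and no false negatives, a high threshold yields the reverse, and the middle setting yields one of each. No threshold brings both counts to zero.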

Are You Fighting The Two Headed Monster?

Are you more concerned about false negatives or false positives? Please share your battles with The Two Headed Monster.

Back in February and March, I published a five part series of articles on data matching methodology on Data Quality Pro.

Parts 2 and 3 of the series provided data examples to illustrate the challenge of false negatives and false positives within the context of identifying duplicate customers:

May 09, 2009

TDWI World Conference Chicago 2009

May 09, 2009/ Jim Harris

Founded in 1995, TDWI (The Data Warehousing Institute™) is the premier educational institute for business intelligence and data warehousing that provides education, training, certification, news, and research for executives and information technology professionals worldwide. TDWI conferences always offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner. The courses taught are designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

TDWI World Conference Chicago 2009 was held May 3-8 in Chicago, Illinois at the Hyatt Regency Hotel and was a tremendous success. I attended as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the conference. Here are my notes from the courses I attended:

BI from Both Sides: Aligning Business and IT

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services. Jill is responsible for delivering industry and client advisory services, is a frequent lecturer and writer on the business value of IT, and writes the excellent Inside the Biz blog. She is the author of acclaimed books on the business value of information: e-Data: Turning Data Into Information With Data Warehousing and The CRM Handbook: A Business Guide to Customer Relationship Management. Her latest book, written with Evan Levy, is Customer Data Integration: Reaching a Single Version of the Truth.

Course Quotes from Jill Dyché:

Five Critical Success Factors for Business Intelligence (BI):
1. Organization - Build organizational structures and skills to foster a sustainable program
2. Processes - Align both business and IT development processes that facilitate delivery of ongoing business value
3. Technology - Select and build technologies that deploy information cost-effectively
4. Strategy - Align information solutions to the company's strategic goals and objectives
5. Information - Treat data as an asset by separating data management from technology implementation
Three Different Requirement Categories:
1. What is the business need, pain, or problem? What business questions do we need to answer?
2. What data is necessary to answer those business questions?
3. How do we need to use the resulting information to answer those business questions?
“Data warehouses are used to make business decisions based on data – so data quality is critical”
“Even companies with mature enterprise data warehouses still have data silos - each business area has its own data mart”
“Instead of pushing a business intelligence tool, just try to get people to start using data”
“Deliver a usable system that is valuable to the business and not just a big box full of data”

TDWI Data Governance Summit

Philip Russom is the Senior Manager of Research and Services at TDWI, where he oversees many of TDWI’s research-oriented publications, services, and events. Prior to joining TDWI in 2005, he was an industry analyst covering BI at Forrester Research, as well as a contributing editor with Intelligent Enterprise and Information Management (formerly DM Review) magazines.

Summit Quotes from Philip Russom:

“Data Governance usually boils down to some form of control for data and its usage”
“Four Ps of Data Governance: People, Policies, Procedures, Process”
“Three Pillars of Data Governance: Compliance, Business Transformation, Business Integration”
“Two Foundations of Data Governance: Business Initiatives and Data Management Practices”
“Cross-functional collaboration is a requirement for successful Data Governance”

Becky Briggs, CBIP, CMQ/OE, is a Senior Manager and Data Steward for Airlines Reporting Corporation (ARC) and has 25 years of experience in data processing and IT - the last 9 in data warehousing and BI. She leads the program team responsible for product, project, and quality management, business line performance management, and data governance/stewardship.

Summit Quotes from Becky Briggs:

“Data Governance is the act of managing the organization's data assets in a way that promotes business value, integrity, usability, security and consistency across the company”
Five Steps of Data Governance:
1. Determine what data is required
2. Evaluate potential data sources (internal and external)
3. Perform data profiling and analysis on data sources
4. Data Services - Definition, modeling, mapping, quality, integration, monitoring
5. Data Stewardship - Classification, access requirements, archiving guidelines
“You must realize and accept that Data Governance is a program and not just a project”

Barbara Shelby is a Senior Software Engineer for IBM with over 25 years of experience holding positions of technical specialist, consultant, and line management. Her global management and leadership positions encompassed network authentication, authorization application development, corporate business systems data architecture, and database development.

Summit Quotes from Barbara Shelby:

Four Common Barriers to Data Governance:
1. Information - Existence of information silos and inconsistent data meanings
2. Organization - Lack of end-to-end data ownership and organization cultural challenges
3. Skill - Difficulty shifting resources from operational to transformational initiatives
4. Technology - Business data locked in large applications and slow deployment of new technology
Four Key Decision Making Bodies for Data Governance:
1. Enterprise Integration Team - Oversees the execution of CIO funded cross enterprise initiatives
2. Integrated Enterprise Assessment - Responsible for the success of transformational initiatives
3. Integrated Portfolio Management Team - Responsible for making ongoing business investment decisions
4. Unit Architecture Review - Responsible for the IT architecture compliance of business unit solutions

Lee Doss is a Senior IT Architect for IBM with over 25 years of information technology experience. He holds a patent for a process of aligning strategic capability for business transformation and he has held various positions including strategy, design, development, and customer support for IBM networking software products.

Summit Quotes from Lee Doss:

Five Data Governance Best Practices:
1. Create a sense of urgency that the organization can rally around
2. Start small, grow fast...pick a few visible areas to set an example
3. Sunset legacy systems (application, data, tools) as new ones are deployed
4. Recognize the importance of organization culture…this will make or break you
5. Always, always, always – Listen to your customers

Kevin Kramer is a Senior Vice President and Director of Enterprise Sales for UMB Bank and is responsible for development of sales strategy, sales tool development, and implementation of enterprise-wide sales initiatives.

Summit Quotes from Kevin Kramer:

“Without Data Governance, multiple sources of customer information can produce multiple versions of the truth”
“Data Governance helps break down organizational silos and shares customer data as an enterprise asset”
“Data Governance provides a roadmap that translates into best practices throughout the entire enterprise”

Kanon Cozad is a Senior Vice President and Director of Application Development for UMB Bank and is responsible for overall technical architecture strategy and oversees information integration activities.

Summit Quotes from Kanon Cozad:

“Data Governance identifies business process priorities and then translates them into enabling technology”
“Data Governance provides direction and Data Stewardship puts direction into action”
“Data Stewardship identifies and prioritizes applications and data for consolidation and improvement”

Summit Quotes from Jill Dyché:

“The hard part of Data Governance is the data”
“No data will be formally sanctioned unless it meets a business need”
“Data Governance focuses on policies and strategic alignment”
“Data Management focuses on translating defined policies into executable actions”
“Entrench Data Governance in the development environment”
“Everything is customer data – even product and financial data”

Data Quality Assessment - Practical Skills

Arkady Maydanchik is a co-founder of Data Quality Group, a recognized practitioner, author, and educator in the field of data quality and information integration. Arkady's data quality methodology and breakthrough ARKISTRA technology were used to provide services to numerous organizations. Arkady is the author of the excellent book Data Quality Assessment, a frequent speaker at various conferences and seminars, and a contributor to many journals and online publications. Data quality curriculum by Arkady Maydanchik can be found at eLearningCurve.

Course Quotes from Arkady Maydanchik:

“Nothing is worse for data quality than desperately trying to fix it during the last few weeks of an ETL project”
“Quality of data after conversion is in direct correlation with the amount of knowledge about actual data”
“Data profiling tools do not do data profiling - it is done by data analysts using data profiling tools”
“Data Profiling does not answer any questions - it helps us ask meaningful questions”
“Data quality is measured by its fitness to the purpose of use – it's essential to understand how data is used”
“When data has multiple uses, there must be data quality rules for each specific use”
“Effective root cause analysis requires not stopping after the answer to your first question - Keep asking: Why?”
“The central product of a Data Quality Assessment is the Data Quality Scorecard”
“Data quality scores must be both meaningful to a specific data use and be actionable”
“Data quality scores must estimate both the cost of bad data and the ROI of data quality initiatives”

Modern Data Quality Techniques in Action - A Demonstration Using Human Resources Data

Gian Di Loreto formed Loreto Services and Technologies in 2004 from the client services division of Arkidata Corporation. Loreto Services provides data cleansing and integration consulting services to Fortune 500 companies. Gian is a classically trained scientist - he received his PhD in elementary particle physics from Michigan State University.

Course Quotes from Gian Di Loreto:

“Data Quality is rich with theory and concepts – however it is not an academic exercise, it has real business impact”
“To do data quality well, you must walk away from the computer and go talk with the people using the data”
“Undertaking a data quality initiative demands developing a deeper knowledge of the data and the business”
“Some essential data quality rules are ‘hidden’ and can only be discovered by ‘clicking around’ in the data”
“Data quality projects are not about systems working together - they are about people working together”
“Sometimes, data quality can be ‘good enough’ for source systems but not when integrated with other systems”
“Unfortunately, no one seems to care about bad data until they have it”
“Data quality projects are only successful when you understand the problem before trying to solve it”

Mark Your Calendar

TDWI World Conference San Diego 2009 - August 2-7, 2009.

TDWI World Conference Orlando 2009 - November 1-6, 2009.

TDWI World Conference Las Vegas 2010 - February 21-26, 2010.

April 30, 2009

El Festival del IDQ Bloggers (April 2009)

April 30, 2009/ Jim Harris

Welcome to the April 2009 issue of El Festival del IDQ Bloggers, which is a blog carnival for information/data quality bloggers being run as part of the celebration of the five year anniversary of the International Association for Information and Data Quality (IAIDQ).

A blog carnival is a collection of posts from different blogs on a specific theme that are published across a series of issues. Anyone can submit a data quality blog post and experience the benefits of extra traffic, networking with other bloggers and discovering interesting posts. It doesn't matter what type of blog you have as long as the submitted post has a data quality theme.

El Festival del IDQ Bloggers will run monthly issues April through November 2009.

Can You Say Anything Interesting About Data Quality?

This simple question launched the first blog carnival of data quality that ran four issues from late 2007 through early 2008:

Blog Carnival of Data Quality (November 2007)

Blog Carnival of Data Quality (December 2007)

Blog Carnival of Data Quality (January 2008)

Blog Carnival of Data Quality (February 2008)

How to give your Data Warehouse a Data Quality Immunity System

Vincent McBurney is a manager for Deloitte consulting in Perth, Australia. His excellent blog Tooling Around in the IBM InfoSphere looks at the world of data integration software and occasionally wonders what IBM is up to. His data quality motto: "If it ain’t broke, don't fix it."

Vincent submitted How to give your Data Warehouse a Data Quality Immunity System that discusses how people who obsessively keep bad quality data out of a data warehouse may be making it unhealthy in the long run.

Stuck in First Gear

Michele Goetz is a freelance consultant helping companies make sense of their business through better analysis, marketing best practices, and marketing solutions. Her excellent blog Intelligent Metrix guides you on your journey from data to metrics to insight to intelligent decisions. Her blog de-mystifies business intelligence and data management for the business, and helps you bridge the Business-IT gap for better processes and solutions that drive business success.

Michele submitted Stuck in First Gear that discusses the common problem of companies making big investments in enterprise class solutions but using only a portion of the capabilities, which is like driving a Porsche in first gear.

When Bad Data Becomes Acceptable Data

Daniel Gent is a bilingual business analyst experienced with the System Development Life Cycle (SDLC), decision making, change management, database design, data modeling, data quality management, project coordination, and problem resolution. His excellent blog Data Quality Edge is a grassroots look at data quality for the data quality analyst in the trenches.

Daniel submitted When Bad Data Becomes Acceptable Data that discusses how you need to prioritize bad data and determine when it is acceptable to keep it for now.

Customer Value and Sustainable Quality

Daniel Bahula is a strategy and operations improvement professional with extensive project experience from multinational telco, software development and professional services companies. His excellent blog DanBahula.net defies a simple definition and is a great example of how it doesn't matter what type of blog you have as long as the submitted post has a data quality theme.

Daniel submitted Customer Value and Sustainable Quality that discusses Six Sigma and its relevance to addressing data quality issues.

Data Quality, Entity Resolution, and OFAC Compliance

Bob Barker is the editor of Identity Resolution Daily, which is a corporate blog of Austin, TX-based Infoglide Software strongly dedicated to citizenship, integrity and communication. The blog has recently been gaining guest bloggers with varying points of view, helping it to become an excellent site for information, dialogue and community.

Bob submitted Data Quality, Entity Resolution, and OFAC Compliance that discusses how entity resolution is different from name matching and traditional data quality.

Selecting Data Quality Software

Dylan Jones is the editor of Data Quality Pro, which is the leading data quality online magazine and free independent community resource dedicated to helping data quality professionals take their career or business to the next level.

Dylan submitted Selecting Data Quality Software that discusses how to find the right data quality technology for your needs and your budget.

AmazonFail - A Classic Information Quality Impact

Since 2006, IQTrainwrecks.com, which is a community blog provided and administered by the International Association for Information and Data Quality (IAIDQ), has been serving up regular doses of information quality disasters from around the world.

IAIDQ submitted AmazonFail - A Classic Information Quality Impact that looks behind the hype and confusion surrounding the #amazonfail debacle.

You’re a Leader - Lead

Daragh O Brien is an Irish information quality expert, conference speaker, published author in the field, and director of publicity for the IAIDQ. His excellent blog The DOBlog, founded in 2006, was one of the first specialist information quality blogs.

Daragh submitted You’re a Leader - Lead that explains that, although there’s a whole lot of great management happening in the world, what we really need are information quality leaders.

All I Really Need To Know About Data Quality I Learned In Kindergarten

My name is Jim Harris. I am an independent consultant, speaker, writer and blogger with over 15 years of professional services and application development experience in data quality. My blog Obsessive-Compulsive Data Quality is an independent blog offering a vendor-neutral perspective on data quality.

I submitted All I Really Need To Know About Data Quality I Learned In Kindergarten that explains how show and tell, the five second rule and other great lessons from kindergarten are essential to success in data quality initiatives.

Submit to Daragh

The May issue will be edited by Daragh O Brien and hosted on The DOBlog.

For more information, please follow this link: El Festival del IDQ Bloggers

April 11, 2009

Enterprise Data World 2009

April 11, 2009/ Jim Harris

Formerly known as the DAMA International Symposium and Wilshire MetaData Conference, Enterprise Data World 2009 was held April 5-9 in Tampa, Florida at the Tampa Convention Center.

Enterprise Data World is the business world’s most comprehensive vendor-neutral educational event about data and information management. This year’s program was bigger than ever before, with more sessions, more case studies, and more can’t-miss content. With 200 hours of in-depth tutorials, hands-on workshops, practical sessions and insightful keynotes, the conference was a tremendous success. Congratulations and thanks to Tony Shaw, Maya Stosskopf and the entire Wilshire staff.

I attended Enterprise Data World 2009 as a member of the Iowa Chapter of DAMA and as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the sessions that I was attending.

I wish that I could have attended every session, but here are some highlights from ten of my favorites:

8 Ways Data is Changing Everything

Keynote by Stephen Baker from BusinessWeek.

His article Math Will Rock Your World inspired his excellent book The Numerati. Additionally, check out his blog: Blogspotting.

Quotes from the keynote:

"Data is changing how we understand ourselves and how we understand our world"
"Predictive data mining is about the mathematical modeling of humanity"
"Anthropologists are looking at social networking (e.g. Twitter, Facebook) to understand the science of friendship"

Master Data Management: Proven Architectures, Products and Best Practices

Tutorial by David Loshin from Knowledge Integrity.

Included material from his excellent book Master Data Management. Additionally, check out his blog: David Loshin.

Quotes from the tutorial:

"Master Data are the core business objects used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies"
"Master Data Management (MDM) provides a unified view of core data subject areas (e.g. Customers, Products)"
"With MDM, it is important not to over-invest and under-implement - invest in and implement only what you need"

Master Data Management: Ignore the Hype and Keep the Focus on Data

Case Study by Tony Fisher from DataFlux and Jeff Grayson from Equinox Fitness.

Quotes from the case study:

"The most important thing about Master Data Management (MDM) is improving business processes"
"80% of any enterprise implementation should be the testing phase"
"MDM Data Quality (DQ) Challenge: Any % wrong means you’re 100% certain you’re not always right"
"MDM DQ Solution: Re-design applications to ensure the ‘front-door’ protects data quality"
"Technology is critical, however thinking through the operational processes is more important"

A Case of Usage: Working with Use Cases on Data-Centric Projects

Case Study by Susan Burk from IBM.

Quotes from the case study:

"Use Case is a sequence of actions performed to yield a result of observable business value"
"The primary focus of data-centric projects is data structure, data delivery and data quality"
"Don’t like use cases? – ok, call them business acceptance criteria – because that’s what a use case is"

Crowdsourcing: People are Smart, When Computers are Not

Session by Sharon Chiarella from Amazon Web Services.

Quotes from the session:

"Crowdsourcing is outsourcing a task typically performed by employees to a general community of people"
"Crowdsourcing eliminates over-staffing, lowers costs and reduces work turnaround time"
"An excellent example of crowdsourcing is open source software development (e.g. Linux)"

Improving Information Quality using Lean Six Sigma Methodology

Session by Atul Borkar and Guillermo Rueda from Intel.

Quotes from the session:

"Information Quality requires a structured methodology in order to be successful"
Lean Six Sigma Framework: DMAIC – Define, Measure, Analyze, Improve, Control:
- Define = Describe the challenge, goal, process and customer requirements
- Measure = Gather data about the challenge and the process
- Analyze = Use hypothesis and data to find root causes
- Improve = Develop, implement and refine solutions
- Control = Plan for stability and measurement

Universal Data Quality: The Key to Deriving Business Value from Corporate Data

Session by Stefanos Damianakis from Netrics.

Quotes from the session:

"The information stored in databases is NEVER perfect, consistent and complete – and it never can be!"
"Gartner reports that 25% of critical data within large businesses is somehow inaccurate or incomplete"
"Gartner reports that 50% of implementations fail due to lack of attention to data quality issues"
"A powerful approach to data matching is the mathematical modeling of human decision making"
"The greatest advantage of mathematical modeling is that there are no data matching rules to build and maintain"

Defining a Balanced Scorecard for Data Management

Seminar by C. Lwanga Yonke, a founding member of the International Association for Information and Data Quality (IAIDQ).

Quotes from the seminar:

"Entering the same data multiple times is like paying the same invoice multiple times"
"Good metrics help start conversations and turn strategy into action"
Good metrics have the following characteristics:
- Business Relevance
- Clarity of Definition
- Trending Capability (i.e. metric can be tracked over time)
- Easy to aggregate and roll-up to a summary
- Easy to drill-down to the details that comprised the measurement

Closing Panel: Data Management’s Next Big Thing!

Quotes from Panelist Peter Aiken from Data Blueprint:

Capability Maturity Levels:
1. Initial
2. Repeatable
3. Defined
4. Managed
5. Optimized
"Most companies are at a capability maturity level of (1) Initial or (2) Repeatable"
"Data should be treated as a durable asset"

Quotes from Panelist Noreen Kendle from Burton Group:

"A new age for data and data management is on horizon – a perfect storm is coming"
"The perfect storm is being caused by massive data growth and software as a service (i.e. cloud computing)"
"Always remember that you can make lemonade from lemons – the bad in life can be turned into something good"

Quotes from Panelist Karen Lopez from InfoAdvisors:

"If you keep using the same recipe, then you keep getting the same results"
"Our biggest problem is not technical in nature - we simply need to share our knowledge"
"Don’t be a dinosaur! Adopt a ‘go with what is’ philosophy and embrace the future!"

Quotes from Panelist Eric Miller from Zepheira:

"Applications should not be ON The Web, but OF The Web"
"New Acronym: LED – Linked Enterprise Data"
"Semantic Web is the HTML of DATA"

Quotes from Panelist Daniel Moody from University of Twente:

"Unified Modeling Language (UML) was the last big thing in software engineering"
"The next big thing will be ArchiMate, which is a unified language for enterprise architecture modeling"

Mark Your Calendar

Enterprise Data World 2010 will take place in San Francisco, California at the Hilton San Francisco on March 14-18, 2010.

April 01, 2009

Data Quality Whitepapers are Worthless

April 01, 2009/ Jim Harris

During a 1609 interview, William Shakespeare was asked his opinion about an emerging genre of theatrical writing known as Data Quality Whitepapers. The "Bard of Avon" was clearly not a fan. His famously satirical response was:

Data quality's but a writing shadow, a poor paper

That struts and frets its words upon the page

And then is heard no more: it is a tale

Told by a vendor, full of sound and fury

Signifying nothing.

Four centuries later, I find myself in complete agreement with Shakespeare (and not just because Harold Bloom told me so).

Today is April Fool's Day, but I am not joking around - call Dennis Miller and Lewis Black - because I am ready to RANT.

I am sick and tired of reading whitepapers. Here is my "Bottom Ten List" explaining why:

Ones that make me fill out a "please mercilessly spam me later" contact information form before I am allowed to download them remind me of Mrs. Bun: "I DON'T LIKE SPAM!"
Ones that, after I read their supposed pearls of wisdom, make me shake my laptop violently like an Etch-A-Sketch. I have lost count of how many laptops I have destroyed this way. I have started buying them in bulk at Wal-Mart.
Ones comprised entirely of the exact same information found on the vendor's website make www = World Wide Worthless.
Ones that start out good, but just when they get to the really useful stuff, refer to content only available to paying customers. What a great way to guarantee that neither I nor anyone I know will ever become your paying customer!
Ones that have a "Shock and Awe" title followed by "Aw Shucks" content because apparently the entire marketing budget was spent on the title.
Ones that promise me the latest BUZZ but deliver only ZZZ are not worthless only when I have insomnia.
Ones that claim to be about data quality, but have nothing at all to do with data quality: "...don't make me angry. You wouldn't like me when I'm angry."
Ones that take the adage "a picture is worth a thousand words" too far by using a dizzying collage of logos, charts, graphs and other visual aids. This is one reason we're happy that Pablo Picasso was a painter. However, he did once write that "art is a lie that makes us realize the truth." Maybe he was defending whitepapers.
Ones that use acronyms without ever defining what they stand for remind me of that scene from Good Morning, Vietnam: "Excuse me, sir. Seeing as how the VP is such a VIP, shouldn't we keep the PC on the QT? Because if it leaks to the VC he could end up MIA, and then we'd all be put out in KP."
Ones that really know they're worthless but aren't honest about it. Don't promise me "The Top 10 Metrics for Data Quality Scorecards" and give me a list as pointless as this one.

I am officially calling out all writers of Data Quality Whitepapers.

Shakespeare and I both believe that you can't write anything about data quality that is worth reading.

Send your data quality whitepapers to Obsessive-Compulsive Data Quality and if they are not worthless, then I will let the world know that you proved Shakespeare and me wrong.

And while I am on a rant roll, I am officially calling out all Data Quality Bloggers.

The International Association for Information and Data Quality (IAIDQ) is celebrating its five year anniversary by hosting:

El Festival del IDQ Bloggers – A Blog Carnival for Information/Data Quality Bloggers

For more information about the blog carnival, please follow this link: IAIDQ Blog Carnival

March 26, 2009

Identifying Duplicate Customers

March 26, 2009/ Jim Harris

I just finished publishing a five part series of articles on data matching methodology for dealing with the common data quality problem of identifying duplicate customers.

The article series was published on Data Quality Pro, which is the leading data quality online magazine and free independent community resource dedicated to helping data quality professionals take their career or business to the next level.

Topics covered in the series:

Why a symbiosis of technology and methodology is necessary when approaching the common data quality problem of identifying duplicate customers
How performing a preliminary analysis on a representative sample of real project data prepares effective examples for discussion
Why using a detailed, interrogative analysis of those examples is imperative for defining your business rules
How both false negatives and false positives illustrate the highly subjective nature of this problem
How to document your business rules for identifying duplicate customers
How to set realistic expectations about application development
How to foster a collaboration of the business and technical teams throughout the entire project
How to consolidate identified duplicates by creating a “best of breed” representative record
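
As a rough illustration of the last topic, a “best of breed” record can be built by taking, for each field, a preferred value from among the identified duplicates. The Python sketch below assumes one simple rule (the most recently updated non-empty value wins); the sample records, field names, and the `best_of_breed` function are hypothetical and are not taken from the article series, where the actual consolidation rules would be defined with the business.

```python
from datetime import date

# Two records already identified as the same customer (invented sample data).
duplicates = [
    {"name": "J. Smith", "phone": "", "email": "jsmith@example.com",
     "updated": date(2008, 5, 1)},
    {"name": "John Smith", "phone": "555-0100", "email": "",
     "updated": date(2009, 1, 15)},
]


def best_of_breed(records, fields):
    """For each field, keep the most recently updated non-empty value."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    merged = {}
    for field in fields:
        # Fall back to an empty string when no record has a value.
        merged[field] = next((r[field] for r in newest_first if r[field]), "")
    return merged


print(best_of_breed(duplicates, ["name", "phone", "email"]))
# {'name': 'John Smith', 'phone': '555-0100', 'email': 'jsmith@example.com'}
```

The merged record keeps the newer name and phone number but recovers the email address that only the older record contained.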

To read the series, please follow these links:

March 13, 2009

Do you have obsessive-compulsive data quality (OCDQ)?

March 13, 2009/ Jim Harris

Obsessive-compulsive data quality (OCDQ) affects millions of people worldwide.

The most common symptoms of OCDQ are:

Obsessively verifying data used in critical business decisions
Compulsively seeking an understanding of data in business terms
Repeatedly checking that data is complete and accurate before sharing it
Habitually attempting to calculate the cost of poor data quality
Constantly muttering a mantra that data quality must be taken seriously

While the good folks at Prescott Pharmaceuticals are busy working on a treatment, I am dedicating this independent blog as group therapy to all those who (like me) have dealt with OCDQ their entire professional lives.

Over the years, the work of many individuals and organizations has been immensely helpful to those of us with OCDQ.

Some of these heroes deserve special recognition:

Data Quality Pro – Founded and maintained by Dylan Jones, Data Quality Pro is a free independent community resource dedicated to helping data quality professionals take their career or business to the next level. With the mission to create the most beneficial data quality resource that is freely available to members around the world, Data Quality Pro provides free software, job listings, advice, tutorials, news, views and forums. Their goal is “winning-by-sharing” and they believe that when members contribute a small amount of their experience, skill or time to support other members, truly great things can be achieved. With the new Member Service Register, consultants, service providers and technology vendors can promote their services and include links to their websites and blogs.

International Association for Information and Data Quality (IAIDQ) – Chartered in January 2004, IAIDQ is a not-for-profit, vendor-neutral professional association whose purpose is to create a world-wide community of people who desire to reduce the high costs of low quality information and data by applying sound quality management principles to the processes that create, maintain and deliver data and information. IAIDQ was co-founded by Larry English and Tom Redman, who are two of the most respected and well-known thought and practice leaders in the field of information and data quality. IAIDQ also provides two excellent blogs: IQ Trainwrecks and Certified Information Quality Professional (CIQP).

Beth Breidenbach – her blog Confessions of a database geek is fantastic in and of itself, but she has also compiled an excellent list of data quality blogs and provides them via aggregated feeds in both Feedburner and Google Reader formats.

Vincent McBurney – his blog Tooling Around in the IBM InfoSphere is an entertaining and informative look at data integration in the IBM InfoSphere covering many IBM Information Server products such as DataStage, QualityStage and Information Analyzer.

Daragh O Brien – he is a leading writer, presenter and researcher in the field of information quality management, with a particular interest in legal aspects of information quality. His blog The DOBlog is a popular and entertaining source of great material.

Steve Sarsfield – his blog Data Governance and Data Quality Insider covers the world of data integration, data governance, and data quality from the perspective of an industry insider. Also, check out his new book: The Data Governance Imperative.
