Information Quality Certified Professional

Information Quality Certified Professional (IQCP) is the new certification program from the IAIDQ.  The application deadline for the next certification exam is October 25, 2011.  For more information about IQCP certification, please refer to the following links:

 

Taking the first IQCP exam

A Guest Post written by Gordon Hamilton

I can still remember how galvanized I was by the first email mentions of the IQCP certification and its inaugural examination.  I’d been a member of the IAIDQ for the past year and I saw the first mailings in early February 2011.  It’s funny but my memory of the sequence of events was that I filled out the application for the examination that first night, but going back through my emails I see that I attended several IAIDQ Webinars and followed quite a few discussions on LinkedIn before I finally applied and paid for the exam in mid-March (I still got the early bird discount).

Looking back now, I am wondering why I was so excited about the chance to become certified in data quality.  I know that I had been considering the CBIP and CBAP, from TDWI and IIBA respectively, for more than a year, going so far as to purchase study materials and take some sample exams.  Both the CBIP and CBAP designations fit where my career had been for 20+ years, but the subject areas were now tangential to my focus on information and data quality.

The IQCP certification fit exactly where I hoped my career trajectory was now taking me, so it really did galvanize me to action.

I had been a software and database developer for 20+ years when I caught a bad case of Deming-god worship while contracting at Microsoft in the early 2000s.  It only got worse as I started reading books by Olson, Redman, English, Loshin, John Morris, and Maydanchik on how data quality dovetailed with the development methodologies of folks like Kimball and Inmon, which in turn dovetailed with Lean Six Sigma methods.  I was on the slippery slope to choosing data quality as a career because those gurus of Data Quality, and of Quality in general, were explaining, and I was finally starting to understand, why data warehouse projects failed so often, and why the business was often underwhelmed by the information product.

I had 3+ months to study and the resource center on the IAIDQ website had a list of recommended books and articles.  I finally had to live up to my moniker on Twitter of DQStudent.  I already had many of the books recommended by IAIDQ at home but hadn’t read them all yet, so while I waited for Amazon and AbeBooks to send me the books I thought were crucial, I began reading Deming, English, and Loshin.

Of all the books that began arriving on my doorstep, the most memorable was Journey to Data Quality by Richard Wang et al.

That book created a powerful image in my head of the information product “manufactured” by every organization.  That image of the “information product” made the suggestions by the data quality gurus much clearer.  They were showing how to apply quality techniques to the manufacture of Business Intelligence.  The image gave me a framework upon which to hang the other knowledge I was gathering about data quality, so it was easier to keep pushing through the books and articles because each new piece could fit somewhere in that manufacturing process.

I slept well the night before the exam, and gave myself plenty of time to make it to the Castle exam site that afternoon.  I took along several books on data quality, but hardly glanced at them.  Instead I grabbed a quick lunch and then a strong coffee to carry me through the 3-hour exam.  At 50 questions per hour, I was very conscious of how long each question was taking me, and every 10 questions or so I would check to see if I was going to run into time trouble.  It was obvious after 20 questions that I had plenty of time, so I began to get into a groove, finishing the exam 30 minutes early, leaving plenty of time to review any questionable answers.

I found the exam eminently fair, with no tricky question constructions at all, so I didn’t seem to fall into the over-thinking trap that I sometimes do.  Even better, the exam wasn’t the type that drilled deeper and deeper into my knowledge gaps when I missed a question.  Even though I felt confident that I had passed, I’ve got to tell you that the 6 weeks the IAIDQ took to determine the passing threshold on this inaugural exam and send out passing notifications were the longest 6 weeks I had spent in a long time.  Now that the passing mark is established, they swear that notifications will be sent out much faster.

I still feel a warm glow as I think back on achieving IQCP certification.  I am proud to say that I am a data quality consultant and I have the certificate proving the depth and breadth of my knowledge.

Gordon Hamilton is a Data Quality, Data Warehouse, and IQCP-certified professional, whose 30 years’ experience in the information business encompasses many industries, including government, legal, healthcare, insurance, and financial services.

 

Related Posts

Studying Data Quality

The Blue Box of Information Quality

Data, Information, and Knowledge Management

Are you turning Ugly Data into Cute Information?

The Dichotomy Paradox, Data Quality and Zero Defects

The Data Quality Wager

The Blue Box of Information Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode, Daragh O Brien and I discuss the Blue Box of Information Quality, which is much bigger on the inside, as well as using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”

Daragh O Brien is one of Ireland’s leading Information Quality and Governance practitioners.  After being born at a young age, Daragh has amassed a wealth of experience in quality information driven business change, from CRM Single View of Customer to Regulatory Compliance, to Governance and the taming of information assets to benefit the bottom line, manage risk, and ensure customer satisfaction.  Daragh O Brien is the Managing Director of Castlebridge Associates, one of Ireland’s leading consulting and training companies in the information quality and information governance space.

Daragh O Brien is a founding member and former Director of Publicity for the IAIDQ, which he is still actively involved with.  He was a member of the team that helped develop the Information Quality Certified Professional (IQCP) certification and he recently became the first person in Ireland to achieve this prestigious certification.

In 2008, Daragh O Brien was awarded a Fellowship of the Irish Computer Society for his work in developing and promoting standards of professionalism in Information Management and Governance.

Daragh O Brien is a regular conference presenter, trainer, blogger, and author with two industry reports published by Ark Group, the most recent of which is The Data Strategy and Governance Toolkit.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Studying Data Quality

On this episode, Gordon Hamilton and I discuss data quality key concepts, including those which we have studied in some of our favorite data quality books, and more important, those which we have implemented in our careers as data quality practitioners.

Gordon Hamilton is a Data Quality and Data Warehouse professional, whose 30 years’ experience in the information business encompasses many industries, including government, legal, healthcare, insurance, and financial services.  Gordon was most recently engaged in the healthcare industry in British Columbia, Canada, where he continues to advise several health care authorities on data quality and business intelligence platform issues.

Gordon Hamilton’s passion is to bring together:

  • Exposure of business rules through data profiling as recommended by Ralph Kimball.

  • Monitoring business rules in the EQTL (Extract-Quality-Transform-Load) pipeline leading into the data warehouse.

  • Managing the business rule violations through systemic and specific solutions within the statistical process control framework of Shewhart/Deming.

  • Researching how to sustain data quality metrics as the “fit for purpose” definitions change faster than the information product process can easily adapt.
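
The Shewhart/Deming control framework mentioned above can be sketched in a few lines of Python.  This is a minimal reader’s illustration with made-up violation rates, not an actual implementation from Gordon’s pipeline: it flags any day whose business-rule violation rate falls outside 3-sigma control limits computed from history.

```python
# Minimal sketch of Shewhart-style monitoring of a business rule in an
# EQTL pipeline: flag daily violation rates outside 3-sigma control
# limits.  The history and rates below are hypothetical illustrations.

def control_limits(rates):
    """Center line and 3-sigma control limits from historical rates."""
    n = len(rates)
    mean = sum(rates) / n
    variance = sum((r - mean) ** 2 for r in rates) / n
    sigma = variance ** 0.5
    return mean, max(0.0, mean - 3 * sigma), mean + 3 * sigma

def out_of_control(history, today_rate):
    """True when today's violation rate breaches the control limits."""
    _, lcl, ucl = control_limits(history)
    return today_rate < lcl or today_rate > ucl

history = [0.021, 0.019, 0.022, 0.020, 0.018, 0.023, 0.021]  # past days
print(out_of_control(history, 0.020))  # typical day -> False
print(out_of_control(history, 0.090))  # spike: investigate -> True
```

In practice the history window and the rules themselves would come from data profiling, but the core test — is today’s rate inside the control limits? — really is this simple.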

Gordon Hamilton’s moniker of DQStudent on Twitter hints at his plan to dovetail his Lean Six Sigma skills and experience with the data quality foundations to improve the manufacture of the “information product” in today’s organizations.  Gordon is a member of IAIDQ, TDWI, and ASQ, as well as an enthusiastic reader of anything pertaining to data.

Gordon Hamilton recently became an Information Quality Certified Professional (IQCP), via the IAIDQ certification program.

Recommended Data Quality Books

By no means a comprehensive list, and listed in no particular order whatsoever, the following books were either discussed during this OCDQ Radio episode, or are otherwise recommended for anyone looking to study data quality and its related disciplines:


How active is your data quality practice?

My recent blog post The Data Quality Wager received a provocative comment from Richard Ordowich that sparked another round of discussion and debate about proactive data quality versus reactive data quality in the LinkedIn Group for the IAIDQ.

“Data quality is a reactive practice,” explained Ordowich.  “Perhaps that is not what is professed in the musings of others or the desired outcome, but it is nevertheless the current state of the best practices.  Data profiling and data cleansing are after the fact data quality practices.  The data is already defective.  Proactive defect prevention requires a greater discipline and changes to organizational behavior that is not part of the current best practices.  This I suggest is wishful thinking at this point in time.”

“How can data quality practices,” C. Lwanga Yonke responded, “that do not include proactive defect prevention (with the required discipline and changes to organizational behavior) be considered best practices?  Seems to me a data quality program must include these proactive activities to be considered a best practice.  And from what I see, there are many such programs out there.  True, they are not the majority—but they do exist.”

After Ordowich requested real examples of proactive data quality practices, Jayson Alayay commented “I have implemented data quality using statistical process control techniques where expected volumes and ratios are predicted using forecasting models that self-adjust using historical trends.  We receive an alert when significant deviations from forecast are detected.  One of our overarching data quality goals is to detect a significant data issue as soon as it becomes detectable in the system.”
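
The kind of self-adjusting forecast check Alayay describes might be sketched roughly as follows; the smoothing constants, threshold, and daily volumes here are illustrative assumptions, not details of his actual system:

```python
# Hedged sketch of a self-adjusting volume check: an exponentially
# smoothed forecast of daily record counts, with an alert whenever the
# actual volume deviates too far from forecast.  All numbers are
# made-up illustrations.

def volume_alerts(volumes, alpha=0.3, k=3.0):
    """Return (day, actual, forecast) for days whose actual volume
    deviates more than k times the smoothed absolute deviation."""
    forecast = float(volumes[0])   # seed forecast with first observation
    mad = None                     # smoothed mean absolute deviation
    alerts = []
    for day, actual in enumerate(volumes[1:], start=1):
        error = abs(actual - forecast)
        if mad is not None and error > k * mad:
            alerts.append((day, actual, round(forecast)))
        # self-adjust on every point, using historical trends
        mad = error if mad is None else alpha * error + (1 - alpha) * mad
        forecast = alpha * actual + (1 - alpha) * forecast
    return alerts

daily_loads = [1000, 1020, 980, 1010, 990, 1005, 400, 995]
print(volume_alerts(daily_loads))  # -> [(6, 400, 1000)]
```

Here day 6’s 400-record load breaches the deviation threshold and is flagged, while normal day-to-day fluctuation is not — which matches the stated goal of detecting a significant data issue as soon as it becomes detectable.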

“It is possible,” replied Ordowich, “to estimate the probability of data errors in data sets based on the currency (freshness) and usage of the data.  The problem is this process does not identify the specific instances of errors just the probability that an error may exist in the data set.  These techniques only identify trends not specific instances of errors.  These techniques do not predict the probability of a single instance data error that can wreak havoc.  For example, the ratings of mortgages was a systemic problem, which data quality did not address.  Yet the consequences were far and wide.  Also these techniques do not predict systemic quality problems related to business policies and processes.  As a result, their direct impact on the business is limited.”

“For as long as human hands key in data,” responded Alayay, “a data quality implementation to a great extent will be reactive.  Improving data quality not only pertains to detection of defects, but also enhancement of content, e.g., address standardization, geocoding, application of rules and assumptions to replace missing values, etc.  With so many factors in play, a real life example of a proactive data quality implementation that suits what you’re asking for may be hard to pinpoint.  My opinion is that the implementation of ‘comprehensive’ data quality programs can have big rewards and big risks.  One big risk is that it can slow time-to-market and kill innovation because otherwise talented people would be spending a significant amount of their time complying with rules and standards in the name of improving data quality.”
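
The enhancement side Alayay mentions — standardization and rules for missing values — might look like this in miniature; the suffix table, field names, and default-country rule are made-up examples, not anyone’s production logic:

```python
# Illustrative sketch of content enhancement: simple standardization
# and missing-value rules applied to address records.  The rules and
# field names are hypothetical examples.

SUFFIXES = {"st": "Street", "st.": "Street", "ave": "Avenue", "rd": "Road"}

def standardize(record):
    """Expand street suffixes and default a missing country."""
    clean = dict(record)
    words = clean.get("street", "").split()
    if words and words[-1].lower() in SUFFIXES:
        words[-1] = SUFFIXES[words[-1].lower()]
    clean["street"] = " ".join(words)
    if not clean.get("country"):   # rule: assume domestic when absent
        clean["country"] = "US"
    return clean

print(standardize({"street": "123 Main st", "country": ""}))
# -> {'street': '123 Main Street', 'country': 'US'}
```

Rules like the default-country assumption illustrate Alayay’s point: enhancement improves the data, but every such rule is a judgment call that someone must govern.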

“When an organization embarks on a new project,” replied Ordowich, “at what point in the conversation is data quality discussed?  How many marketing plans, new product development plans, or even software development plans have you seen include data quality?  Data quality is not even an afterthought in most organizations, it is ignored.  Data quality is not even in the vocabulary until a problem occurs.  Data quality is not part of the culture or behaviors within most organizations.”

 

 

Please feel free to post a comment below and share your opinions and experiences.

 

Related Posts

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

Retroactive Data Quality

El Festival del IDQ Bloggers (June and July 2010)

IAIDQ Blog Carnival 2010

Welcome to the June and July 2010 issue of El Festival del IDQ Bloggers, which is a blog carnival by the IAIDQ that offers a great opportunity for both information quality and data quality bloggers to get their writing noticed and to connect with other bloggers around the world.

 

Definition Drift

Graham Rhind submitted his July blog post Definition drift, which examines the persistent problems facing attempts to define a consistent terminology within the data quality industry. 

It is essential to the success of a data quality initiative that its key concepts are clearly defined and in a language that everyone can understand.  Therefore, I also recommend that you check out the free online data quality glossary built and maintained by Graham Rhind by following this link: Data Quality Glossary.

 

Lemonade Stand Data Quality

Steve Sarsfield submitted his July blog post Lemonade Stand Data Quality, which explains that data quality projects are a form of capitalism, meaning that you need to sell your customers a refreshing glass and keep them coming back for more.

 

What’s In a Given Name?

Henrik Liliendahl Sørensen submitted his June blog post What’s In a Given Name?, which examines a common challenge facing data quality, master data management, and data matching—namely (pun intended), how to automate the interpretation of the “given name” (aka “first name”) component of a person’s name separately from their “family name” (aka “last name”).

 

Solvency II Standards for Data Quality

Ken O’Connor submitted his July blog post Solvency II Standards for Data Quality, which explains that the Solvency II standards are common-sense data quality standards that can enable all organizations, regardless of their industry or region, to achieve complete, appropriate, and accurate data.

 

How Accuracy Has Changed

Scott Schumacher submitted his July blog post How Accuracy Has Changed, which explains that accuracy means being able to make the best use of all the information you have, putting data together where necessary, and keeping it apart where necessary.

 

Uniqueness is in the Eye of the Beholder

Marty Moseley submitted his June blog post Uniqueness is in the Eye of the Beholder, which beholds the challenge of uniqueness and identity matching, where determining if data records should be matched is often a matter of differing perspectives among groups within an organization, where what one group considers unique, another group considers non-unique or a duplicate.

 

Uniqueness in the Eye of the NSTIC

Jeffrey Huth submitted his July blog post Uniqueness in the Eye of the NSTIC, which examines a recently drafted document in the United States regarding a National Strategy for Trusted Identities in Cyberspace (NSTIC).

 

Profound Profiling

Daragh O Brien submitted his July blog post Profound Profiling, which recounts how he has found data profiling cropping up in conversations and presentations he’s been making recently, even where the topic of the day wasn’t “Information Quality,” and shares his thoughts on the profound benefits of data profiling for organizations seeking to manage risk and ensure compliance.

 

Wanted: a Data Quality Standard for Open Government Data

Sarah Burnett submitted her July blog post Wanted: a Data Quality Standard for Open Government Data, which calls for the establishment of data quality standards for open government data (i.e., public data sets) since more of it is becoming available.

 

Data Quality Disasters in the Social Media Age

Dylan Jones submitted his July blog post The reality of data quality disasters in a social media age, which examines how bad news sparked by poor data quality travels faster and further than ever before, by using the recent story about the Enbridge Gas billing blunders as a practical lesson for all companies sitting on the data quality fence.

 

Finding Data Quality

Jim Harris (that’s me referring to myself in the third person) submitted my July blog post Finding Data Quality, which explains (with the help of the movie Finding Nemo) that although data quality is often discussed only in its relation to initiatives such as master data management, business intelligence, and data governance, eventually you’ll be finding data quality everywhere.

 

Editor’s Selections

In addition to the official submissions above, I selected the following great data quality blog posts published in June or July 2010:

 

Check out the past issues of El Festival del IDQ Bloggers

El Festival del IDQ Bloggers (May 2010) – edited by Castlebridge Associates

El Festival del IDQ Bloggers (April 2010) – edited by Graham Rhind

El Festival del IDQ Bloggers (March 2010) – edited by Phil Wright

El Festival del IDQ Bloggers (February 2010) – edited by William Sharp

El Festival del IDQ Bloggers (January 2010) – edited by Henrik Liliendahl Sørensen

El Festival del IDQ Bloggers (November 2009) – edited by Daragh O Brien

El Festival del IDQ Bloggers (October 2009) – edited by Vincent McBurney

El Festival del IDQ Bloggers (September 2009) – edited by Daniel Gent

El Festival del IDQ Bloggers (August 2009) – edited by William Sharp

El Festival del IDQ Bloggers (July 2009) – edited by Andrew Brooks

El Festival del IDQ Bloggers (June 2009) – edited by Steve Sarsfield

El Festival del IDQ Bloggers (May 2009) – edited by Daragh O Brien

El Festival del IDQ Bloggers (April 2009) – edited by Jim Harris

The 2010 Data Quality Blogging All-Stars

The 2010 Major League Baseball (MLB) All-Star Game is being held tonight (July 13) at Angel Stadium in Anaheim, California.

For those readers who are not baseball fans, the All-Star Game is an annual exhibition held in mid-July that showcases the players with (for the most part) the best statistical performances during the first half of the MLB season.

Last summer, I began my own annual exhibition of showcasing the bloggers whose posts I have personally most enjoyed reading during the first half of the data quality blogging season. 

Therefore, this post provides links to stellar data quality blog posts that were published between January 1 and June 30 of 2010.  My definition of a “data quality blog post” also includes Data Governance, Master Data Management, and Business Intelligence. 

Please Note: There is no implied ranking in the order that bloggers or blogs are listed, other than that Individual Blog All-Stars are listed first, followed by Vendor Blog All-Stars, and the blog posts are listed in reverse chronological order by publication date.

 

Henrik Liliendahl Sørensen

From Liliendahl on Data Quality:

 

Dylan Jones

From Data Quality Pro:

 

Julian Schwarzenbach

From Data and Process Advantage Blog:

 

Rich Murnane

From Rich Murnane's Blog:

 

Phil Wright

From Data Factotum:

 

Initiate – an IBM Company

From Mastering Data Management:

 

Baseline Consulting

From their three blogs: Inside the Biz with Jill Dyché, Inside IT with Evan Levy, and In the Field with our Experts:

 

DataFlux – a SAS Company

From Community of Experts:

 

Related Posts

Recently Read: May 15, 2010

Recently Read: March 22, 2010

Recently Read: March 6, 2010

Recently Read: January 23, 2010

The 2009 Data Quality Blogging All-Stars

 

Additional Resources

From the IAIDQ, read the 2010 issues of the Blog Carnival for Information/Data Quality:

The Once and Future Data Quality Expert

World Quality Day 2009

Wednesday, November 11 is World Quality Day 2009.

World Quality Day was established by the United Nations in 1990 as a focal point for the quality management profession and as a celebration of the contribution that quality makes to the growth and prosperity of nations and organizations.  The goal of World Quality Day is to raise awareness of how quality approaches (including data quality best practices) can have a tangible effect on business success, as well as contribute towards world-wide economic prosperity.

 

IAIDQ

The International Association for Information and Data Quality (IAIDQ) was chartered in January 2004 and is a not-for-profit, vendor-neutral professional association whose purpose is to create a world-wide community of people who desire to reduce the high costs of low quality information and data by applying sound quality management principles to the processes that create, maintain and deliver data and information.

Since 2007 the IAIDQ has celebrated World Quality Day as a springboard for improvement and a celebration of successes.  Please join us to celebrate World Quality Day by participating in our interactive webinar in which the Board of Directors of the IAIDQ will share with you stories and experiences to promote data quality improvements within your organization.

In my recent Data Quality Pro article The Future of Information and Data Quality, I reported on the IAIDQ Ask The Expert Webinar with co-founders Larry English and Tom Redman, two of the industry pioneers for data quality and two of the most well-known data quality experts.

 

Data Quality Expert

As World Quality Day 2009 approaches, my personal reflections are focused on what the title data quality expert has meant in the past, what it means today, and most important, what it will mean in the future.

With over 15 years of professional services and application development experience, I consider myself to be a data quality expert.  However, my experience is paltry by comparison to English, Redman, and other industry luminaries such as David Loshin, to use one additional example from many. 

Experience is popularly believed to be the path that separates knowledge from wisdom, which is usually accepted as another way of defining expertise. 

Oscar Wilde once wrote that “experience is simply the name we give our mistakes.”  I agree.  I have found that the sooner I can recognize my mistakes, the sooner I can learn from the lessons they provide, and hopefully prevent myself from making the same mistakes again. 

The key is early detection.  As I gain experience, I gain an improved ability to more quickly recognize my mistakes and thereby expedite the learning process.

James Joyce wrote that “mistakes are the portals of discovery” and T.S. Eliot wrote that “we must not cease from exploration and the end of all our exploring will be to arrive where we began and to know the place for the first time.”

What I find in the wisdom of these sages is the need to acknowledge the favor our faults do for us.  Therefore, although experience is the path that separates knowledge from wisdom, the true wisdom of experience is the wisdom of failure.

As Jonah Lehrer explained: “Becoming an expert just takes time and practice.  Once you have developed expertise in a particular area, you have made the requisite mistakes.”

But expertise in any discipline is more than simply an accumulation of mistakes and birthdays.  And expertise is not a static state that once achieved, allows you to simply rest on your laurels.

In addition to my real-world experience working on data quality initiatives for my clients, I also read all of the latest books, articles, whitepapers, and blogs, as well as attend as many conferences as possible.

 

The Times They Are a-Changin'

Much of the discussion that I have heard regarding the future of the data quality profession has been focused on the need for the increased maturity of both practitioners and organizations.  Although I do not dispute this need, I am concerned about the apparent lack of attention being paid to how fast the world around us is changing.

Rapid advancements in technology, coupled with the meteoric rise of the Internet and social media (blogs, wikis, Twitter, Facebook, LinkedIn, etc.), have created an amazing medium that enables people separated by vast distances and disparate cultures to come together, communicate, and collaborate in ways few would have thought possible just a few decades ago.

I don't believe that it is an exaggeration to state that we are now living in an age where the contrast between the recent past and the near future is greater than perhaps it has ever been in human history.  This brave new world has such people and technology in it, that practically every new day brings the possibility of another quantum leap forward.

Although it has been argued by some that the core principles of data quality management are timeless, I must express my doubt.  The daunting challenges of dramatically increasing data volumes and the unrelenting progress of cloud computing, software as a service (SaaS), and mobile computing architectures, would appear to be racing toward a high-speed collision with our time-tested (but time-consuming to implement properly) data quality management principles.

The times they are indeed changing, and I believe we must stop using terms like Six Sigma and Kaizen as if they were shibboleths.  If these or any other disciplines are to remain relevant, then we must honestly assess them in the harsh and unforgiving light of our brave new world, which is seemingly changing faster than the speed of light.

Expertise is not static.  Wisdom is not timeless.  The only constant is change.  For the data quality profession to truly mature, our guiding principles must change with the times, or be relegated to a past that is all too quickly becoming distant.

 

Share Your Perspectives

In celebration of World Quality Day, please share your perspectives regarding the past, present, and most important, the future of the data quality profession.  With apologies to T. H. White, I declare this debate to be about the difference between:

The Once and Future Data Quality Expert

Related Posts

Mistake Driven Learning

The Fragility of Knowledge

The Wisdom of Failure

A Portrait of the Data Quality Expert as a Young Idiot

The Nine Circles of Data Quality Hell

 

Additional IAIDQ Links

IAIDQ Ask The Expert Webinar: World Quality Day 2009

IAIDQ Ask The Expert Webinar with Larry English and Tom Redman

INTERVIEW: Larry English - IAIDQ Co-Founder

INTERVIEW: Tom Redman - IAIDQ Co-Founder

IAIDQ Publications Portal

Data Quality Blogging All-Stars

The 2009 Major League Baseball (MLB) All-Star Game is being held tonight at Busch Stadium in St. Louis, Missouri. 

For those readers who are not baseball fans, the All-Star Game is an annual exhibition held in mid-July that showcases the players with the best statistical performances from the first half of the MLB season.

As I watch the 80th Midsummer Classic, I offer this exhibition that showcases the bloggers with the posts I have most enjoyed reading from the first half of the 2009 data quality blogging season.

 

Dylan Jones

From Data Quality Pro:

 

Daragh O Brien

From The DOBlog:

 

Steve Sarsfield

From Data Governance and Data Quality Insider:

 

Daniel Gent

From Data Quality Edge:

 

Henrik Liliendahl Sørensen

From Liliendahl on Data Quality:

 

Stefanos Damianakis

From Netrics HD:

 

Vish Agashe

From Business Intelligence: Process, People and Products:

 

Mark Goloboy

From Boston Data, Technology & Analytics:

 

Additional Resources

Over on Data Quality Pro, read the data quality blog roundups from the first half of 2009:

From the IAIDQ, read the 2009 issues of the IAIDQ Blog Carnival:

TDWI World Conference Chicago 2009

Founded in 1995, TDWI (The Data Warehousing Institute™) is the premier educational institute for business intelligence and data warehousing, providing education, training, certification, news, and research for executives and information technology professionals worldwide.  TDWI conferences always offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner.  The courses are designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

 

TDWI World Conference Chicago 2009 was held May 3-8 in Chicago, Illinois at the Hyatt Regency Hotel and was a tremendous success.  I attended as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the conference.  Here are my notes from the courses I attended: 

 

BI from Both Sides: Aligning Business and IT

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services.  Jill is responsible for delivering industry and client advisory services, is a frequent lecturer and writer on the business value of IT, and writes the excellent Inside the Biz blog.  She is the author of acclaimed books on the business value of information: e-Data: Turning Data Into Information With Data Warehousing and The CRM Handbook: A Business Guide to Customer Relationship Management.  Her latest book, written with Evan Levy, is Customer Data Integration: Reaching a Single Version of the Truth.

Course Quotes from Jill Dyché:

  • Five Critical Success Factors for Business Intelligence (BI):
    1. Organization - Build organizational structures and skills to foster a sustainable program
    2. Processes - Align both business and IT development processes that facilitate delivery of ongoing business value
    3. Technology - Select and build technologies that deploy information cost-effectively
    4. Strategy - Align information solutions to the company's strategic goals and objectives
    5. Information - Treat data as an asset by separating data management from technology implementation
  • Three Different Requirement Categories:
    1. What is the business need, pain, or problem?  What business questions do we need to answer?
    2. What data is necessary to answer those business questions?
    3. How do we need to use the resulting information to answer those business questions?
  • “Data warehouses are used to make business decisions based on data – so data quality is critical”
  • “Even companies with mature enterprise data warehouses still have data silos - each business area has its own data mart”
  • “Instead of pushing a business intelligence tool, just try to get people to start using data”
  • “Deliver a usable system that is valuable to the business and not just a big box full of data”

 

TDWI Data Governance Summit

Philip Russom is the Senior Manager of Research and Services at TDWI, where he oversees many of TDWI’s research-oriented publications, services, and events.  Prior to joining TDWI in 2005, he was an industry analyst covering BI at Forrester Research, as well as a contributing editor with Intelligent Enterprise and Information Management (formerly DM Review) magazines.

Summit Quotes from Philip Russom:

  • “Data Governance usually boils down to some form of control for data and its usage”
  • “Four Ps of Data Governance: People, Policies, Procedures, Process”
  • “Three Pillars of Data Governance: Compliance, Business Transformation, Business Integration”
  • “Two Foundations of Data Governance: Business Initiatives and Data Management Practices”
  • “Cross-functional collaboration is a requirement for successful Data Governance”

 

Becky Briggs, CBIP, CMQ/OE, is a Senior Manager and Data Steward for Airlines Reporting Corporation (ARC) and has 25 years of experience in data processing and IT - the last 9 in data warehousing and BI.  She leads the program team responsible for product, project, and quality management, business line performance management, and data governance/stewardship.

Summit Quotes from Becky Briggs:

  • “Data Governance is the act of managing the organization's data assets in a way that promotes business value, integrity, usability, security and consistency across the company”
  • Five Steps of Data Governance:
    1. Determine what data is required
    2. Evaluate potential data sources (internal and external)
    3. Perform data profiling and analysis on data sources
    4. Data Services - Definition, modeling, mapping, quality, integration, monitoring
    5. Data Stewardship - Classification, access requirements, archiving guidelines
  • “You must realize and accept that Data Governance is a program and not just a project”
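Step 3 above, data profiling, can be illustrated with a minimal sketch. This is only an illustration of the kind of statistics a profiling pass gathers (the sample `states` column and all values are hypothetical):

```python
from collections import Counter

def profile_column(values):
    """Compute simple profile statistics for one column of source data."""
    total = len(values)
    nulls = sum(1 for v in values if v is None or v == "")
    non_null = [v for v in values if v not in (None, "")]
    return {
        "total": total,
        "null_count": nulls,
        "completeness": (total - nulls) / total if total else 0.0,
        "distinct_count": len(set(non_null)),
        "most_common": Counter(non_null).most_common(3),
    }

# Hypothetical sample: a 'state' column from a customer data source.
# Note the casing inconsistency ("ia") a profile would surface.
states = ["IA", "IL", "IA", None, "ia", "MO", "", "IA"]
profile = profile_column(states)
print(profile)
```

Even a profile this crude prompts the meaningful questions: why are 25% of the values missing, and is "ia" the same state as "IA"?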

 

Barbara Shelby is a Senior Software Engineer for IBM with over 25 years of experience holding positions of technical specialist, consultant, and line management.  Her global management and leadership positions encompassed network authentication, authorization application development, corporate business systems data architecture, and database development.

Summit Quotes from Barbara Shelby:

  • Four Common Barriers to Data Governance:
    1. Information - Existence of information silos and inconsistent data meanings
    2. Organization - Lack of end-to-end data ownership and organization cultural challenges
    3. Skill - Difficulty shifting resources from operational to transformational initiatives
    4. Technology - Business data locked in large applications and slow deployment of new technology
  • Four Key Decision Making Bodies for Data Governance:
    1. Enterprise Integration Team - Oversees the execution of CIO-funded, cross-enterprise initiatives
    2. Integrated Enterprise Assessment - Responsible for the success of transformational initiatives
    3. Integrated Portfolio Management Team - Responsible for making ongoing business investment decisions
    4. Unit Architecture Review - Responsible for the IT architecture compliance of business unit solutions

 

Lee Doss is a Senior IT Architect for IBM with over 25 years of information technology experience.  He holds a patent for a process of aligning strategic capability for business transformation, and he has held various positions including strategy, design, development, and customer support for IBM networking software products.

Summit Quotes from Lee Doss:

  • Five Data Governance Best Practices:
    1. Create a sense of urgency that the organization can rally around
    2. Start small, grow fast...pick a few visible areas to set an example
    3. Sunset legacy systems (application, data, tools) as new ones are deployed
    4. Recognize the importance of organization culture…this will make or break you
    5. Always, always, always – Listen to your customers

 

Kevin Kramer is a Senior Vice President and Director of Enterprise Sales for UMB Bank and is responsible for development of sales strategy, sales tool development, and implementation of enterprise-wide sales initiatives.

Summit Quotes from Kevin Kramer:

  • “Without Data Governance, multiple sources of customer information can produce multiple versions of the truth”
  • “Data Governance helps break down organizational silos and shares customer data as an enterprise asset”
  • “Data Governance provides a roadmap that translates into best practices throughout the entire enterprise”

 

Kanon Cozad is a Senior Vice President and Director of Application Development for UMB Bank and is responsible for overall technical architecture strategy and oversees information integration activities.

Summit Quotes from Kanon Cozad:

  • “Data Governance identifies business process priorities and then translates them into enabling technology”
  • “Data Governance provides direction and Data Stewardship puts direction into action”
  • “Data Stewardship identifies and prioritizes applications and data for consolidation and improvement”

 

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services.  (For Jill's complete bio, please see above).

Summit Quotes from Jill Dyché:

  • “The hard part of Data Governance is the data”
  • “No data will be formally sanctioned unless it meets a business need”
  • “Data Governance focuses on policies and strategic alignment”
  • “Data Management focuses on translating defined policies into executable actions”
  • “Entrench Data Governance in the development environment”
  • “Everything is customer data – even product and financial data”

 

Data Quality Assessment - Practical Skills

Arkady Maydanchik is a co-founder of Data Quality Group, a recognized practitioner, author, and educator in the field of data quality and information integration.  Arkady's data quality methodology and breakthrough ARKISTRA technology were used to provide services to numerous organizations.  Arkady is the author of the excellent book Data Quality Assessment, a frequent speaker at various conferences and seminars, and a contributor to many journals and online publications.  Data quality curriculum by Arkady Maydanchik can be found at eLearningCurve.

Course Quotes from Arkady Maydanchik:

  • “Nothing is worse for data quality than desperately trying to fix it during the last few weeks of an ETL project”
  • “Quality of data after conversion is in direct correlation with the amount of knowledge about actual data”
  • “Data profiling tools do not do data profiling - it is done by data analysts using data profiling tools”
  • “Data Profiling does not answer any questions - it helps us ask meaningful questions”
  • “Data quality is measured by its fitness to the purpose of use – it's essential to understand how data is used”
  • “When data has multiple uses, there must be data quality rules for each specific use”
  • “Effective root cause analysis requires not stopping after the answer to your first question - Keep asking: Why?”
  • “The central product of a Data Quality Assessment is the Data Quality Scorecard”
  • “Data quality scores must be both meaningful to a specific data use and be actionable”
  • “Data quality scores must estimate both the cost of bad data and the ROI of data quality initiatives”
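The Data Quality Scorecard Arkady describes can be sketched as a weighted roll-up of rule pass rates, one set of weights per specific data use. This is a minimal sketch, not his ARKISTRA methodology; all rule names, counts, and weights below are hypothetical:

```python
def scorecard(rule_results, weights):
    """Roll individual rule pass rates up into one weighted score.

    rule_results: {rule_name: (records_passed, records_tested)}
    weights:      {rule_name: relative importance for one specific data use}
    """
    total_weight = sum(weights.values())
    score = 0.0
    detail = {}
    for rule, (passed, tested) in rule_results.items():
        rate = passed / tested if tested else 1.0
        detail[rule] = rate            # per-rule score, kept actionable
        score += weights.get(rule, 0) * rate
    return score / total_weight, detail

# Hypothetical rules for a "customer mailing" use of the data
results = {"valid_postal_code":  (930, 1000),
           "non_null_address":   (980, 1000),
           "unique_customer_id": (995, 1000)}
weights = {"valid_postal_code": 3, "non_null_address": 2, "unique_customer_id": 1}
score, detail = scorecard(results, weights)
```

Keeping the per-rule detail alongside the aggregate score is what makes the number actionable: the roll-up tells you the data is roughly 96% fit for mailing, while the detail tells you postal codes are the place to start fixing.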

 

Modern Data Quality Techniques in Action - A Demonstration Using Human Resources Data

Gian Di Loreto formed Loreto Services and Technologies in 2004 from the client services division of Arkidata Corporation.  Loreto Services provides data cleansing and integration consulting services to Fortune 500 companies.  Gian is a classically trained scientist - he received his PhD in elementary particle physics from Michigan State University.

Course Quotes from Gian Di Loreto:

  • “Data Quality is rich with theory and concepts – however it is not an academic exercise, it has real business impact”
  • “To do data quality well, you must walk away from the computer and go talk with the people using the data”
  • “Undertaking a data quality initiative demands developing a deeper knowledge of the data and the business”
  • “Some essential data quality rules are ‘hidden’ and can only be discovered by ‘clicking around’ in the data”
  • “Data quality projects are not about systems working together - they are about people working together”
  • “Sometimes, data quality can be ‘good enough’ for source systems but not when integrated with other systems”
  • “Unfortunately, no one seems to care about bad data until they have it”
  • “Data quality projects are only successful when you understand the problem before trying to solve it”

 

Mark Your Calendar

TDWI World Conference San Diego 2009 - August 2-7, 2009.

TDWI World Conference Orlando 2009 - November 1-6, 2009.

TDWI World Conference Las Vegas 2010 - February 21-26, 2010.

El Festival del IDQ Bloggers (April 2009)

Welcome to the April 2009 issue of El Festival del IDQ Bloggers, which is a blog carnival for information/data quality bloggers being run as part of the celebration of the five year anniversary of the International Association for Information and Data Quality (IAIDQ).

 

A blog carnival is a collection of posts from different blogs on a specific theme that are published across a series of issues.  Anyone can submit a data quality blog post and experience the benefits of extra traffic, networking with other bloggers and discovering interesting posts.  It doesn't matter what type of blog you have as long as the submitted post has a data quality theme. 

El Festival del IDQ Bloggers will run monthly issues April through November 2009.

 

Can You Say Anything Interesting About Data Quality?

This simple question launched the first blog carnival of data quality that ran four issues from late 2007 through early 2008:

Blog Carnival of Data Quality (November 2007)

Blog Carnival of Data Quality (December 2007)

Blog Carnival of Data Quality (January 2008)

Blog Carnival of Data Quality (February 2008)

 

How to give your Data Warehouse a Data Quality Immunity System

Vincent McBurney is a manager for Deloitte consulting in Perth, Australia.  His excellent blog Tooling Around in the IBM InfoSphere looks at the world of data integration software and occasionally wonders what IBM is up to.  His data quality motto: "If it ain’t broke, don't fix it."

Vincent submitted How to give your Data Warehouse a Data Quality Immunity System that discusses how people who obsessively keep bad quality data out of a data warehouse may be making it unhealthy in the long run.

 

Stuck in First Gear

Michele Goetz is a freelance consultant helping companies make sense of their business through better analysis, marketing best practices, and marketing solutions.  Her excellent blog Intelligent Metrix guides you on your journey from data to metrics to insight to intelligent decisions.  Her blog de-mystifies business intelligence and data management for the business, and helps you bridge the Business-IT gap for better processes and solutions that drive business success.

Michele submitted Stuck in First Gear that discusses the common problem when companies make big investments in enterprise class solutions but only use a portion of the capabilities, which is like driving a Porsche in first gear.

 

When Bad Data Becomes Acceptable Data

Daniel Gent is a bilingual business analyst experienced with the System Development Life Cycle (SDLC), decision making, change management, database design, data modeling, data quality management, project coordination, and problem resolution.  His excellent blog Data Quality Edge is a grassroots look at data quality for the data quality analyst in the trenches.

Daniel submitted When Bad Data Becomes Acceptable Data that discusses how you need to prioritize bad data and determine when it is acceptable to keep it for now.

 

Customer Value and Sustainable Quality

Daniel Bahula is a strategy and operations improvement professional with extensive project experience at multinational telco, software development, and professional services companies.  His excellent blog DanBahula.net defies a simple definition and is a great example of how it doesn't matter what type of blog you have as long as the submitted post has a data quality theme.

Daniel submitted Customer Value and Sustainable Quality that discusses Six Sigma and its relevance to addressing data quality issues.

 

Data Quality, Entity Resolution, and OFAC Compliance

Bob Barker is the editor of Identity Resolution Daily, which is a corporate blog of Austin, TX-based Infoglide Software strongly dedicated to citizenship, integrity and communication.  The blog has recently been gaining guest bloggers with varying points of view, helping it to become an excellent site for information, dialogue and community.

Bob submitted Data Quality, Entity Resolution, and OFAC Compliance that discusses how entity resolution is different from name matching and traditional data quality.

 

Selecting Data Quality Software

Dylan Jones is the editor of Data Quality Pro, which is the leading data quality online magazine and free independent community resource dedicated to helping data quality professionals take their career or business to the next level.

Dylan submitted Selecting Data Quality Software that discusses how to find the right data quality technology for your needs and your budget.

 

AmazonFail - A Classic Information Quality Impact

Since 2006, IQTrainwrecks.com, which is a community blog provided and administered by the International Association for Information and Data Quality (IAIDQ), has been serving up regular doses of information quality disasters from around the world.

IAIDQ submitted AmazonFail - A Classic Information Quality Impact that looks behind the hype and confusion surrounding the #amazonfail debacle.

 

You’re a Leader - Lead

Daragh O Brien is an Irish information quality expert, conference speaker, published author in the field, and director of publicity for the IAIDQ.  His excellent blog The DOBlog, founded in 2006, was one of the first specialist information quality blogs.

Daragh submitted You’re a Leader - Lead that explains although there’s a whole lot of great management happening in the world, what we really need are information quality leaders.

 

All I Really Need To Know About Data Quality I Learned In Kindergarten

My name is Jim Harris.  I am an independent consultant, speaker, writer and blogger with over 15 years of professional services and application development experience in data quality.  My blog Obsessive-Compulsive Data Quality is an independent blog offering a vendor-neutral perspective on data quality.

I submitted All I Really Need To Know About Data Quality I Learned In Kindergarten that explains how show and tell, the five second rule and other great lessons from kindergarten are essential to success in data quality initiatives.

 

Submit to Daragh

The May issue will be edited by Daragh O Brien and hosted on The DOBlog

For more information, please follow this link:  El Festival del IDQ Bloggers


Enterprise Data World 2009

Formerly known as the DAMA International Symposium and Wilshire MetaData Conference, Enterprise Data World 2009 was held April 5-9 in Tampa, Florida at the Tampa Convention Center.

 

Enterprise Data World is the business world’s most comprehensive vendor-neutral educational event about data and information management.  This year’s program was bigger than ever before, with more sessions, more case studies, and more can’t-miss content.  With 200 hours of in-depth tutorials, hands-on workshops, practical sessions and insightful keynotes, the conference was a tremendous success.  Congratulations and thanks to Tony Shaw, Maya Stosskopf and the entire Wilshire staff.

 

I attended Enterprise Data World 2009 as a member of the Iowa Chapter of DAMA and as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the sessions that I was attending.

I wish that I could have attended every session, but here are some highlights from ten of my favorites:

 

8 Ways Data is Changing Everything

Keynote by Stephen Baker from BusinessWeek

His article Math Will Rock Your World inspired his excellent book The Numerati.  Additionally, check out his blog: Blogspotting.

Quotes from the keynote:

  • "Data is changing how we understand ourselves and how we understand our world"
  • "Predictive data mining is about the mathematical modeling of humanity"
  • "Anthropologists are looking at social networking (e.g. Twitter, Facebook) to understand the science of friendship"

 

Master Data Management: Proven Architectures, Products and Best Practices

Tutorial by David Loshin from Knowledge Integrity.

Included material from his excellent book Master Data Management.  Additionally, check out his blog: David Loshin.

Quotes from the tutorial:

  • "Master Data are the core business objects used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies"
  • "Master Data Management (MDM) provides a unified view of core data subject areas (e.g. Customers, Products)"
  • "With MDM, it is important not to over-invest and under-implement - invest in and implement only what you need"

 

Master Data Management: Ignore the Hype and Keep the Focus on Data

Case Study by Tony Fisher from DataFlux and Jeff Grayson from Equinox Fitness.

Quotes from the case study:

  • "The most important thing about Master Data Management (MDM) is improving business processes"
  • "80% of any enterprise implementation should be the testing phase"
  • "MDM Data Quality (DQ) Challenge: Any % wrong means you’re 100% certain you’re not always right"
  • "MDM DQ Solution: Re-design applications to ensure the ‘front-door’ protects data quality"
  • "Technology is critical, however thinking through the operational processes is more important"

 

A Case of Usage: Working with Use Cases on Data-Centric Projects

Case Study by Susan Burk from IBM.

Quotes from the case study:

  • "Use Case is a sequence of actions performed to yield a result of observable business value"
  • "The primary focus of data-centric projects is data structure, data delivery and data quality"
  • "Don’t like use cases? – ok, call them business acceptance criteria – because that’s what a use case is"

 

Crowdsourcing: People are Smart, When Computers are Not

Session by Sharon Chiarella from Amazon Web Services.

Quotes from the session:

  • "Crowdsourcing is outsourcing a task typically performed by employees to a general community of people"
  • "Crowdsourcing eliminates over-staffing, lowers costs and reduces work turnaround time"
  • "An excellent example of crowdsourcing is open source software development (e.g. Linux)"

 

Improving Information Quality using Lean Six Sigma Methodology

Session by Atul Borkar and Guillermo Rueda from Intel.

Quotes from the session:

  • "Information Quality requires a structured methodology in order to be successful"
  • Lean Six Sigma Framework: DMAIC – Define, Measure, Analyze, Improve, Control:
    • Define = Describe the challenge, goal, process and customer requirements
    • Measure = Gather data about the challenge and the process
    • Analyze = Use hypothesis and data to find root causes
    • Improve = Develop, implement and refine solutions
    • Control = Plan for stability and measurement

 

Universal Data Quality: The Key to Deriving Business Value from Corporate Data

Session by Stefanos Damianakis from Netrics.

Quotes from the session:

  • "The information stored in databases is NEVER perfect, consistent and complete – and it never can be!"
  • "Gartner reports that 25% of critical data within large businesses is somehow inaccurate or incomplete"
  • "Gartner reports that 50% of implementations fail due to lack of attention to data quality issues"
  • "A powerful approach to data matching is the mathematical modeling of human decision making"
  • "The greatest advantage of mathematical modeling is that there are no data matching rules to build and maintain"
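The rules-free matching idea from this session can be illustrated with Python's standard-library `difflib.SequenceMatcher`, which scores string similarity rather than applying hand-built matching rules. This is only a stand-in sketch for the general approach, not the mathematical model Netrics described; the names below are invented sample data:

```python
from difflib import SequenceMatcher

def match_score(a, b):
    """Similarity in [0, 1] between two name strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Invented sample data: rank candidate records against a query name.
candidates = ["Jonathan Smith", "John Smyth", "Joan Smithers", "Robert Jones"]
query = "Jon Smith"
ranked = sorted(candidates, key=lambda c: match_score(query, c), reverse=True)
```

No rule says "Smyth may be a variant of Smith"; the similarity measure simply scores every pair, and near-matches surface above clear non-matches like "Robert Jones" without anything to build or maintain.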

 

Defining a Balanced Scorecard for Data Management

Seminar by C. Lwanga Yonke, a founding member of the International Association for Information and Data Quality (IAIDQ).

Quotes from the seminar:

  • "Entering the same data multiple times is like paying the same invoice multiple times"
  • "Good metrics help start conversations and turn strategy into action"
  • Good metrics have the following characteristics:
    • Business Relevance
    • Clarity of Definition
    • Trending Capability (i.e. metric can be tracked over time)
    • Easy to aggregate and roll-up to a summary
    • Easy to drill-down to the details that comprised the measurement

 

Closing Panel: Data Management’s Next Big Thing!

Quotes from Panelist Peter Aiken from Data Blueprint:

  • Capability Maturity Levels:
    1. Initial
    2. Repeatable
    3. Defined
    4. Managed
    5. Optimized
  • "Most companies are at a capability maturity level of (1) Initial or (2) Repeatable"
  • "Data should be treated as a durable asset"

Quotes from Panelist Noreen Kendle from Burton Group:

  • "A new age for data and data management is on the horizon – a perfect storm is coming"
  • "The perfect storm is being caused by massive data growth and software as a service (i.e. cloud computing)"
  • "Always remember that you can make lemonade from lemons – the bad in life can be turned into something good"

Quotes from Panelist Karen Lopez from InfoAdvisors:

  • "If you keep using the same recipe, then you keep getting the same results"
  • "Our biggest problem is not technical in nature - we simply need to share our knowledge"
  • "Don’t be a dinosaur! Adopt a ‘go with what is’ philosophy and embrace the future!"

Quotes from Panelist Eric Miller from Zepheira:

  • "Applications should not be ON The Web, but OF The Web"
  • "New Acronym: LED – Linked Enterprise Data"
  • "Semantic Web is the HTML of DATA"

Quotes from Panelist Daniel Moody from University of Twente:

  • "Unified Modeling Language (UML) was the last big thing in software engineering"
  • "The next big thing will be ArchiMate, which is a unified language for enterprise architecture modeling"

 

Mark Your Calendar

Enterprise Data World 2010 will take place in San Francisco, California at the Hilton San Francisco on March 14-18, 2010.

Data Quality Whitepapers are Worthless

During a 1609 interview, William Shakespeare was asked his opinion about an emerging genre of theatrical writing known as Data Quality Whitepapers.  The "Bard of Avon" was clearly not a fan.  His famously satirical response was:

Data quality's but a writing shadow, a poor paper

That struts and frets its words upon the page

And then is heard no more:  it is a tale

Told by a vendor, full of sound and fury

Signifying nothing.

 

Four centuries later, I find myself in complete agreement with Shakespeare (and not just because Harold Bloom told me so).

 

Today is April Fool's Day, but I am not joking around - call Dennis Miller and Lewis Black - because I am ready to RANT.

 

I am sick and tired of reading whitepapers.  Here is my "Bottom Ten List" explaining why: 

  1. Ones that make me fill out a "please mercilessly spam me later" contact information form before I am allowed to download them remind me of Mrs. Bun: "I DON'T LIKE SPAM!"
  2. Ones that, after I read their supposed pearls of wisdom, make me shake my laptop violently like an Etch-A-Sketch.  I have lost count of how many laptops I have destroyed this way.  I have started buying them in bulk at Wal-Mart.
  3. Ones composed entirely of the exact same information found on the vendor's website make www = World Wide Worthless.
  4. Ones that start out good, but just when they get to the really useful stuff, refer to content only available to paying customers.  What a great way to guarantee that neither I nor anyone I know will ever become your paying customer!
  5. Ones that have a "Shock and Awe" title followed by "Aw Shucks" content because apparently the entire marketing budget was spent on the title.
  6. Ones that promise me the latest BUZZ but deliver only ZZZ are not worthless only when I have insomnia.
  7. Ones that claim to be about data quality, but have nothing at all to do with data quality:  "...don't make me angry.  You wouldn't like me when I'm angry."
  8. Ones that take the adage "a picture is worth a thousand words" too far by using a dizzying collage of logos, charts, graphs and other visual aids.  This is one reason we're happy that Pablo Picasso was a painter.  However, he did once write that "art is a lie that makes us realize the truth."  Maybe he was defending whitepapers.
  9. Ones that use acronyms without ever defining what they stand for remind me of that scene from Good Morning, Vietnam: "Excuse me, sir.  Seeing as how the VP is such a VIP, shouldn't we keep the PC on the QT?  Because if it leaks to the VC he could end up MIA, and then we'd all be put out in KP."
  10. Ones that really know they're worthless but aren't honest about it.  Don't promise me "The Top 10 Metrics for Data Quality Scorecards" and give me a list as pointless as this one.

 

I am officially calling out all writers of Data Quality Whitepapers. 

Shakespeare and I both believe that you can't write anything about data quality that is worth reading. 

Send your data quality whitepapers to Obsessive-Compulsive Data Quality and if it is not worthless, then I will let the world know that you proved Shakespeare and me wrong.

 

And while I am on a rant roll, I am officially calling out all Data Quality Bloggers.

The International Association for Information and Data Quality (IAIDQ) is celebrating its five year anniversary by hosting:

El Festival del IDQ Bloggers – A Blog Carnival for Information/Data Quality Bloggers

For more information about the blog carnival, please follow this link:  IAIDQ Blog Carnival

Do you have obsessive-compulsive data quality (OCDQ)?

Obsessive-compulsive data quality (OCDQ) affects millions of people worldwide.

The most common symptoms of OCDQ are:

  • Obsessively verifying data used in critical business decisions
  • Compulsively seeking an understanding of data in business terms
  • Repeatedly checking that data is complete and accurate before sharing it
  • Habitually attempting to calculate the cost of poor data quality
  • Constantly muttering a mantra that data quality must be taken seriously

While the good folks at Prescott Pharmaceuticals are busy working on a treatment, I am dedicating this independent blog as group therapy to all those who (like me) have dealt with OCDQ their entire professional lives.

Over the years, the work of many individuals and organizations has been immensely helpful to those of us with OCDQ.

Some of these heroes deserve special recognition:

Data Quality Pro – Founded and maintained by Dylan Jones, Data Quality Pro is a free independent community resource dedicated to helping data quality professionals take their career or business to the next level. With the mission to create the most beneficial data quality resource that is freely available to members around the world, Data Quality Pro provides free software, job listings, advice, tutorials, news, views and forums. Their goal is “winning-by-sharing” and they believe that when each member contributes a small amount of their experience, skill or time to support other members, truly great things can be achieved. With the new Member Service Register, consultants, service providers and technology vendors can promote their services and include links to their websites and blogs.

 

International Association for Information and Data Quality (IAIDQ) – Chartered in January 2004, IAIDQ is a not-for-profit, vendor-neutral professional association whose purpose is to create a world-wide community of people who desire to reduce the high costs of low-quality information and data by applying sound quality management principles to the processes that create, maintain and deliver data and information. IAIDQ was co-founded by Larry English and Tom Redman, who are two of the most respected and well-known thought and practice leaders in the field of information and data quality.  IAIDQ also provides two excellent blogs: IQ Trainwrecks and Certified Information Quality Professional (CIQP).

 

Beth Breidenbach – her blog Confessions of a database geek is fantastic in and of itself, but she has also compiled an excellent list of data quality blogs and provides them via aggregated feeds in both Feedburner and Google Reader formats.

 

Vincent McBurney – his blog Tooling Around in the IBM InfoSphere is an entertaining and informative look at data integration in the IBM InfoSphere covering many IBM Information Server products such as DataStage, QualityStage and Information Analyzer.

 

Daragh O Brien – a leading writer, presenter and researcher in the field of information quality management, with a particular interest in the legal aspects of information quality. His blog The DOBlog is a popular and entertaining source of great material.

 

Steve Sarsfield – his blog Data Governance and Data Quality Insider covers the world of data integration, data governance, and data quality from the perspective of an industry insider. Also, check out his new book: The Data Governance Imperative.