The Data-Information Continuum

Data is one of the enterprise's most important assets.  Data quality is a fundamental success factor for the decision-critical information that drives the tactical and strategic initiatives essential to the enterprise's mission to survive and thrive in today's highly competitive and rapidly evolving marketplace.

When the results of these initiatives don't meet expectations, analysis often reveals poor data quality is a root cause.   Projects are launched to understand and remediate this problem by establishing enterprise-wide data quality standards.

However, a common issue is a lack of understanding about what I refer to as the Data-Information Continuum.

 

The Data-Information Continuum

In physics, the Space-Time Continuum explains that space and time are interrelated entities forming a single continuum.  In classical mechanics, the passage of time can be considered a constant for all observers of spatial objects in motion.  In relativistic contexts, the passage of time is a variable changing for each specific observer of spatial objects in motion.

Data and information are also interrelated entities forming a single continuum.  It is crucial to understand how they are different and how they relate.  I like using the Dragnet definition for data – it is “just the facts” collected as an abstract description of the real-world entities that the enterprise does business with (e.g. customers, vendors, suppliers). 

A common data quality definition is fitness for the purpose of use.  A common challenge is data has multiple uses, each with its own fitness requirements.  I like to view each intended use as the information that is derived from data, defining information as data in use or data in action.

Data could be considered a constant while information is a variable that redefines data for each specific use.  Data is not truly a constant since it is constantly changing.  However, information is still derived from data and many different derivations can be performed while data is in the same state (i.e. before it changes again). 

Quality within the Data-Information Continuum has both objective and subjective dimensions.

 

Objective Data Quality

Data's quality must be objectively measured separate from its many uses.  Enterprise-wide data quality standards must provide a highest common denominator for all business units to use as an objective data foundation for their specific tactical and strategic initiatives.  Raw data extracted directly from its sources must be profiled, analyzed, transformed, cleansed, documented and monitored by data quality processes designed to provide and maintain universal data sources for the enterprise's information needs.  At this phase, the manipulations of raw data by these processes must be limited to objective standards and not be customized for any subjective use.
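To make this concrete, here is a minimal sketch (in Python, with invented field names and rules) of what objective, use-agnostic checks might look like - completeness and format validity measured against the data itself, not against any particular business use:

```python
import re
from datetime import datetime

def _is_iso_date(value):
    """True when the value parses as an ISO-8601 calendar date."""
    try:
        datetime.strptime(value or "", "%Y-%m-%d")
        return True
    except ValueError:
        return False

# Objective, use-agnostic rules applied to raw records; the field names and
# formats are invented for illustration, not a prescribed enterprise standard.
OBJECTIVE_RULES = {
    "customer_id": lambda v: bool(v and str(v).strip()),                        # completeness
    "email": lambda v: bool(re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", v or "")),   # format validity
    "birth_date": _is_iso_date,                                                 # valid calendar date
}

def profile(records):
    """Count rule violations per field, without reference to any specific use."""
    violations = {field: 0 for field in OBJECTIVE_RULES}
    for record in records:
        for field, rule in OBJECTIVE_RULES.items():
            if not rule(record.get(field)):
                violations[field] += 1
    return violations

records = [
    {"customer_id": "101", "email": "pat@example.com", "birth_date": "1980-13-01"},
    {"customer_id": "", "email": "not-an-email", "birth_date": "1975-06-15"},
]
print(profile(records))  # {'customer_id': 1, 'email': 1, 'birth_date': 1}
```

Nothing in these rules knows or cares which business unit will eventually consume the data - that is exactly the point.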

 

Subjective Information Quality

Information's quality can only be subjectively measured according to its specific use.  Information quality standards are not enterprise-wide, they are customized to a specific business unit or initiative.  However, all business units and initiatives must begin defining their information quality standards by using the enterprise-wide data quality standards as a foundation.  This approach allows leveraging a consistent enterprise understanding of data while also deriving the information necessary for the day-to-day operation of each business unit and initiative.
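As a companion sketch, two hypothetical business units might layer their own fitness-for-use rules on top of the objective foundation described above; the field names and rules are again invented for illustration:

```python
# Hypothetical use-specific fitness checks layered on top of the objective
# foundation; each business unit adds only the rules its own use requires.
def fit_for_direct_mail(record):
    """Marketing's use: the record is only fit if it can actually be mailed."""
    return all((record.get(f) or "").strip() for f in ("street", "city", "postal_code"))

def fit_for_revenue_reporting(record):
    """Finance's use: the same record instead needs a usable annual revenue figure."""
    try:
        return float(record.get("annual_revenue", "")) >= 0
    except ValueError:
        return False

record = {"street": "10 Main St", "city": "Boston", "postal_code": "02134",
          "annual_revenue": "unknown"}
print(fit_for_direct_mail(record), fit_for_revenue_reporting(record))  # True False
```

The same record can be perfectly fit for one use and unfit for another, which is why these rules belong to the business unit and not to the enterprise-wide standard.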

 

A “Single Version of the Truth” or the “One Lie Strategy”

A common objection to separating quality standards into objective data quality and subjective information quality is the enterprise's significant interest in creating what is commonly referred to as a single version of the truth.

However, in his excellent book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman explains:

“A fiendishly attractive concept is...'a single version of the truth'...the logic is compelling...unfortunately, there is no single version of the truth. 

For all important data, there are...too many uses, too many viewpoints, and too much nuance for a single version to have any hope of success. 

This does not imply malfeasance on anyone's part; it is simply a fact of life. 

Getting everyone to work from a single version of the truth may be a noble goal, but it is better to call this the 'one lie strategy' than anything resembling truth.”

Conclusion

There is a significant difference between data and information and therefore a significant difference between data quality and information quality.  Many data quality projects are in fact implementations of information quality customized to the specific business unit or initiative that is funding the project.  Although these projects can achieve some initial success, they encounter failures in later iterations and phases when information quality standards try to act as enterprise-wide data quality standards. 

Significant time and money can be wasted by not understanding the Data-Information Continuum.

The Three Musketeers of Data Quality

People, process and technology.  All three are necessary for success on your data quality project.  By far, the most important of the three is people.  However, who exactly are some of the most important people on your data quality project? 

Or to phrase the question in a much more entertaining way...

 

Who are The Three Musketeers of Data Quality?

1. Athos, the Executive Sponsor - Provides the mandate for the Business and IT to forge an ongoing and iterative collaboration throughout the entire project.  You might not see him roaming the halls or sitting in on most of the meetings.  However, Athos provides oversight and arbitrates any issues of organizational politics.  Without an executive sponsor, a data quality project cannot get very far and can easily lose momentum or focus.  Perhaps most importantly, Athos is also usually the source of the project's funding.

 

2. Porthos, the Project Manager - Facilitates the strategic and tactical collaboration of the project team.  Knowledge about data, business processes and supporting technology is spread throughout your organization.  Neither the Business nor IT alone has all of the information required to achieve data quality success.  Porthos coordinates discussions with all of the stakeholders.  Business users are able to share their knowledge in their natural language and IT users are able to “geek out” in techno-babble.  Porthos clarifies communication by providing bi-directional translation.  He interprets end user business requirements, explains technical challenges and maintains an ongoing dialogue between the Business and IT.  Yes, Porthos is also responsible for the project plan.  But he realizes that project management is more about providing leadership.

 

3. Aramis, the Subject Matter Expert - Provides detailed knowledge about specific data subject areas and business processes.  Aramis reviews the reports from the data quality assessments and provides feedback based on his understanding of how the data is actually being used.  He helps identify the data most valuable to the business.  Aramis will often be an excellent source for undocumented business rules and can quickly clarify seemingly complex issues based on his data-centric point of view.

 

Alexandre Dumas fans will recall that the novel's plucky primary protagonist was an outsider who became the Fourth Musketeer, and yes, data quality has one too:

 

4. D'Artagnan, the Data Quality Consultant - Provides extensive experience and best practices from successful data quality implementations.  Most commonly, d'Artagnan is a certified expert with the data quality tool you have selected.  D'Artagnan's goal is to help you customize a technical solution to your specific business needs.  Unlike the Dumas character, your d'Artagnan usually doesn't accept the Musketeer commission at the end of the project.  Therefore, his primary responsibility is to make himself obsolete as quickly as possible by providing mentoring, documentation, training and knowledge transfer. 

 

Your data quality project will typically have more than one person (and obviously not just men) playing each of these classic roles, although you may use different job titles.  Additionally, there will be many other important people on your project playing many other key roles, such as data architect, business analyst, application developer and system tester - to name just a few.

Data quality truly takes a team effort.  Remember that you are all in this together.

So if anyone asks you who is the most important person on your project, then just respond with the Musketeer Motto:

"All for Data Quality, Data Quality for All"


The Two Headed Monster of Data Matching

Data matching is commonly defined as the comparison of two or more records in order to evaluate if they correspond to the same real world entity (i.e. are duplicates) or represent some other data relationship (e.g. a family household).

Data matching is commonly plagued by what I refer to as The Two Headed Monster (a small matching sketch follows the list):

  • False Negatives - records that did not match, but should have been matched
  • False Positives - records that matched, but should not have been matched
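Here is the small matching sketch promised above - a single similarity score compared against a threshold, using only Python's standard library.  The names and thresholds are invented, but they show how tightening the threshold trades false positives for false negatives (and vice versa):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Normalized similarity between two name strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(record_a, record_b, threshold):
    """Declare a match when the name similarity clears the threshold."""
    return similarity(record_a["name"], record_b["name"]) >= threshold

pair = ({"name": "Jonathan Smith"}, {"name": "Jon Smith"})

# A strict threshold avoids false positives but turns this pair into a false
# negative, while a looser threshold catches it at the risk of matching records
# that merely look similar.
print(is_match(*pair, threshold=0.90))  # False
print(is_match(*pair, threshold=0.75))  # True
```

Every real matching engine, however sophisticated, is making some version of this trade-off.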

 

I Fought The Two Headed Monster...

On a recent (mostly) business trip to Las Vegas, I scheduled a face-to-face meeting with a potential business partner that I had previously communicated with via phone and email only.  We agreed to a dinner meeting at a restaurant in the hotel/casino where I was staying. 

I would be meeting with the President/CEO and the Vice President of Business Development, a man and a woman respectively.

I was facing a real world data matching problem.

I knew their names, but I had no idea what they looked like.  Checking their company website and LinkedIn profiles didn't help - no photos.  I neglected to get their mobile phone numbers, however they had mine.

The restaurant was inside the casino and the only entrance was adjacent to a Starbucks that had tables and chairs facing the casino floor.  I decided to arrive at the restaurant 15 minutes early and camp out at Starbucks since anyone going near the restaurant would have to walk right past me.

I was more concerned about avoiding false positives.  I didn't want to walk up to every potential match and introduce myself since casino security would soon intervene (and I have seen enough movies to know that scene always ends badly). 

I decided to apply some probabilistic data matching principles to evaluate the mass of humanity flowing past me. 

If some of my matching criteria seem odd, please remember I was in a Las Vegas casino. 

I excluded from consideration all:

  • Individuals wearing a uniform or a costume
  • Groups consisting of more than two people
  • Groups consisting of two men or two women
  • Couples carrying shopping bags or souvenirs
  • Couples demonstrating a public display of affection
  • Couples where one or both were noticeably intoxicated
  • Couples where one or both were scantily clad
  • Couples where one or both seemed too young or too old

I carefully considered any:

  • Couples dressed in business attire or business casual attire
  • Couples pausing to wait at the restaurant entrance
  • Couples arriving close to the scheduled meeting time

I was quite pleased with myself for applying probabilistic data matching principles to a real world situation.
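For the curious, my casino logic roughly amounted to something like the following sketch - hard exclusion rules first, then a weighted score for whoever survived.  Every attribute, weight and threshold here is invented, but the shape of the logic is the same:

```python
# Hard exclusion rules first, then a weighted score for the survivors; all
# attributes, weights and thresholds are invented for illustration.
EXCLUSIONS = [
    lambda c: c["group_size"] != 2,
    lambda c: c["wearing_costume"],
    lambda c: c["carrying_souvenirs"],
]

WEIGHTS = {
    "business_attire": 0.5,
    "paused_at_entrance": 0.3,
    "arrived_near_meeting_time": 0.2,
}

def match_score(candidate):
    """Return 0.0 for excluded candidates, otherwise a weighted evidence score."""
    if any(rule(candidate) for rule in EXCLUSIONS):
        return 0.0
    return sum(weight for signal, weight in WEIGHTS.items() if candidate[signal])

couple = {
    "group_size": 2, "wearing_costume": False, "carrying_souvenirs": False,
    "business_attire": False, "paused_at_entrance": True,
    "arrived_near_meeting_time": True,
}

print(match_score(couple))  # 0.5 - below a 0.6 match threshold, so dismissed
```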

However, the scheduled meeting time passed.  At first, I simply assumed they might be running a little late or were delayed by traffic.  As the minutes continued to pass, I started questioning my matching criteria.

 

...And The Two Headed Monster Won

When the clock reached 30 minutes past the scheduled meeting time, my mobile phone rang.  My dinner companions were calling to ask if I was running late.  They had arrived on time, were inside the restaurant, and had already ordered.

Confused, I entered the restaurant.  Sure enough, there sat a man and a woman that had walked right past me.  I excluded them from consideration because of how they were dressed.  The Vice President of Business Development was dressed in jeans, sneakers and a casual shirt.  The President/CEO was wearing shorts, sneakers and a casual shirt.

I had dismissed them as a vacationing couple.

I had been defeated by a false negative.

 

The Harsh Reality is that Monsters are Real

My data quality expertise could not guarantee victory in this particular battle with The Two Headed Monster. 

Monsters are real and the hero of the story doesn't always win.

And it doesn’t matter if the match algorithms I use are deterministic, probabilistic, or even supercalifragilistic. 

The harsh reality is that false negatives and false positives can be reduced, but never eliminated.

 

Are You Fighting The Two Headed Monster?

Are you more concerned about false negatives or false positives?  Please share your battles with The Two Headed Monster.

 

Related Articles

Back in February and March, I published a five-part series of articles on data matching methodology on Data Quality Pro.

Parts 2 and 3 of the series provided data examples to illustrate the challenge of false negatives and false positives within the context of identifying duplicate customers.

The Nine Circles of Data Quality Hell

“Abandon all hope, ye who enter here.” 

In Dante’s Inferno, these words are inscribed above the entrance into hell.  The Roman poet Virgil was Dante’s guide through its nine circles, each an allegory for unrepentant sins beyond forgiveness.

The Very Model of a Modern DQ General will be your guide on this journey through nine of the most common mistakes that can doom your data quality project:

 

1. Thinking data quality is an IT issue (or a business issue) - Data quality is not an IT issue.  Data quality is also not a business issue.  Data quality is everyone's issue.  Successful data quality projects are driven by an executive management mandate for the business and IT to forge an ongoing and iterative collaboration throughout the entire project.  The business usually owns the data and understands its meaning and use in the day-to-day operation of the enterprise and must partner with IT in defining the necessary data quality standards and processes.

 

2. Waiting for poor data quality to affect you - Data quality projects are often launched in the aftermath of an event when poor data quality negatively impacted decision-critical enterprise information.  Some examples include a customer service nightmare, a regulatory compliance failure or a financial reporting scandal.  Whatever the triggering event, a common response is data quality suddenly becomes prioritized as a critical issue.

 

3. Believing technology alone is the solution - Although incredible advancements continue, technology alone cannot provide the solution.  Data quality requires a holistic approach involving people, process and technology.  Your project can only be successful when people take on the challenge united by collaboration, guided by an effective methodology, and of course, implemented with amazing technology.

 

4. Listening only to the expert - An expert can be an invaluable member of the data quality project team.  However, sometimes an expert can dominate the decision making process.  The expert's perspective needs to be combined with the diversity of the entire project team in order for success to be possible.

 

5. Losing focus on the data - The complexity of your data quality project can sometimes work against your best intentions.  It is easy to get pulled into the mechanics of documenting the business requirements and functional specifications and then charging ahead with application development.  Once the project achieves some momentum, it can take on a life of its own and the focus becomes more and more about making progress against the tasks in the project plan, and less and less on the project's actual goal, which is to improve the quality of your data.

  • This common mistake was the theme of my post: Data Gazers.

 

6. Chasing perfection - An obsessive-compulsive quest to find and fix every data quality problem is a laudable pursuit but ultimately a self-defeating cause.  Data quality problems can be very insidious and even the best data quality process will still produce exceptions.  Although this is easy to accept in theory, it is notoriously difficult to accept in practice.  Do not let the pursuit of perfection undermine your data quality project.

 

7. Viewing your data quality assessment as a one-time event - Your data quality project should begin with a data quality assessment to assist with aligning perception with reality and to get the project off to a good start by providing a clear direction and a working definition of success.  However, the data quality assessment is not a one-time event that ends when development begins.  You should perform iterative data quality assessments throughout the entire development lifecycle.

 

8. Forgetting about the people - People, process and technology.  All three are necessary for success on your data quality project.  However, I have found that the easiest one to forget about (and by far the most important of the three) is people.

 

9. Assuming if you build it, data quality will come - There are many important considerations when planning a data quality project.  One of the most important is to realize that data quality problems cannot be permanently "fixed" by implementing a one-time "solution" that doesn't require ongoing improvements.

 

Knowing these common mistakes is no guarantee that your data quality project couldn't still find itself lost in a dark wood.

However, knowledge could help you realize when you have strayed from the right road and light a path to find your way back.

Schrödinger's Data Quality

In 1935, Austrian physicist Erwin Schrödinger described a now famous thought experiment where:

  “A cat, a flask containing poison, a tiny bit of radioactive substance and a Geiger counter are placed into a sealed box for one hour.  If the Geiger counter doesn't detect radiation, then nothing happens and the cat lives.  However if radiation is detected, then the flask is shattered, releasing the poison which kills the cat.  According to the Copenhagen interpretation of quantum mechanics, until the box is opened, the cat is simultaneously alive and dead.  Yet, once you open the box, the cat will either be alive or dead, not a mixture of alive and dead.” 

This was only a thought experiment.  Therefore, no actual cat was harmed. 

This paradox of quantum physics, known as Schrödinger's Cat, poses the question:

  “When does a quantum system stop existing as a mixture of states and become one or the other?”

 

Unfortunately, data quality projects are not thought experiments.  They are complex, time consuming and expensive enterprise initiatives.  Typically, a data quality tool is purchased, expert consultants are hired to supplement staffing, production data is copied to a development server and the project begins.  Until it is completed and the new system goes live, the project is a potential success or failure.  Yet, once the new system starts being used, the project will become either a success or failure.

This paradox, which I refer to as Schrödinger's Data Quality, poses the question:

  “When does a data quality project stop existing as potential success or failure and become one or the other?”

 

Data quality projects should begin with the parallel and complementary efforts of drafting the business requirements while also performing a data quality assessment, which can help you:

  • Verify data matches the metadata that describes it
  • Identify potential missing, invalid and default values
  • Prepare meaningful questions for subject matter experts
  • Understand how data is being used
  • Prioritize critical data errors
  • Evaluate potential ROI of data quality improvements
  • Define data quality standards
  • Reveal undocumented business rules
  • Review and refine the business requirements
  • Provide realistic estimates for development, testing and implementation

Therefore, the data quality assessment assists with aligning perception with reality and gets the project off to a good start by providing a clear direction and a working definition of success.
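As a minimal illustration of the kind of checks an assessment starts with, here is a sketch using pandas on an invented customer extract; the columns, values and rules are assumptions, not a prescribed method:

```python
import pandas as pd

# An invented customer extract; in practice this would be raw data pulled
# directly from the source systems.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "postal_code": ["02134", None, "00000", "2134"],
    "status": ["A", "A", "X", "A"],
})

print(df.isna().mean())                            # missing value rates per column
print((df["postal_code"] == "00000").sum())        # suspicious default values
print((~df["postal_code"].str.len().eq(5)).sum())  # values failing a simple length rule
print(df["status"].value_counts(dropna=False))     # unexpected codes to ask the experts about
```

The point is not these specific checks - it is that each finding becomes a meaningful question for the subject matter experts and a candidate data quality standard.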

 

However, a common mistake is to view the data quality assessment as a one-time event that ends when development begins. 

 

Projects should perform iterative data quality assessments throughout the entire development lifecycle (a small sketch of repeatable checks follows this list), which can help you:

  • Gain a data-centric view of the project's overall progress
  • Build data quality monitoring functionality into the new system
  • Promote data-driven development
  • Enable more effective unit testing
  • Perform impact analysis on requested enhancements (i.e. scope creep)
  • Record regression cases for testing modifications
  • Identify data exceptions that require suspension for manual review and correction
  • Facilitate early feedback from the user community
  • Correct problems that could undermine user acceptance
  • Increase user confidence that the new system will meet their needs
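Here is the small sketch of repeatable checks promised above - assessment findings recast as pass/fail rules that can run on every development iteration and, later, serve as production monitoring.  The rules and thresholds are illustrative assumptions:

```python
# Assessment findings recast as repeatable, data-centric checks; rule names
# and thresholds are invented for illustration.
def no_default_postal_codes(records):
    return not any(r.get("postal_code") == "00000" for r in records)

def missing_rate_ok(records, field, max_rate=0.02):
    missing = sum(1 for r in records if not r.get(field))
    return (missing / len(records)) <= max_rate if records else True

ITERATION_CHECKS = [
    ("no default postal codes", no_default_postal_codes),
    ("customer_id missing rate", lambda rs: missing_rate_ok(rs, "customer_id")),
]

def run_checks(records):
    """Print PASS/FAIL per rule so each iteration gets a data-centric progress view."""
    for name, check in ITERATION_CHECKS:
        print(f"{name}: {'PASS' if check(records) else 'FAIL'}")

run_checks([{"customer_id": "101", "postal_code": "02134"},
            {"customer_id": "", "postal_code": "00000"}])
# no default postal codes: FAIL
# customer_id missing rate: FAIL
```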

 

If you wait until the end of the project to learn if you have succeeded or failed, then you treat data quality like a game of chance.

And to paraphrase Albert Einstein:

  “Do not play dice with data quality.”


Data Gazers

Within cubicles randomly dispersed throughout the sprawling office space of companies large and small, there exist countless unsung heroes of enterprise information initiatives.  Although their job titles might label them as a Business Analyst, Programmer Analyst, Account Specialist or Application Developer, their true vocation is a far more noble calling.

 

They are Data Gazers.

 

In his excellent book Data Quality Assessment, Arkady Maydanchik explains that:

"Data gazing involves looking at the data and trying to reconstruct a story behind these data.  Following the real story helps identify parameters about what might or might not have happened and how to design data quality rules to verify these parameters.  Data gazing mostly uses deduction and common sense."

All enterprise information initiatives are complex endeavors and data quality projects are certainly no exception.  Success requires people taking on the challenge united by collaboration, guided by an effective methodology, and implementing a solution using powerful technology.

But the complexity of the project can sometimes work against your best intentions.  It is easy to get pulled into the mechanics of documenting the business requirements and functional specifications and then charging ahead on the common mantra:

"We planned the work, now we work the plan." 

Once the project achieves some momentum, it can take on a life of its own and the focus becomes more and more about making progress against the tasks in the project plan, and less and less on the project's actual goal...improving the quality of the data. 

In fact, I have often observed the bizarre phenomenon where as a project "progresses" it tends to get further and further away from the people who use the data on a daily basis.

 

However, Arkady Maydanchik explains that:

"Nobody knows the data better than the users.  Unknown to the big bosses, the people in the trenches are measuring data quality every day.  And while they rarely can give a comprehensive picture, each one of them has encountered certain data problems and developed standard routines to look for them.  Talking to the users never fails to yield otherwise unknown data quality rules with many data errors."

There is a general tendency to consider that working directly with the users and the data during application development can only be disruptive to the project's progress.  There can be a quiet comfort and joy in simply developing off of documentation and letting the interaction with the users and the data wait until the project plan indicates that user acceptance testing begins. 

The project team can convince themselves that the documented business requirements and functional specifications are suitable surrogates for the direct knowledge of the data that users possess.  It is easy to believe that these documents tell you what the data is and what the rules are for improving the quality of the data.

Therefore, although ignoring the users and the data until user acceptance testing begins may be a good way to keep a data quality project on schedule, you will only be delaying the project's inevitable failure because as all data gazers know and as my mentor Morpheus taught me:

"Unfortunately, no one can be told what the Data is. You have to see it for yourself."


Data Quality is People!

 

New York City, 2022 - Technical Architect Robert Thorn and Business Analyst Sol Roth have been called in by Business Director Harry Harrison and IT Director Tab Fielding to investigate an unsolved series of anomalies that have been plaguing the company's Dystopian Automated Transactional Analysis (DATA) system.

 

Harry Harrison (Business Director):

“Thank you for coming on such short notice.  We hope that you will be able to help us with our DATA problems.”

Robert Thorn (Technical Architect): 

“You're welcome.”

Sol Roth (Business Analyst): 

“I am sure that we can help.  Can you provide us with an overview of the situation?”

Harry Harrison (Business Director): 

“We hired quality expert William Simonson from Soylent Consulting to fix our DATA problems.”

Robert Thorn (Technical Architect): 

“Soylent Consulting?  Never heard of them.”

Sol Roth (Business Analyst): 

“They are the professional services division of Green, Incorporated.”

Harry Harrison (Business Director): 

“Yes, that's right - the High-Energy, Environmentally-Minded Corporation.”

Tab Fielding (IT Director): 

“More like the High-Rate, Weak-Minded Corporation, if you ask me.”

Harry Harrison (Business Director): 

“Well anyway, Mr. Simonson first met with me to receive the business requirements.”

Sol Roth (Business Analyst): 

“Receive?  You mean he was handed the completed business requirements document?”

Harry Harrison (Business Director): 

“Yes, of course.”

Sol Roth (Business Analyst): 

“So...he didn't meet directly with anyone on the business team?”

Harry Harrison (Business Director): 

“No, I write the business requirements document so that meeting with the business team is unnecessary.”

Sol Roth (Business Analyst): 

“O...K...then what happened?”

Tab Fielding (IT Director): 

“Mr. Simonson met with me to receive the functional specifications.”

Robert Thorn (Technical Architect): 

“Receive?  So...did you write the functional specifications?”

Tab Fielding (IT Director): 

“Yes, I write the functional specifications after reading Mr. Harrison's business requirements document.”

Robert Thorn (Technical Architect): 

“Reading?  So...you didn't even meet directly with Mr. Harrison?”

Tab Fielding (IT Director): 

“No, before today's meeting, I haven't even seen him or anyone from the business team in months.”

Robert Thorn (Technical Architect): 

“O...K...then what happened?”

Tab Fielding (IT Director): 

“Mr. Simonson spent a few months coding, implemented our solution, then told us he was ‘going home.’”

Harry Harrison (Business Director): 

“And now our DATA is worse than ever!”

Sol Roth (Business Analyst): 

“And both of you are wondering how it came to this?”

Harry Harrison (Business Director) and Tab Fielding (IT Director): 

“Yes!”

Robert Thorn (Technical Architect): 

“I'll tell you how.  Because you both forgot the most important aspect of DATA.”

Tab Fielding (IT Director): 

“What are you talking about?”

Robert Thorn (Technical Architect): 

“It's people.  Data's Quality is made by People.  You've gotta tell them.  You've gotta tell them!”

Harry Harrison (Business Director): 

“I promise, Thorn.  I promise.  We will tell executive management.”

Robert Thorn (Technical Architect): 

“You tell everybody.  Listen to me, both of you!  You've gotta tell everybody that Data Quality is People!”


TDWI World Conference Chicago 2009

Founded in 1995, TDWI (The Data Warehousing Institute™) is the premier educational institute for business intelligence and data warehousing that provides education, training, certification, news, and research for executives and information technology professionals worldwide.  TDWI conferences always offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner.  The courses are designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

 

TDWI World Conference Chicago 2009 was held May 3-8 in Chicago, Illinois at the Hyatt Regency Hotel and was a tremendous success.  I attended as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the conference.  Here are my notes from the courses I attended: 

 

BI from Both Sides: Aligning Business and IT

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services.  Jill is responsible for delivering industry and client advisory services, is a frequent lecturer and writer on the business value of IT, and writes the excellent Inside the Biz blog.  She is the author of acclaimed books on the business value of information: e-Data: Turning Data Into Information With Data Warehousing and The CRM Handbook: A Business Guide to Customer Relationship Management.  Her latest book, written with Evan Levy, is Customer Data Integration: Reaching a Single Version of the Truth.

Course Quotes from Jill Dyché:

  • Five Critical Success Factors for Business Intelligence (BI):
    1. Organization - Build organizational structures and skills to foster a sustainable program
    2. Processes - Align both business and IT development processes that facilitate delivery of ongoing business value
    3. Technology - Select and build technologies that deploy information cost-effectively
    4. Strategy - Align information solutions to the company's strategic goals and objectives
    5. Information - Treat data as an asset by separating data management from technology implementation
  • Three Different Requirement Categories:
    1. What is the business need, pain, or problem?  What business questions do we need to answer?
    2. What data is necessary to answer those business questions?
    3. How do we need to use the resulting information to answer those business questions?
  • “Data warehouses are used to make business decisions based on data – so data quality is critical”
  • “Even companies with mature enterprise data warehouses still have data silos - each business area has its own data mart”
  • “Instead of pushing a business intelligence tool, just try to get people to start using data”
  • “Deliver a usable system that is valuable to the business and not just a big box full of data”

 

TDWI Data Governance Summit

Philip Russom is the Senior Manager of Research and Services at TDWI, where he oversees many of TDWI’s research-oriented publications, services, and events.  Prior to joining TDWI in 2005, he was an industry analyst covering BI at Forrester Research, as well as a contributing editor with Intelligent Enterprise and Information Management (formerly DM Review) magazines.

Summit Quotes from Philip Russom:

  • “Data Governance usually boils down to some form of control for data and its usage”
  • “Four Ps of Data Governance: People, Policies, Procedures, Process”
  • “Three Pillars of Data Governance: Compliance, Business Transformation, Business Integration”
  • “Two Foundations of Data Governance: Business Initiatives and Data Management Practices”
  • “Cross-functional collaboration is a requirement for successful Data Governance”

 

Becky Briggs, CBIP, CMQ/OE, is a Senior Manager and Data Steward for Airlines Reporting Corporation (ARC) and has 25 years of experience in data processing and IT - the last 9 in data warehousing and BI.  She leads the program team responsible for product, project, and quality management, business line performance management, and data governance/stewardship.

Summit Quotes from Becky Briggs:

  • “Data Governance is the act of managing the organization's data assets in a way that promotes business value, integrity, usability, security and consistency across the company”
  • Five Steps of Data Governance:
    1. Determine what data is required
    2. Evaluate potential data sources (internal and external)
    3. Perform data profiling and analysis on data sources
    4. Data Services - Definition, modeling, mapping, quality, integration, monitoring
    5. Data Stewardship - Classification, access requirements, archiving guidelines
  • “You must realize and accept that Data Governance is a program and not just a project”

 

Barbara Shelby is a Senior Software Engineer for IBM with over 25 years of experience holding positions of technical specialist, consultant, and line management.  Her global management and leadership positions encompassed network authentication, authorization application development, corporate business systems data architecture, and database development.

Summit Quotes from Barbara Shelby:

  • Four Common Barriers to Data Governance:
    1. Information - Existence of information silos and inconsistent data meanings
    2. Organization - Lack of end-to-end data ownership and organization cultural challenges
    3. Skill - Difficulty shifting resources from operational to transformational initiatives
    4. Technology - Business data locked in large applications and slow deployment of new technology
  • Four Key Decision Making Bodies for Data Governance:
    1. Enterprise Integration Team - Oversees the execution of CIO funded cross enterprise initiatives
    2. Integrated Enterprise Assessment - Responsible for the success of transformational initiatives
    3. Integrated Portfolio Management Team - Responsible for making ongoing business investment decisions
    4. Unit Architecture Review - Responsible for the IT architecture compliance of business unit solutions

 

Lee Doss is a Senior IT Architect for IBM with over 25 years of information technology experience.  He holds a patent for a process of aligning strategic capability for business transformation, and he has held various positions including strategy, design, development, and customer support for IBM networking software products.

Summit Quotes from Lee Doss:

  • Five Data Governance Best Practices:
    1. Create a sense of urgency that the organization can rally around
    2. Start small, grow fast...pick a few visible areas to set an example
    3. Sunset legacy systems (application, data, tools) as new ones are deployed
    4. Recognize the importance of organization culture…this will make or break you
    5. Always, always, always – Listen to your customers

 

Kevin Kramer is a Senior Vice President and Director of Enterprise Sales for UMB Bank and is responsible for development of sales strategy, sales tool development, and implementation of enterprise-wide sales initiatives.

Summit Quotes from Kevin Kramer:

  • “Without Data Governance, multiple sources of customer information can produce multiple versions of the truth”
  • “Data Governance helps break down organizational silos and shares customer data as an enterprise asset”
  • “Data Governance provides a roadmap that translates into best practices throughout the entire enterprise”

 

Kanon Cozad is a Senior Vice President and Director of Application Development for UMB Bank and is responsible for overall technical architecture strategy and oversees information integration activities.

Summit Quotes from Kanon Cozad:

  • “Data Governance identifies business process priorities and then translates them into enabling technology”
  • “Data Governance provides direction and Data Stewardship puts direction into action”
  • “Data Stewardship identifies and prioritizes applications and data for consolidation and improvement”

 

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services.  (For Jill's complete bio, please see above).

Summit Quotes from Jill Dyché:

  • “The hard part of Data Governance is the data”
  • “No data will be formally sanctioned unless it meets a business need”
  • “Data Governance focuses on policies and strategic alignment”
  • “Data Management focuses on translating defined polices into executable actions”
  • “Entrench Data Governance in the development environment”
  • “Everything is customer data – even product and financial data”

 

Data Quality Assessment - Practical Skills

Arkady Maydanchik is a co-founder of Data Quality Group, a recognized practitioner, author, and educator in the field of data quality and information integration.  Arkady's data quality methodology and breakthrough ARKISTRA technology were used to provide services to numerous organizations.  Arkady is the author of the excellent book Data Quality Assessment, a frequent speaker at various conferences and seminars, and a contributor to many journals and online publications.  Data quality curriculum by Arkady Maydanchik can be found at eLearningCurve.

Course Quotes from Arkady Maydanchik:

  • “Nothing is worse for data quality than desperately trying to fix it during the last few weeks of an ETL project”
  • “Quality of data after conversion is in direct correlation with the amount of knowledge about actual data”
  • “Data profiling tools do not do data profiling - it is done by data analysts using data profiling tools”
  • “Data Profiling does not answer any questions - it helps us ask meaningful questions”
  • “Data quality is measured by its fitness to the purpose of use – it's essential to understand how data is used”
  • “When data has multiple uses, there must be data quality rules for each specific use”
  • “Effective root cause analysis requires not stopping after the answer to your first question - Keep asking: Why?”
  • “The central product of a Data Quality Assessment is the Data Quality Scorecard”
  • “Data quality scores must be both meaningful to a specific data use and be actionable”
  • “Data quality scores must estimate both the cost of bad data and the ROI of data quality initiatives”

 

Modern Data Quality Techniques in Action - A Demonstration Using Human Resources Data

Gian Di Loreto formed Loreto Services and Technologies in 2004 from the client services division of Arkidata Corporation.  Loreto Services provides data cleansing and integration consulting services to Fortune 500 companies.  Gian is a classically trained scientist - he received his PhD in elementary particle physics from Michigan State University.

Course Quotes from Gian Di Loreto:

  • “Data Quality is rich with theory and concepts – however it is not an academic exercise, it has real business impact”
  • “To do data quality well, you must walk away from the computer and go talk with the people using the data”
  • “Undertaking a data quality initiative demands developing a deeper knowledge of the data and the business”
  • “Some essential data quality rules are ‘hidden’ and can only be discovered by ‘clicking around’ in the data”
  • “Data quality projects are not about systems working together - they are about people working together”
  • “Sometimes, data quality can be ‘good enough’ for source systems but not when integrated with other systems”
  • “Unfortunately, no one seems to care about bad data until they have it”
  • “Data quality projects are only successful when you understand the problem before trying to solve it”

 

Mark Your Calendar

TDWI World Conference San Diego 2009 - August 2-7, 2009.

TDWI World Conference Orlando 2009 - November 1-6, 2009.

TDWI World Conference Las Vegas 2010 - February 21-26, 2010.

El Festival del IDQ Bloggers (April 2009)

Welcome to the April 2009 issue of El Festival del IDQ Bloggers, which is a blog carnival for information/data quality bloggers being run as part of the celebration of the five-year anniversary of the International Association for Information and Data Quality (IAIDQ).

 

A blog carnival is a collection of posts from different blogs on a specific theme that are published across a series of issues.  Anyone can submit a data quality blog post and experience the benefits of extra traffic, networking with other bloggers and discovering interesting posts.  It doesn't matter what type of blog you have as long as the submitted post has a data quality theme. 

El Festival del IDQ Bloggers will run monthly issues April through November 2009.

 

Can You Say Anything Interesting About Data Quality?

This simple question launched the first blog carnival of data quality that ran four issues from late 2007 through early 2008:

Blog Carnival of Data Quality (November 2007)

Blog Carnival of Data Quality (December 2007)

Blog Carnival of Data Quality (January 2008)

Blog Carnival of Data Quality (February 2008)

 

How to give your Data Warehouse a Data Quality Immunity System

Vincent McBurney is a manager for Deloitte consulting in Perth, Australia.  His excellent blog Tooling Around in the IBM InfoSphere looks at the world of data integration software and occasionally wonders what IBM is up to.  His data quality motto: "If it ain’t broke, don't fix it."

Vincent submitted How to give your Data Warehouse a Data Quality Immunity System that discusses how people who obsessively keep bad quality data out of a data warehouse may be making it unhealthy in the long run.

 

Stuck in First Gear

Michele Goetz is a freelance consultant helping companies make sense of their business through better analysis, marketing best practices, and marketing solutions.  Her excellent blog Intelligent Metrix guides you on your journey from data to metrics to insight to intelligent decisions.  Her blog demystifies business intelligence and data management for the business, and helps you bridge the Business-IT gap for better processes and solutions that drive business success.

Michele submitted Stuck in First Gear that discusses the common problem when companies make big investments in enterprise class solutions but only use a portion of the capabilities, which is like driving a Porsche in first gear.

 

When Bad Data Becomes Acceptable Data

Daniel Gent is a bilingual business analyst experienced with the System Development Life Cycle (SDLC), decision making, change management, database design, data modeling, data quality management, project coordination, and problem resolution.  His excellent blog Data Quality Edge is a grassroots look at data quality for the data quality analyst in the trenches.

Daniel submitted When Bad Data Becomes Acceptable Data that discusses how you need to prioritize bad data and determine when it is acceptable to keep it for now.

 

Customer Value and Sustainable Quality

Daniel Bahula is a strategy and operations improvement professional with extensive project experience from multinational telco, software development and professional services companies.  His excellent blog DanBahula.net defies a simple definition and is a great example of how it doesn't matter what type of blog you have as long as the submitted post has a data quality theme.

Daniel submitted Customer Value and Sustainable Quality that discusses Six Sigma and its relevance to addressing data quality issues.

 

Data Quality, Entity Resolution, and OFAC Compliance

Bob Barker is the editor of Identity Resolution Daily, which is a corporate blog of Austin, TX-based Infoglide Software strongly dedicated to citizenship, integrity and communication.  The blog has recently been gaining guest bloggers with varying points of view, helping it to become an excellent site for information, dialogue and community.

Bob submitted Data Quality, Entity Resolution, and OFAC Compliance that discusses how entity resolution is different from name matching and traditional data quality.

 

Selecting Data Quality Software

Dylan Jones is the editor of Data Quality Pro, which is the leading data quality online magazine and free independent community resource dedicated to helping data quality professionals take their career or business to the next level.

Dylan submitted Selecting Data Quality Software that discusses how to find the right data quality technology for your needs and your budget.

 

AmazonFail - A Classic Information Quality Impact

Since 2006, IQTrainwrecks.com, which is a community blog provided and administered by the International Association for Information and Data Quality (IAIDQ), has been serving up regular doses of information quality disasters from around the world.

IAIDQ submitted AmazonFail - A Classic Information Quality Impact that looks behind the hype and confusion surrounding the #amazonfail debacle.

 

You’re a Leader - Lead

Daragh O Brien is an Irish information quality expert, conference speaker, published author in the field, and director of publicity for the IAIDQ.  His excellent blog The DOBlog, founded in 2006, was one of the first specialist information quality blogs.

Daragh submitted You’re a Leader - Lead that explains although there’s a whole lot of great management happening in the world, what we really need are information quality leaders.

 

All I Really Need To Know About Data Quality I Learned In Kindergarten

My name is Jim Harris.  I am an independent consultant, speaker, writer and blogger with over 15 years of professional services and application development experience in data quality.  My blog Obsessive-Compulsive Data Quality is an independent blog offering a vendor-neutral perspective on data quality.

I submitted All I Really Need To Know About Data Quality I Learned In Kindergarten that explains how show and tell, the five second rule and other great lessons from kindergarten are essential to success in data quality initiatives.

 

Submit to Daragh

The May issue will be edited by Daragh O Brien and hosted on The DOBlog.

For more information, please follow this link:  El Festival del IDQ Bloggers


Hyperactive Data Quality

In economics, the term "flight to quality" describes the aftermath of a financial crisis (e.g. a stock market crash) when people become highly risk-averse and move their money into safer, more reliable investments. 

A similar "flight to data quality" can occur in the aftermath of an event when poor data quality negatively impacted decision-critical enterprise information.  Some examples include a customer service nightmare, a regulatory compliance failure or a financial reporting scandal.  Whatever the triggering event, a common response is data quality suddenly becomes prioritized as a critical issue and an enterprise information initiative is launched.

Congratulations!  You've realized (albeit the hard way) that this "data quality thing" is really important.

Now what are you going to do about it?  How are you going to attempt to actually solve the problem?

In his excellent book Data Driven: Profiting from Your Most Important Business Asset, Thomas Redman uses an excellent analogy called the data quality lake:

"...a lake represents a database and the water therein the data.  The stream, which adds new water, is akin to a business process that creates new data and adds them to the database.  The lake...is polluted, just as the data are dirty.  Two factories pollute the lake.  Likewise, flaws in the business process are creating errors...

One way to address the dirty lake water is to clean it up...by running the water through filters, passing it through specially designed settling tanks, and using chemicals to kill bacteria and adjust pH. 

The alternative is to reduce the pollutant at the point source - the factories. 

The contrast between the two approaches is stark.  In the first, the focus is on the lake; in the second, it is on the stream.  So too with data.  Finding and fixing errors focuses on the database and data that have already been created.  Preventing errors focuses on the business processes and future data."

 

Reactive Data Quality

A "flight to data quality" usually prompts an approach commonly referred to as Reactive Data Quality (i.e. "cleaning the lake" to use Redman's excellent analogy).  The  majority of enterprise information initiatives are reactive.  The focus is typically on finding and fixing the problems with existing data in an operational data store (ODS), enterprise data warehouse (EDW) or other enterprise information repository.  In other words, the focus is on fixing data after it has been extracted from its sources.

An obsessive-compulsive quest to find and fix every data quality problem is a laudable but ultimately unachievable pursuit (even for expert "lake cleaners").  Data quality problems can be very insidious and even the best "lake cleaning" process will still produce exceptions.  Your process should be designed to identify and report exceptions when they occur.  In fact, as a best practice, you should also include the ability to suspend incoming data that contains exceptions for manual review and correction.
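As a minimal sketch of that suspend-for-review pattern (with an invented standardization rule and record layout), the idea is simply to route anything the automated "lake cleaning" cannot confidently handle to a person:

```python
# An invented standardization rule; anything it cannot confidently handle is
# suspended for manual review instead of being silently "cleaned."
def standardize_state(value):
    mapping = {"massachusetts": "MA", "mass.": "MA", "ma": "MA"}
    return mapping.get((value or "").strip().lower())

def clean(records):
    cleaned, suspended = [], []
    for record in records:
        state = standardize_state(record.get("state"))
        if state is None:
            suspended.append(record)                # exception: route to a person
        else:
            cleaned.append({**record, "state": state})
    return cleaned, suspended

cleaned, suspended = clean([{"state": "Mass."}, {"state": "M.A."}])
print(len(cleaned), len(suspended))  # 1 1
```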

However, as Redman cautions: "...the problem with being a good lake cleaner is that life never gets better.  Indeed, it gets worse as more data...conspire to mean there is more work every day."  I tell my clients the only way to guarantee that reactive data quality will be successful is to unplug all the computers so that no one can add new data or modify existing data.

 

Proactive Data Quality

Attempting to prevent data quality problems before they happen is commonly referred to as Proactive Data Quality.  The focus is on preventing errors at the sources where data is entered or received and before it is extracted for use by downstream applications (i.e. "enters the lake").  Redman describes the benefits of proactive data quality with what he calls the Rule of Ten:

"It costs ten times as much to complete a unit of work when the input data are defective (i.e. late, incorrect, missing, etc.) as it does when the input data are perfect."

Proactive data quality advocates implementing improved edit controls on data entry screens, enforcing the data quality clause (you have one, right?) of your service level agreements with external data providers, and understanding the business needs of your enterprise information consumers before you deliver data to them.

Obviously, it is impossible to truly prevent every problem before it happens.  However, the more control that can be enforced where data originates, the better the overall quality will be for enterprise information.
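A proactive edit control can be as simple as the following sketch - validation applied where the data is entered or received, with invented field names and rules:

```python
import re

# Invented fields and rules for a hypothetical new-customer entry screen.
def validate_new_customer(form):
    errors = []
    if not form.get("name", "").strip():
        errors.append("name is required")
    if not re.fullmatch(r"\d{5}(-\d{4})?", form.get("postal_code", "")):
        errors.append("postal_code must be a valid ZIP or ZIP+4")
    return errors

errors = validate_new_customer({"name": "Acme Corp", "postal_code": "0213"})
if errors:
    print("fix before saving:", errors)  # reject at the point of entry, not downstream
```

Rejecting the bad value here costs one correction at the keyboard; letting it flow downstream invokes Redman's Rule of Ten.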

 

Hyperactive Data Quality

Too many enterprise information initiatives fail because they are launched based on a "flight to data quality" response and have the unrealistic perspective that data quality problems can be quickly and easily resolved.  However, just like any complex problem, there is no fast and easy solution for data quality.

In order to be successful, you must combine aspects of both reactive and proactive data quality to create an enterprise-wide best practice that I call Hyperactive Data Quality, which will make the responsibility for managing data quality a daily activity for everyone in your organization.

 

Please share your thoughts and experiences.  Is your data quality Reactive, Proactive or Hyperactive?

All I Really Need To Know About Data Quality I Learned In Kindergarten

Robert Fulghum's excellent book All I Really Need to Know I Learned in Kindergarten dominated the New York Times Bestseller List for all of 1989 and much of 1990.  The 15th Anniversary Edition, which was published in 2003, revised and expanded on the original inspirational essays.

A far less noteworthy achievement of the book is that it also inspired me to write about how:

All I Really Need To Know About Data Quality I Learned in Kindergarten

Show And Tell

I loved show and tell.  An opportunity to deliver an interactive presentation that encouraged audience participation.  No PowerPoint slides.  No podium.  No power suit.  Just me wearing the dorky clothes my parents bought me, standing right in front of the class, waving my Millennium Falcon over my head and explaining that "traveling through hyper-space ain't like dustin' crops, boy" while my classmates (and my teacher) were laughing so hard many of them fell out of their seats.  My show and tell made it clear that if you came over my house after school to play, then you knew exactly what to expect - a geek who loved Star Wars - perhaps a little too much. 

When you present the business case for your data quality initiative to executive management and other corporate stakeholders, remember the lessons of show and tell.  Poor data quality is not a theoretical problem - it is a real business problem that negatively impacts the quality of decision-critical enterprise information.  Your presentation should make it clear that if the data quality initiative doesn't get approved, then everyone will know exactly what to expect:

"Poor data quality is the path to the dark side. 

Poor data quality leads to bad business decisions. 

Bad business decisions lead to lost revenue. 

Lost revenue leads to suffering."

The Five Second Rule

If you drop your snack on the floor, then as long as you pick it up within five seconds you can safely eat it.  When you have poor quality data in your enterprise systems, you do have more than five seconds to do something about it.  However, the longer poor quality data goes without remediation, the more likely it will negatively impact critical business decisions.  Don't let your data become the "smelly kid" in class.  No one likes to share their snacks with the smelly kid.  And no one trusts information derived from "smelly data."

 

When You Make A Mistake, Say You're Sorry

Nobody's perfect.  We all have bad days.  We all occasionally say and do stupid things.  When you make a mistake, own up to it and apologize for it.  You don't want to have to wear the dunce cap or stand in the corner for a time-out.  And don't be too hard on your friend that had to wear the dunce cap today.  It was simply their turn to make a mistake.  It will probably be your turn tomorrow.  They had to say they were sorry.  You also have to forgive them.  Who else is going to share their cookies with you when your mom once again packs carrots as your snack?

 

Learn Something New Every Day

We didn't stop learning after we "graduated" from kindergarten, did we?  We are all proud of our education, knowledge, understanding, and experience.  It may be true that experience is the path that separates knowledge from wisdom.  However, we must remain open to learning new things.   Socrates taught us that "the only true wisdom consists in knowing that you know nothing."  I bet Socrates headlined the story time circuit in the kindergartens of Ancient Greece.

 

Hold Hands And Stick Together

I remember going on numerous field trips in kindergarten.  We would visit museums, zoos and amusement parks.  Wherever we went, our teacher would always have us form an interconnected group by holding the hand of the person in front of you and the person behind you.  We were told to stick together and look out for one another.  This important lesson is also applicable to data quality initiatives.  Teamwork and collaboration are essential for success.  Remember that you are all in this together.

 

What did you learn about data quality in kindergarten?

A Portrait of the Data Quality Expert as a Young Idiot

Once upon a time (and a very good time it was), there was a young data quality consultant that fancied himself an expert.

 

He went from client to client and project to project, all along espousing his expertise.  He believed he was smarter than everyone else.  He didn't listen well - he simply waited for his turn to speak.  He didn't foster open communication without bias - he believed his ideas were the only ones of value.  He didn't seek mutual understanding on difficult issues - he bullied people until he got his way.  He didn't believe in the importance of the people involved in the project - he believed the project would be successful with or without them.

 

He was certain he was always right.

 

And he failed - many, many times.

 

In his excellent book How We Decide, Jonah Lehrer advocates paying attention to your inner disagreements, becoming a student of your own errors, and avoiding the trap of certainty.  When you are certain that you're right, you stop considering the possibility that you might be wrong.

 

James Joyce wrote that "mistakes are the portals of discovery" and T.S. Eliot wrote that "we shall not cease from exploration, and the end of all our exploring will be to arrive where we started and know the place for the first time."

 

Once upon a time, there was a young data quality consultant who realized he was an idiot - and a very good time it was.

Are You Afraid Of Your Data Quality Solution?

As a data quality consultant, when I begin an engagement with a new client, I ask many questions.  I seek an understanding of the current environment from both the business and technical perspectives.  Some of the common topics I cover are what data quality solutions have been attempted previously, how successful they were, and whether they are still in use today.  To their credit, I find that many of my clients have successfully implemented data quality solutions that are still in use.

 

However, this revelation frequently leads to some form of the following dialogue:

OCDQ:  "Am I here to help with the enhancements for the next iteration of the project?"

Client:  "No, we don't want to enhance our existing solution, we want you to build us a brand new one."

OCDQ:  "I thought you had successfully implemented a data quality solution.  Is that not true?"

Client:  "We believe the current solution is working as intended.  It appears to handle many of our data quality issues."

OCDQ:  "How long have you been using the current solution?"

Client:  "Five years."

OCDQ:  "You haven't made any changes in five years?  Haven't there been requests for bug fixes and enhancements?"

Client:  "Yes, of course.  However, we didn't want to make any modifications because we were afraid we would break it."

OCDQ:  "Who created the current solution?  Didn't they provide documentation, training and knowledge transfer?"

Client:  "A previous consultant created it.  He provided some documentation and training, but only on how to run it."

 

A common data quality adage is:

"If you can't measure it, then you can't manage it." 

A far more important data quality adage is:

"If you don't know how to maintain it, then you shouldn't implement it."

 

There are many important considerations when planning a data quality initiative.  One of the most common mistakes is the unrealistic perspective that data quality problems can be permanently “fixed” by implementing a one-time “solution” that doesn't require ongoing improvements.  This flawed perspective leads many organizations to invest in powerful software and expert consultants, believing that:

"If they build it, data quality will come." 

However, data quality is not a field of dreams - and I know because I actually live in Iowa.

 

The reality is that data quality initiatives can only be successful when they follow these very simple and time-tested instructions:

Measure, Improve, Repeat.
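
To make that cycle a bit more concrete, here is a minimal sketch of the "Measure" step, assuming a simple tabular customer dataset and a couple of hypothetical completeness and validity rules (the field names and format patterns are illustrative only):

import re
from datetime import date

# Hypothetical customer records with deliberately mixed quality.
records = [
    {"customer_id": "C001", "email": "pat@example.com", "postal_code": "50309"},
    {"customer_id": "C002", "email": "", "postal_code": "5030"},
    {"customer_id": "C003", "email": "kim@example", "postal_code": "52240"},
]

def completeness(rows, field):
    # Share of rows where the field is populated.
    return sum(1 for r in rows if r.get(field)) / len(rows)

def validity(rows, field, pattern):
    # Share of populated values that match a simple format rule.
    values = [r[field] for r in rows if r.get(field)]
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

# Measure: capture the metrics with a date so each iteration can be compared to the last.
snapshot = {
    "measured_on": date.today().isoformat(),
    "email_completeness": completeness(records, "email"),
    "email_validity": validity(records, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "postal_code_validity": validity(records, "postal_code", r"\d{5}"),
}
print(snapshot)
# Improve: remediate the worst-scoring rules at their source, then Repeat the measurement.

Keeping each dated snapshot is what turns Measure into a trend you can check after every round of Improve.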


Enterprise Data World 2009

Formerly known as the DAMA International Symposium and Wilshire MetaData Conference, Enterprise Data World 2009 was held April 5-9 in Tampa, Florida at the Tampa Convention Center.

 

Enterprise Data World is the business world’s most comprehensive vendor-neutral educational event about data and information management.  This year’s program was bigger than ever before, with more sessions, more case studies, and more can’t-miss content.  With 200 hours of in-depth tutorials, hands-on workshops, practical sessions and insightful keynotes, the conference was a tremendous success.  Congratulations and thanks to Tony Shaw, Maya Stosskopf and the entire Wilshire staff.

 

I attended Enterprise Data World 2009 as a member of the Iowa Chapter of DAMA and as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the sessions that I was attending.

I wish that I could have attended every session, but here are some highlights from ten of my favorites:

 

8 Ways Data is Changing Everything

Keynote by Stephen Baker from BusinessWeek

His article Math Will Rock Your World inspired his excellent book The Numerati.  Additionally, check out his blog: Blogspotting.

Quotes from the keynote:

  • "Data is changing how we understand ourselves and how we understand our world"
  • "Predictive data mining is about the mathematical modeling of humanity"
  • "Anthropologists are looking at social networking (e.g. Twitter, Facebook) to understand the science of friendship"

 

Master Data Management: Proven Architectures, Products and Best Practices

Tutorial by David Loshin from Knowledge Integrity.

Included material from his excellent book Master Data Management.  Additionally, check out his blog: David Loshin.

Quotes from the tutorial:

  • "Master Data are the core business objects used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies"
  • "Master Data Management (MDM) provides a unified view of core data subject areas (e.g. Customers, Products)"
  • "With MDM, it is important not to over-invest and under-implement - invest in and implement only what you need"

 

Master Data Management: Ignore the Hype and Keep the Focus on Data

Case Study by Tony Fisher from DataFlux and Jeff Grayson from Equinox Fitness.

Quotes from the case study:

  • "The most important thing about Master Data Management (MDM) is improving business processes"
  • "80% of any enterprise implementation should be the testing phase"
  • "MDM Data Quality (DQ) Challenge: Any % wrong means you’re 100% certain you’re not always right"
  • "MDM DQ Solution: Re-design applications to ensure the ‘front-door’ protects data quality"
  • "Technology is critical, however thinking through the operational processes is more important"

 

A Case of Usage: Working with Use Cases on Data-Centric Projects

Case Study by Susan Burk from IBM.

Quotes from the case study:

  • "Use Case is a sequence of actions performed to yield a result of observable business value"
  • "The primary focus of data-centric projects is data structure, data delivery and data quality"
  • "Don’t like use cases? – ok, call them business acceptance criteria – because that’s what a use case is"

 

Crowdsourcing: People are Smart, When Computers are Not

Session by Sharon Chiarella from Amazon Web Services.

Quotes from the session:

  • "Crowdsourcing is outsourcing a task typically performed by employees to a general community of people"
  • "Crowdsourcing eliminates over-staffing, lowers costs and reduces work turnaround time"
  • "An excellent example of crowdsourcing is open source software development (e.g. Linux)"

 

Improving Information Quality using Lean Six Sigma Methodology

Session by Atul Borkar and Guillermo Rueda from Intel.

Quotes from the session:

  • "Information Quality requires a structured methodology in order to be successful"
  • Lean Six Sigma Framework: DMAIC – Define, Measure, Analyze, Improve, Control:
    • Define = Describe the challenge, goal, process and customer requirements
    • Measure = Gather data about the challenge and the process
    • Analyze = Use hypothesis and data to find root causes
    • Improve = Develop, implement and refine solutions
    • Control = Plan for stability and measurement

 

Universal Data Quality: The Key to Deriving Business Value from Corporate Data

Session by Stefanos Damianakis from Netrics.

Quotes from the session:

  • "The information stored in databases is NEVER perfect, consistent and complete – and it never can be!"
  • "Gartner reports that 25% of critical data within large businesses is somehow inaccurate or incomplete"
  • "Gartner reports that 50% of implementations fail due to lack of attention to data quality issues"
  • "A powerful approach to data matching is the mathematical modeling of human decision making"
  • "The greatest advantage of mathematical modeling is that there are no data matching rules to build and maintain"

 

Defining a Balanced Scorecard for Data Management

Seminar by C. Lwanga Yonke, a founding member of the International Association for Information and Data Quality (IAIDQ).

Quotes from the seminar:

  • "Entering the same data multiple times is like paying the same invoice multiple times"
  • "Good metrics help start conversations and turn strategy into action"
  • Good metrics have the following characteristics:
    • Business Relevance
    • Clarity of Definition
    • Trending Capability (i.e. metric can be tracked over time)
    • Easy to aggregate and roll up to a summary
    • Easy to drill down to the details that make up the measurement
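
As a hypothetical illustration (not from the seminar), the last three characteristics mostly come down to how a metric is stored: capturing each measurement with a date and a business dimension is what makes trending, roll-up, and drill-down possible.

from collections import defaultdict

# Hypothetical metric captured per date and business unit.
measurements = [
    {"date": "2009-03-01", "unit": "Sales", "metric": "duplicate_rate", "value": 0.08},
    {"date": "2009-03-01", "unit": "Finance", "metric": "duplicate_rate", "value": 0.03},
    {"date": "2009-04-01", "unit": "Sales", "metric": "duplicate_rate", "value": 0.05},
    {"date": "2009-04-01", "unit": "Finance", "metric": "duplicate_rate", "value": 0.02},
]

# Roll-up: summarize across business units per date, which also yields the trend over time.
by_date = defaultdict(list)
for m in measurements:
    by_date[m["date"]].append(m["value"])
print("Trend:", {d: round(sum(v) / len(v), 3) for d, v in sorted(by_date.items())})

# Drill-down: the detailed measurements behind a single summary point.
print("Detail for 2009-04-01:", [m for m in measurements if m["date"] == "2009-04-01"])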

 

Closing Panel: Data Management’s Next Big Thing!

Quotes from Panelist Peter Aiken from Data Blueprint:

  • Capability Maturity Levels:
    1. Initial
    2. Repeatable
    3. Defined
    4. Managed
    5. Optimized
  • "Most companies are at a capability maturity level of (1) Initial or (2) Repeatable"
  • "Data should be treated as a durable asset"

Quotes from Panelist Noreen Kendle from Burton Group:

  • "A new age for data and data management is on horizon – a perfect storm is coming"
  • "The perfect storm is being caused by massive data growth and software as a service (i.e. cloud computing)"
  • "Always remember that you can make lemonade from lemons – the bad in life can be turned into something good"

Quotes from Panelist Karen Lopez from InfoAdvisors:

  • "If you keep using the same recipe, then you keep getting the same results"
  • "Our biggest problem is not technical in nature - we simply need to share our knowledge"
  • "Don’t be a dinosaur! Adopt a ‘go with what is’ philosophy and embrace the future!"

Quotes from Panelist Eric Miller from Zepheira:

  • "Applications should not be ON The Web, but OF The Web"
  • "New Acronym: LED – Linked Enterprise Data"
  • "Semantic Web is the HTML of DATA"

Quotes from Panelist Daniel Moody from University of Twente:

  • "Unified Modeling Language (UML) was the last big thing in software engineering"
  • "The next big thing will be ArchiMate, which is a unified language for enterprise architecture modeling"

 

Mark Your Calendar

Enterprise Data World 2010 will take place in San Francisco, California at the Hilton San Francisco on March 14-18, 2010.

There are no Magic Beans for Data Quality

The CIO put Jack in charge of an enterprise initiative with a sizable budget to spend on improving data quality.

Jack was sent to a leading industry conference to evaluate data quality vendors.  While his flight was delayed, Jack was passing the time in the airport bar when he was approached by Machiavelli, a salesperson from a data quality software company called Magic Beans.

Machiavelli told Jack that he didn't need to go to the conference to evaluate vendors.  Instead, Jack could simply trade his entire budget for an unlimited license of Magic Beans.

Machiavelli assured Jack that Magic Beans had the following features:

  • Simple to install
  • Remarkably intuitive user interface
  • Processes a gazillion records per nanosecond
  • Clairvoyantly detects and corrects existing data quality problems
  • Prevents all future data quality problems from happening

Jack agreed to the trade and went back to the office with Magic Beans.

Eighteen months later, Jack and the CIO carpooled to Washington, D.C. to ask Congress for a sizable bailout.

What is the moral of this story? 

(Other than never trust a salesperson named Machiavelli.)

There are many data quality vendors to choose from and all of them offer viable solutions driven by impressive technology.

However, technology sometimes carries with it a dangerous conceit – that what works in the laboratory and the engineering department will work in the boardroom and the accounting department, that what is true for the mathematician and the computer scientist will be true for the business analyst and the data steward.

My point is neither to discourage the purchase of data quality software, nor to try to convince you which vendor I think provides the superior solution – especially since these types of opinions are usually biased by the practical limits of your personal experience and motivated by the kind folks who are currently paying your salary.

And I am certainly not a Luddite opposed to the use of technology.  I am first, foremost, and proudly a techno-geek of the highest order.  However, I have seen too many projects fail when a solution to data quality problems was attempted by “throwing technology at it.”  I have seen beautifully architected, wonderfully coded, elegantly implemented technical solutions result in complete and utter failure.  These projects failed neither because using technology was the wrong approach nor because the wrong data quality software was selected.

Data quality solutions require a holistic approach involving people, methodology, and technology.

People

Sometimes, people doubt that data quality problems could be prevalent in their systems.  This “data denial” is not necessarily a matter of blissful ignorance, but is often a natural self-defense mechanism from the data owners on the business side and/or the process owners on the technical side.  No one likes to feel blamed for causing or failing to fix the data quality problems.  This is one of the many human dynamics that are missing from the relatively clean room of the laboratory where the technology was developed.  You must consider the human factor because it will be the people involved in the project, and not the technology itself, that will truly make the project successful.

 

Methodology

Data characteristics and their associated quality challenges are unique from company to company.  Data quality can be defined differently by different functional areas within the same company.  Business rules can change from project to project.  Decision makers on the same project can have widely varying perspectives.  All of this points to the need for having an effective methodology, which will help you maximize the time and effort as well as the subsequent return on whatever technology you invest in.

 

Technology

I have used software from most of the Gartner Data Quality Magic Quadrant and many of the so-called niche vendors.  So I speak from experience when I say that all data quality vendors have viable solutions driven by impressive technology.  However, don't let the salesperson “blind you with science” into having unrealistic expectations of the software.  I am not trying to accuse all salespeople of Machiavellian machinations (even though we have all encountered a few who would shamelessly sell their mother’s soul to meet their quota).

 

Conclusion

Just like any complex problem, there is no fast and easy solution.  Although incredible advancements in technology continue, there are no Magic Beans for Data Quality.

And there never will be.

An organization's data quality initiative can only be successful when people take on the challenge united by collaboration, guided by an effective methodology, and of course, implemented with amazing technology.