Data Profiling Early and Often

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode of OCDQ Radio, I discuss data profiling with James Standen, the founder and CEO of nModal Solutions Inc., the makers of Datamartist, a fast, easy-to-use, visual data profiling and transformation tool.

Before founding nModal, James had over 15 years of experience in a broad range of data roles, from building business intelligence solutions, data warehouses, and a data warehouse competency center to working on data migration and ERP projects in large organizations.  You can learn more about and connect with James Standen on LinkedIn.

James thinks that while there is obviously good data and bad data, often bad data is just misunderstood and can be coaxed away from the dark side if you know how to approach it.  He does recommend wearing the proper safety equipment, however, and having the right tools.  For more of his wit and wisdom, follow Datamartist on Twitter, and read the Datamartist Blog.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Governance and Information Quality 2011

Last week, I attended the Data Governance and Information Quality 2011 Conference, which was held June 27-30 in San Diego, California at the Catamaran Resort Hotel and Spa.

In this blog post, I summarize a few of the key points from some of the sessions I attended.  I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

 

Assessing Data Quality Maturity

In his pre-conference tutorial, David Loshin, author of the book The Practitioner’s Guide to Data Quality Improvement, described five stages comprising a continuous cycle of data quality improvement (see the monitoring sketch after the list):

  1. Identify and measure how poor data quality impedes business objectives
  2. Define business-related data quality rules and performance targets
  3. Design data quality improvement processes that remediate business process flaws
  4. Implement data quality improvement methods
  5. Monitor data quality against targets
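
To make the fifth stage concrete, here is a minimal monitoring sketch in Python.  The records, rules, and target thresholds are all invented for illustration; they are not from David’s tutorial.

```python
# A minimal sketch of stage 5: monitoring data quality against targets.
# All records, rules, and targets below are invented for illustration.

records = [
    {"customer_id": "C001", "email": "alice@example.com", "country": "US"},
    {"customer_id": "C002", "email": None, "country": "US"},
    {"customer_id": "C003", "email": "carol@example.com", "country": None},
]

# Business-related data quality rules (stage 2) expressed as predicates.
rules = {
    "email_populated": lambda r: bool(r["email"]),
    "country_populated": lambda r: bool(r["country"]),
}

# Performance targets (stage 2): the minimum fraction of records
# that must pass each rule.
targets = {"email_populated": 0.95, "country_populated": 0.99}

# Stage 5: measure each rule against its target and report conformance.
for name, rule in rules.items():
    passed = sum(1 for r in records if rule(r)) / len(records)
    status = "OK" if passed >= targets[name] else "BELOW TARGET"
    print(f"{name}: {passed:.0%} (target {targets[name]:.0%}) {status}")
```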

 

Getting Started with Data Governance

Oliver Claude from Informatica provided some tips for making data governance a reality:

  • Data Governance requires acknowledging that People, Process, and Technology are interlinked
  • You need to embed your data governance policies into your operational business processes
  • Data Governance must be Business-Centric, Technology-Enabled, and Business/IT Aligned

 

Data Profiling: An Information Quality Fundamental

Danette McGilvray, author of the book Executing Data Quality Projects, shared some of her data quality insights:

  • Although the right technology is essential, data quality is more than just technology
  • Believing tools cause good data quality is like believing X-Ray machines cause good health
  • Data Profiling is like CSI — Investigating the Poor Data Quality Crime Scene

 

Building Data Governance and Instilling Data Quality

In the opening keynote address, Dan Hartley of ConAgra Foods shared his data governance and data quality experiences:

  • It is important to realize that data governance is a journey, not a destination
  • One of the commonly overlooked costs of data governance is the cost of inaction
  • Data governance must follow a business-aligned and business-value-driven approach
  • Data governance is as much about change management as it is anything else
  • Data governance controls must be carefully balanced so they don’t disrupt business processes
  • Common Data Governance Challenge: Balancing Data Quality and Speed (i.e., Business Agility)
  • Common Data Governance Challenge: Picking up Fumbles — Balls dropped between vertical organizational silos
  • Bad business processes cause poor data quality
  • Better Data Quality = A Better Bottom Line
  • One of the most important aspects of Data Governance and Data Quality — Wave the Flag of Success

 

Practical Data Governance

Winston Chen from Kalido discussed some aspects of delivering tangible value with data governance:

  • Data governance is the business process of defining, implementing, and enforcing data policies
  • Every business process can be improved by feeding it better data
  • Data Governance is the Horse, not the Cart, i.e., Data Governance drives MDM and Data Quality
  • Data Governance needs to balance Data Silos (Local Authority) and Data Cathedrals (Central Control)

 

The Future of Data Governance and Data Quality

The closing keynote panel, moderated by Danette McGilvray, included the following insights:

  • David Plotkin: “It is not about Data, Process, or Technology — It is about People”
  • John Talburt: “For every byte of Data, we need 1,000 bytes of Metadata to go along with it”
  • C. Lwanga Yonke: “One of the most essential skills is the ability to lead change”
  • John Talburt: “We need to be focused on business-value-based data governance and data quality”
  • C. Lwanga Yonke: “We must be multilingual: Speak Data/Information, Business, and Technology”

 

Organizing for Data Quality

In his post-conference tutorial, Tom Redman, author of the book Data Driven, described ten habits of those with the best data:

  1. Focus on the most important needs of the most important customers
  2. Apply relentless attention to process
  3. Manage all critical sources of data, including external suppliers
  4. Measure data quality at the source and in business terms
  5. Employ controls at all levels to halt simple errors and establish a basis for moving forward
  6. Develop a knack for continuous improvement
  7. Set and achieve aggressive targets for improvement
  8. Formalize management accountabilities for data
  9. Lead the effort using a broad, senior group
  10. Recognize that the hard data quality issues are soft and actively manage the needed cultural changes

 

Tweeps Out at the Ball Game

As I mentioned earlier, I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

But I wasn’t the only data governance and data quality tweep at the conference.  Steve Sarsfield, April Reeve, and Joe Dos Santos were also attending and tweeting.

However, on Tuesday night, we decided to take a timeout from tweeting, and instead became Tweeps out at the Ball Game by attending the San Diego Padres and Kansas City Royals baseball game at PETCO Park.

We sang Take Me Out to the Ball Game, bought some peanuts and Cracker Jack, and root, root, rooted for the home team, which apparently worked since Padres closer Heath Bell got one, two, three strikes, you’re out on Royals third baseman Wilson Betemit, and the San Diego Padres won the game by a final score of 4-2.

So just like at the Data Governance and Information Quality 2011 Conference, a good time was had by all.  See you next year!

 

Related Posts

Stuck in the Middle with Data Governance

DQ-BE: Invitation to Duplication

TDWI World Conference Orlando 2010

Light Bulb Moments at DataFlux IDEAS 2010

Enterprise Data World 2010

Enterprise Data World 2009

TDWI World Conference Chicago 2009

DataFlux IDEAS 2009

Data Governance Star Wars


[Image: poll results from the Data Governance Star Wars blog debate]

Shown above are the poll results from the recent Star Wars themed blog debate about one of data governance’s biggest challenges, how to balance bureaucracy and business agility.  Rob Karel took the position for Bureaucracy as Darth Karel of the Empire, and I took the position for Agility as OCDQ-Wan Harris of the Rebellion.

However, this was a true debate format where Rob and I intentionally argued polar opposite positions with full knowledge that the reality is data governance success requires effectively balancing bureaucracy and business agility.

Just in case you missed the blog debate, here are the post links:

On this special, extended, and Star Wars themed episode of OCDQ Radio, I am joined by Rob Karel and Gwen Thomas to discuss this common challenge of effectively balancing bureaucracy and business agility on data governance programs.

Rob Karel is a Principal Analyst at Forrester Research, where he serves Business Process and Applications Professionals.  Rob is a leading expert in how companies manage data and integrate information across the enterprise.  His current research focus includes process data management, master data management, data quality management, metadata management, data governance, and data integration technologies.  Rob has more than 19 years of data management experience, working in both business and IT roles to develop solutions that provide better quality, confidence in, and usability of critical enterprise data.

Gwen Thomas is the Founder and President of The Data Governance Institute, a vendor-neutral, mission-based organization with three arms: publishing free frameworks and guidance, supporting communities of practitioners, and offering training and consulting.  Gwen also writes the popular blog Data Governance Matters, frequently contributes to IT and business publications, and is the author of the book Alpha Males and Data Disasters: The Case for Data Governance.

This extended episode of OCDQ Radio is 49 minutes long, and is divided into two parts, which are separated by a brief Star Wars themed intermission.  In Part 1, Rob and I discuss our blog debate.  In Part 2, Gwen joins us to provide her excellent insights.


Stuck in the Middle with Data Governance

Perhaps the most common debate about data governance is whether it should be started from the top down or the bottom up.

Data governance requires coordinating a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology, and policy enforcement, among many others.

This common debate is understandable since some of these data governance success factors are mostly top-down (e.g., funding), and some of these data governance success factors are mostly bottom-up (e.g., data quality remediation and data stewardship).

However, the complexity that stymies many organizations is that most data governance success factors fall somewhere in the middle.

 

Stuck in the Middle with Data Governance

At certain times during the evolution of a data governance program, top-down aspects will be emphasized, and at other times, bottom-up aspects will be emphasized.  So whether you start from the top down or the bottom up, eventually you are going to need to blend together top-down and bottom-up aspects in order to sustain an ongoing and pervasive data governance program.

To paraphrase The Beatles, when you get to the bottom, you go back to the top, where you stop and turn, and you go for a ride until you get to the bottom—and then you do it again.  (But hopefully your program doesn’t get code-named: “Helter Skelter”)

But after some initial progress has been made, to paraphrase Stealers Wheel, people within the organization may start to feel like we have top-down to the left of us, bottom-up to the right of us, and here we are—stuck in the middle with data governance.

In other words, data governance is never a direct current flowing in only one direction, top-down or bottom-up; it is an alternating current that continually flows between the two.  When this dynamic is not communicated to everyone throughout the organization, progress is disrupted by people waiting around for someone else to complete the circuit.

But when, paraphrasing Pearl Jam, data governance is taken up by the middle—then there ain’t gonna be any middle any more.

In other words, when data governance pervades every level of the organization, everyone stops thinking in terms of top-down and bottom-up, and instead acts as one enterprise sustaining the momentum of a successful data governance program.

 

Data Governance Conference


Next week, I will be attending the Data Governance and Information Quality Conference, which will be held June 27-30 in San Diego, California at the Catamaran Resort Hotel and Spa.

If you will also be attending, and you want to schedule a meeting with me: Contact me via email

If you will not be attending, you can follow the conference tweets using the hashtag: #DGIQ2011

 

Related Posts

Data Governance Star Wars: Balancing Bureaucracy And Agility

Council Data Governance

DQ-View: Roman Ruts on the Road to Data Governance

The Data Governance Oratorio

Zig-Zag-Diagonal Data Governance

Data Governance and the Buttered Cat Paradox

Beware the Data Governance Ides of March

A Tale of Two G’s

The People Platform

Rise of the Datechnibus

The Collaborative Culture of Data Governance

Connect Four and Data Governance

The Role Of Data Quality Monitoring In Data Governance

Quality and Governance are Beyond the Data

Data Transcendentalism

Podcast: Data Governance is Mission Possible

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

Jack Bauer and Enforcing Data Governance Policies

The Prince of Data Governance

MacGyver: Data Governance and Duct Tape

The Diffusion of Data Governance

Master Data Management in Practice


Master Data Management in Practice: Achieving True Customer MDM is a great new book by Dalton Cervo and Mark Allen, which demystifies the theories and industry buzz surrounding Master Data Management (MDM), and provides a practical guide for successfully implementing a Customer MDM program.

The book discusses the three major types of MDM (Analytical, Operational, and Enterprise), explaining exactly how MDM is related to, and supported by, data governance, data stewardship, and data quality.  Dalton and Mark explain how MDM does much more than just bring data together—it provides a set of processes, services, and policies that bring people together in a cross-functional and collaborative approach to enterprise data management.

Dalton Cervo has over 20 years of experience in software development, project management, and data management, including architectural design and implementation of analytical MDM, and management of a data quality program for an enterprise MDM implementation.  Dalton is a senior solutions consultant at DataFlux, helping organizations in the areas of data governance, data quality, data integration, and MDM.  Read Dalton’s blog, follow Dalton on Twitter, and connect with Dalton on LinkedIn.

Mark Allen has over 20 years of data management and project management experience, including extensive planning and deployment experience with customer master data initiatives, data governance programs, and leading data quality management practices.  Mark is a senior consultant and enterprise data governance lead at WellPoint, Inc.  Prior to WellPoint, Mark was a senior program manager in customer operations groups at Sun Microsystems and Oracle, where he served as the lead data steward for the customer data domain throughout the planning and implementation of an enterprise customer data hub.

On this episode of OCDQ Radio, I am joined by the authors to discuss how to properly prepare for a new MDM program.


The Art of Data Matching


On this episode of OCDQ Radio, I am joined by Henrik Liliendahl Sørensen for a discussion about the Art of Data Matching.

Henrik is a data quality and master data management (MDM) professional who also works in data architecture.  Henrik has worked in the IT business for 30 years, across a wide range of business areas, such as government, insurance, manufacturing, membership, healthcare, public transportation, and more.

Henrik’s current engagements include working as practice manager at Omikron Data Quality, a data quality tool maker with headquarters in Germany, and as data quality specialist at Stibo Systems, a master data management vendor with headquarters in Denmark.  Henrik is also a charter member of the IAIDQ, and the creator of the LinkedIn Group for Data Matching for people interested in data quality and thrilled by automated data matching, deduplication, and identity resolution.

Henrik is one of the most prolific and popular data quality bloggers, regularly sharing his excellent insights about data quality, data matching, MDM, data architecture, data governance, diversity in data quality, and many other data management topics.


Data Governance Star Wars: Balancing Bureaucracy and Agility

I was recently discussing data governance best practices with Rob Karel, the well-respected analyst at Forrester Research, and our conversation migrated to one of data governance’s biggest challenges — how to balance bureaucracy and business agility.

So Rob and I thought it would be fun to tackle this dilemma in a Star Wars themed debate across our individual blog platforms with Rob taking the position for Bureaucracy as the Empire and me taking the opposing position for Agility as the Rebellion.

(Yes, the cliché is true, conversations between self-proclaimed data geeks tend to result in Star Wars or Star Trek parallels.)

Disclaimer: Remember that this is a true debate format where Rob and I are intentionally arguing polar opposite positions with full knowledge that the reality is data governance success requires effectively balancing bureaucracy and agility.

Please take the time to read both of our blog posts, then we encourage your comments — and your votes (see the poll below).

Data Governance Star Wars

If you are having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Data Governance Star Wars

The Force is Too Strong with This One

“Don’t give in to Bureaucracy—that is the path to the Dark Side of Data Governance.”

Data governance requires coordinating a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

When confronted by this phantom menace of complexity, many organizations believe that the only path to success must be command and control—institute a rigid bureaucracy to dictate policies, demand compliance, and dole out punishments.  This approach to data governance often makes policy compliance feel like imperial rule, and policy enforcement feel like martial law.

But beware.  Bureaucracy, command, control—the Dark Side of Data Governance are they.  Once you start down the dark path, forever will it dominate your destiny, consume your organization it will.

No Time to Discuss this as a Committee

“There is a great disturbance in the Data, as if millions of voices suddenly cried out for Governance but were suddenly silenced.  I fear something terrible has happened.  I fear another organization has started by creating a Data Governance Committee.”

Yes, it’s true—at some point, an official Data Governance Committee (or Council, or Board, or Galactic Senate) will be necessary.

However, one of the surest ways to guarantee the failure of a new data governance program is to start by creating a committee.  This is often done with the best of intentions, bringing together key stakeholders from all around the organization, representatives of each business unit and business function, as well as data and technology stakeholders.  But when you start by discussing data governance as a committee, you often never get data governance out of the committee (i.e., all talk, mostly arguing, no action).

Successful data governance programs often start with a small band of rebels (aka change agents) struggling to restore quality to some business-critical data, or struggling to resolve inefficiencies in a key business process.  Once news of their successful pilot project spreads, more change agents will rally to the cause—because that’s what data governance truly requires, not a committee, but a cause to believe in and fight for—especially after the Empire of Bureaucracy strikes back and tries to put down the rebellion.

Collaboration is the Data Governance Force

“Collaboration is what gives a data governance program its power.  Its energy binds us together.  Cooperative beings are we.  You must feel the Collaboration all around you, among the people, the data, the business process, the technology, everywhere.”

Many rightfully lament the misleading term “data governance” because it appears to put the emphasis on “governing data.”

Data governance actually governs the interactions among business processes, data, technology and, most important—people.  It is the organization’s people, empowered by high quality data and enabled by technology, who optimize business processes for superior corporate performance.  Data governance reveals how truly interconnected and interdependent the organization is, showing how everything that happens within the enterprise happens as a result of the interactions occurring among its people.

Data governance provides the framework for communication and collaboration among business, data, and technical stakeholders, and establishes an enterprise-wide understanding of the roles, responsibilities, and accountability required to support the organization’s business activities and to materialize the value of the enterprise’s data as positive business impacts.

Enforcing data governance policies with command and control is the quick and easy path—to failure.  Principles, not policies, are what truly give a data governance program its power.  Communication and collaboration are the two most powerful principles.

“May the Collaboration be with your Data Governance program.  Always.”

Always in Motion is the Future

“Be mindful of the future, but not at the expense of the moment.  Keep your concentration here and now, where it belongs.”

Perhaps the strongest case against bureaucracy in data governance is the business agility that is necessary for an organization to survive and thrive in today’s highly competitive and rapidly evolving marketplace.  The organization must follow what works for as long as it works, but without being afraid to adjust as necessary when circumstances inevitably change.

Change is the only galactic constant, which is why data governance policies can never be cast in stone (or frozen in carbonite).

Will a well-implemented data governance strategy continue to be successful?  Difficult to see.  Always in motion is the future.  And this is why, when it comes to deliberately designing a data governance program for agility: “Do or do not.  There is no try.”

Click here to read Rob “Darth” Karel’s blog post entry in this data governance debate

Please feel free to also post a comment below and explain your vote or simply share your opinions and experiences.

Listen to Data Governance Star Wars on OCDQ Radio — In Part 1, Rob Karel and I discuss our blog mock debate, which is followed by a brief Star Wars themed intermission, and then in Part 2, Gwen Thomas joins us to provide her excellent insights.

Data Quality Pro


On this episode, I am joined by special guest Dylan Jones, the community leader of Data Quality Pro, the largest membership resource dedicated entirely to the data quality profession.

Dylan is currently overseeing the re-build and re-launch of Data Quality Pro into a next generation membership platform, and during our podcast discussion, Dylan describes some of the great new features that will be coming soon to Data Quality Pro.

Links for Data Quality Pro and Dylan Jones:


Got Data Quality?

I have written many blog posts about how it’s neither a realistic nor a required data management goal to achieve data perfection, i.e., 100% data quality or zero defects.

Of course, this admonition logically invites the questions:

If achieving 100% data quality isn’t the goal, then what is?

99%?

98%?

As I was pondering these questions while grocery shopping, I walked down the dairy aisle casually perusing the wide variety of milk options, when the thought occurred to me that data quality issues have a lot in common with the fat content of milk.

The classification of the percentage of fat (more specifically, butterfat) in milk varies slightly by country.  In the United States, whole milk is approximately 3.25% fat, whereas reduced-fat milk is 2% fat, low-fat milk is 1% fat, and skim milk is 0.5% fat.

Reducing the total amount of fat (especially saturated and trans fat) is a common recommendation for a healthy diet.  Likewise, reducing the total amount of defects (i.e., data quality issues) is a common recommendation for a healthy data management strategy.  However, just like it would be unhealthy to remove all of the fat from your diet (because some fatty acids are essential nutrients that can’t be derived from other sources), it would be unhealthy to attempt to remove all of the defects from your data.

So maybe your organization is currently drinking whole data (i.e., 3.25% defects or 96.75% data quality) and needs to consider switching to reduced defect data (i.e., 2% defects or 98% data quality), low defect data (i.e., 1% defects or 99% data quality), or possibly even skim data (i.e., 0.5% defects or 99.5% data quality).
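
To put the milk math in code, here is a playful sketch in Python that classifies a data set the way the dairy aisle classifies butterfat.  The thresholds mirror the United States percentages above; the record counts are invented for illustration.

```python
# A playful sketch of the milk analogy: classify a data set's defect
# rate the way the dairy aisle classifies butterfat. Thresholds mirror
# the US percentages discussed above; the record counts are invented.

def classify_data(total_records: int, defective_records: int) -> str:
    defect_rate = 100.0 * defective_records / total_records
    quality = 100.0 - defect_rate
    if defect_rate > 2.0:
        label = "whole data"           # ~3.25% defects
    elif defect_rate > 1.0:
        label = "reduced defect data"  # ~2% defects
    elif defect_rate > 0.5:
        label = "low defect data"      # ~1% defects
    else:
        label = "skim data"            # ~0.5% defects or less
    return f"{label}: {defect_rate:.2f}% defects, {quality:.2f}% data quality"

print(classify_data(total_records=100_000, defective_records=1_850))
# reduced defect data: 1.85% defects, 98.15% data quality
```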

No matter what your perspective is regarding the appropriate data quality goal for your organization, at the very least, I think that we can all agree that all of our enterprise data management initiatives have to ask the question: “Got Quality?”

 

Related Posts

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Thaler’s Apples and Data Quality Oranges

Data Quality and The Middle Way

Missed It By That Much

The Data Quality Goldilocks Zone

You Can’t Always Get the Data You Want

Data Quality Practices—Activate!

This is a screen capture of the results of last month’s unscientific poll about proactive data quality versus reactive data quality, alongside one of my favorite graphics (this is the third post I’ve used it in) of the Wonder Twins (Zan and Jayna) with Gleek.

Although reactive (15 combined votes) easily defeated proactive (6 combined votes) in the poll, proactive versus reactive is one debate that will likely never end.  However, the debate makes it seem as if we are forced to choose one approach over the other.

Generally speaking, most recommended data quality practices advocate implementing proactive defect prevention and avoiding reactive data cleansing.  But as Graham Rhind commented, data quality is neither exclusively proactive nor exclusively reactive.

“And if you need proof, start looking at the data,” Graham explained.  “For example, gender.  To produce quality data, a gender must be collected and assigned proactively, i.e., at the data collection stage.  Gender coding reactively on the basis of, for example, name, only works correctly and with certainty in a certain percentage of cases (that percentage always being less than 100).  Reactive data quality in that case can never be the best practice because it can never produce the best data quality, and, depending on what you do with your data, can be very damaging.”

“On the other hand,” Graham continued, “the real world to which the data is referring changes.  People move, change names, grow old, die.  Postal code systems and telephone number systems change.  Place names change, countries come and go.  In all of those cases, a reactive process is the one that will improve data quality.”

“Data quality is a continuous process,” Graham concluded.  From his perspective, a realistic data quality practice advocates being “proactive as much as possible, and reactive to keep up with a dynamic world.  Works for me, and has done well for decades.”

I agree with Graham because, just like any complex problem, data quality has no fast and easy solution.  In my experience, a hybrid discipline is always required, combining proactive and reactive approaches into one continuous data quality practice, as sketched in code below.

Or as Zan (representing Proactive) and Jayna (representing Reactive) would say: “Data Quality Practices—Activate!”

And as Gleek would remind us: “The best data quality practices remain continuously active.”
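
In code, that hybrid discipline might look something like the following minimal sketch.  The fields, validation rules, and one-year revalidation threshold are all invented for illustration.

```python
# A minimal sketch of a hybrid data quality practice: proactive checks
# at the data collection stage, plus reactive re-checks to keep up with
# a dynamic world. All fields, rules, and thresholds are invented.

from datetime import date

def proactive_validate(record: dict) -> list:
    """Prevent defects at collection time, before they enter the data set."""
    errors = []
    if record.get("gender") not in {"M", "F", "X"}:
        # Echoing Graham Rhind's point: collect gender explicitly rather
        # than inferring it later from the name, which is never 100% reliable.
        errors.append("gender must be collected, not inferred")
    if not record.get("postal_code"):
        errors.append("postal_code is required")
    return errors

def reactive_recheck(record: dict, today: date) -> list:
    """Catch decay: the real world keeps changing after data is collected."""
    warnings = []
    verified = record.get("address_verified_on")
    if verified is None or (today - verified).days > 365:
        warnings.append("address not verified in over a year; people move")
    return warnings

record = {"gender": "F", "postal_code": "92109",
          "address_verified_on": date(2010, 3, 1)}
print(proactive_validate(record))                   # [] -> accepted at entry
print(reactive_recheck(record, date(2011, 6, 30)))  # stale address flagged
```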

 

Related Posts

How active is your data quality practice?

The Data Quality Wager

The Dichotomy Paradox, Data Quality and Zero Defects

Retroactive Data Quality

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

Data Quality and #FollowFriday the 13th

As Alice Hardy arrived at her desk at Crystal Lake Insurance, it seemed like a normal Friday morning.  Her thoughts about her weekend camping trip were interrupted by an eerie sound emanating from one of the adjacent cubicles:

Da da da, ta ta ta.  Da da da, ta ta ta.

“What’s that sound?” Alice wondered out loud.

“Sorry, am I typing too loud again?” responded Tommy Jarvis from another adjacent cubicle.  “Can you come take a look at something for me?”

“Sure, I’ll be right over,” Alice replied as she quickly circumnavigated their cluster of cubicles, puzzled and unsettled to find the other desks unoccupied with their computers turned off, wondering, to herself this time, where did that eerie sound come from?  Where are the other data counselors today?

“What’s up?” she casually asked upon entering Tommy’s cubicle, trying, as always, to conceal her discomfort about being alone in the office with the one colleague that always gave her the creeps.  Visiting his cubicle required a constant vigilance in order to avoid making prolonged eye contact, not only with Tommy Jarvis, but also with the horrifying hockey mask hanging above his computer screen like some possessed demon spawn from a horror movie.

“I’m analyzing the Date of Death in the life insurance database,” Tommy explained.  “And I’m receiving really strange results.  First of all, there are no NULLs, which indicates all of our policyholders are dead, right?  And if that wasn’t weird enough, there are only 12 unique values: January 13, 1978, February 13, 1981, March 13, 1987, April 13, 1990, May 13, 2011, June 13, 1997, July 13, 2001, August 13, 1971, September 13, 2002, October 13, 2006, November 13, 2009, and December 13, 1985.”

“That is strange,” said Alice.  “All of our policyholders can’t be dead.  And why is Date of Death always the 13th of the month?”

“It’s not just always the 13th of the month,” Tommy responded, almost cheerily.  “It’s always a Friday the 13th.”

“Well,” Alice slowly, and nervously, replied.  “I have a life insurance policy with Crystal Lake Insurance.  Pull up my policy.”

After a few quick, loud, pounding keystrokes, Tommy ominously read aloud the results now displayed on his computer screen, just below the hockey mask that Alice could swear was staring at her.  “Date of Death: May 13, 2011 . . . Wait, isn’t that today?”

Da da da, ta ta ta.  Da da da, ta ta ta.

“Did you hear that?” asked Alice.  “Hear what?” responded Tommy with a devilish grin.

“Never mind,” replied Alice quickly while trying to focus her attention on only the computer screen.  “Are you sure you pulled up the right policy?  I don’t recognize the name of the Primary Beneficiary . . . Who the hell is Jason Voorhees?”

“How the hell could you not know who Jason Voorhees is?” asked Tommy, with anger sharply crackling throughout his words.  “Jason Voorhees is now rightfully the sole beneficiary of every life insurance policy ever issued by Crystal Lake Insurance.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“What?  That’s impossible!” Alice screamed.  “This has to be some kind of sick data quality joke.”

“It’s a data quality masterpiece!” Tommy retorted with rage.  “I just finished implementing my data machete, er I mean, my data matching solution.  From now on, Crystal Lake Insurance will never experience another data quality issue.”

“There’s just one last thing that I need to take care of.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“And what’s that?” Alice asked, smiling nervously while quickly backing away into the hallway—and preparing to run for her life.

Da da da, ta ta ta.  Da da da, ta ta ta.

“Real-world alignment,” replied Tommy.  Rising to his feet, he put on the hockey mask, and pulled an actual machete out of the bottom drawer of his desk.  “Your Date of Death is entered as May 13, 2011.  Therefore, I must ensure real-world alignment.”

Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Data Quality.

The End.

(Note — You can also listen to the OCDQ Radio Theater production of this DQ-Tale in the Scary Calendar Effects episode.)
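
(And for readers who would rather profile data than run from it, below is a minimal data profiling sketch in Python that reproduces Tommy’s findings on the twelve Date of Death values from the story.  The profiling logic is a generic illustration, not any particular tool’s method.)

```python
# A minimal data profiling sketch reproducing Tommy's findings:
# the NULL count, the distinct value count, and the day-of-week
# pattern of the twelve Date of Death values from the story.

from datetime import date

dates_of_death = [
    date(1978, 1, 13), date(1981, 2, 13), date(1987, 3, 13),
    date(1990, 4, 13), date(2011, 5, 13), date(1997, 6, 13),
    date(2001, 7, 13), date(1971, 8, 13), date(2002, 9, 13),
    date(2006, 10, 13), date(2009, 11, 13), date(1985, 12, 13),
]

nulls = sum(1 for d in dates_of_death if d is None)
distinct = set(dates_of_death)
weekdays = {d.strftime("%A") for d in distinct}

print(f"NULLs: {nulls}")                    # NULLs: 0
print(f"Distinct values: {len(distinct)}")  # Distinct values: 12
print(f"Days of week: {weekdays}")          # Days of week: {'Friday'}
```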

#FollowFriday Recommendations

#FollowFriday is when Twitter users recommend other users you should follow, so here are some great tweeps who provide tweets mostly about Data Quality, Data Governance, Master Data Management, Business Intelligence, and Big Data Analytics:

(Please Note: This is by no means a comprehensive list, is listed in no particular order whatsoever, and no offense is intended to any of my tweeps not listed below.  I hope that everyone has a great #FollowFriday and an even greater weekend.)

DQ-BE: Invitation to Duplication

Data Quality By Example (DQ-BE) is a regular OCDQ segment that provides examples of key data quality concepts.

I recently received my invitation to the Data Governance and Information Quality Conference, which will be held June 27-30 in San Diego, California at the Catamaran Resort Hotel and Spa.  Well, as shown above, I actually received both of my invitations.

Although my postal address is complete, accurate, and exactly the same on both of the invitations, my name is slightly different (“James” vs. “Jim”), and my title (“Data Quality Journalist” vs. “Blogger-in-Chief”) and company (“IAIDQ” vs. “OCDQ Blog”) are both completely different.  I wonder how many of the data quality software vendors sponsoring this conference would consider my invitations to be duplicates.  (Maybe I’ll use the invitations to perform a vendor evaluation on the exhibit floor.)
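
For illustration, here is a hypothetical sketch in Python of how a matching engine might score my two invitations.  The field weights, similarity function, decision threshold, and placeholder address are all invented; real data matching tools use far richer techniques (a nickname table mapping “Jim” to “James”, for example).

```python
# A hypothetical sketch of duplicate detection on the two invitations.
# Weights, threshold, and the placeholder address are invented; real
# matching engines use much richer comparison and decision logic.

from difflib import SequenceMatcher

record_a = {"name": "James Harris", "title": "Data Quality Journalist",
            "company": "IAIDQ", "address": "123 Example St, Boston, MA"}
record_b = {"name": "Jim Harris", "title": "Blogger-in-Chief",
            "company": "OCDQ Blog", "address": "123 Example St, Boston, MA"}

weights = {"name": 0.3, "title": 0.1, "company": 0.1, "address": 0.5}

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

score = sum(w * similarity(record_a[f], record_b[f])
            for f, w in weights.items())
print(f"match score: {score:.2f}")

# The identical address dominates the score, so whether these records
# are declared duplicates depends on where the vendor sets the threshold.
print("duplicate" if score >= 0.75 else "not duplicate")
```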

So it would seem that even “The Premier Event in Data Governance and Data Quality” can experience data quality problems.

No worries, I doubt the invitation system will be one of the “Practical Approaches and Success Stories” presented—unless it’s used as a practical approach to a success story about demonstrating how embarrassing it might be to send duplicate invitations to a data quality journalist and blogger-in-chief.  (I wonder if this blog post will affect the approval of my Press Pass for the event.)

 

Okay, on a far more serious note, you should really consider attending this event.  As the conference agenda shows, there will be great keynote presentations, case studies, tutorials, and other sessions conducted by experts in data governance and data quality, including (among many others) Larry English, Danette McGilvray, Mike Ferguson, David Loshin, and Thomas Redman.

 

Related Posts

DQ-BE: Dear Valued Customer

Customer Incognita

Identifying Duplicate Customers

Adventures in Data Profiling (Part 7) – Customer Name

The Quest for the Golden Copy (Part 3) – Defining “Customer”

‘Tis the Season for Data Quality

The Seven Year Glitch

DQ-IRL (Data Quality in Real Life)

Data Quality, 50023

Once Upon a Time in the Data

The Semantic Future of MDM