Master Data Management in Practice

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Master Data Management in Practice: Achieving True Customer MDM is a great new book by Dalton Cervo and Mark Allen, which demystifies the theories and industry buzz surrounding Master Data Management (MDM), and provides a practical guide for successfully implementing a Customer MDM program.

The book discusses the three major types of MDM (Analytical, Operational, and Enterprise), explaining exactly how MDM is related to, and supported by, data governance, data stewardship, and data quality.  Dalton and Mark explain how MDM does much more than just bring data together—it provides a set of processes, services, and policies that bring people together in a cross-functional and collaborative approach to enterprise data management.

Dalton Cervo has over 20 years of experience in software development, project management, and data management, including architectural design and implementation of analytical MDM, and management of a data quality program for an enterprise MDM implementation.  Dalton is a senior solutions consultant at DataFlux, helping organizations in the areas of data governance, data quality, data integration, and MDM.  Read Dalton’s blog, follow him on Twitter, and connect with him on LinkedIn.

Mark Allen has over 20 years of data management and project management experience, including extensive planning and deployment experience with customer master data initiatives, data governance programs, and leading data quality management practices.  Mark is a senior consultant and enterprise data governance lead at WellPoint, Inc.  Prior to WellPoint, Mark was a senior program manager in customer operations groups at Sun Microsystems and Oracle, where Mark served as the lead data steward for the customer data domain throughout the planning and implementation of an enterprise customer data hub.

On this episode of OCDQ Radio, I am joined by the authors to discuss how to properly prepare for a new MDM program.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

The Art of Data Matching

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode of OCDQ Radio, I am joined by Henrik Liliendahl Sørensen for a discussion about the Art of Data Matching.

Henrik is a data quality and master data management (MDM) professional who also works in data architecture.  Henrik has worked for 30 years in the IT business across a wide range of business areas, such as government, insurance, manufacturing, membership, healthcare, public transportation, and more.

Henrik’s current engagements include working as practice manager at Omikron Data Quality, a data quality tool maker with headquarters in Germany, and as data quality specialist at Stibo Systems, a master data management vendor with headquarters in Denmark.  Henrik is also a charter member of the IAIDQ, and the creator of the LinkedIn Group for Data Matching for people interested in data quality and thrilled by automated data matching, deduplication, and identity resolution.

Henrik is one of the most prolific and popular data quality bloggers, regularly sharing his excellent insights about data quality, data matching, MDM, data architecture, data governance, diversity in data quality, and many other data management topics.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Governance Star Wars: Balancing Bureaucracy and Agility

I was recently discussing data governance best practices with Rob Karel, the well-respected analyst at Forrester Research, and our conversation migrated to one of data governance’s biggest challenges — how to balance bureaucracy and business agility.

So Rob and I thought it would be fun to tackle this dilemma in a Star Wars themed debate across our individual blog platforms, with Rob taking the position for Bureaucracy as the Empire and me taking the opposing position for Agility as the Rebellion.

(Yes, the cliché is true, conversations between self-proclaimed data geeks tend to result in Star Wars or Star Trek parallels.)

Disclaimer: Remember that this is a true debate format in which Rob and I are intentionally arguing polar opposite positions, with full knowledge that, in reality, data governance success requires effectively balancing bureaucracy and agility.

Please take the time to read both of our blog posts, then we encourage your comments — and your votes (see the poll below).

Data Governance Star Wars

If you are having trouble viewing this video, you can watch it on Vimeo by clicking on this link: Data Governance Star Wars

The Force is Too Strong with This One

“Don’t give in to Bureaucracy—that is the path to the Dark Side of Data Governance.”

Data governance requires the coordination of a complex combination of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

When confronted by this phantom menace of complexity, many organizations believe that the only path to success must be command and control—institute a rigid bureaucracy to dictate policies, demand compliance, and dole out punishments.  This approach to data governance often makes policy compliance feel like imperial rule, and policy enforcement feel like martial law.

But beware.  Bureaucracy, command, control—the Dark Side of Data Governance are they.  Once you start down the dark path, forever will it dominate your destiny, consume your organization it will.

No Time to Discuss this as a Committee

“There is a great disturbance in the Data, as if millions of voices suddenly cried out for Governance but were suddenly silenced.  I fear something terrible has happened.  I fear another organization has started by creating a Data Governance Committee.”

Yes, it’s true—at some point, an official Data Governance Committee (or Council, or Board, or Galactic Senate) will be necessary.

However, one of the surest ways to guarantee the failure of a new data governance program is to start by creating a committee.  This is often done with the best of intentions, bringing together key stakeholders from all around the organization, representatives of each business unit and business function, as well as data and technology stakeholders.  But when you start by discussing data governance as a committee, you often never get data governance out of the committee (i.e., all talk, mostly arguing, no action).

Successful data governance programs often start with a small band of rebels (aka change agents) struggling to restore quality to some business-critical data, or struggling to resolve inefficiencies in a key business process.  Once news of their successful pilot project spreads, more change agents will rally to the cause—because that’s what data governance truly requires, not a committee, but a cause to believe in and fight for—especially after the Empire of Bureaucracy strikes back and tries to put down the rebellion.

Collaboration is the Data Governance Force

“Collaboration is what gives a data governance program its power.  Its energy binds us together.  Cooperative beings are we.  You must feel the Collaboration all around you, among the people, the data, the business process, the technology, everywhere.”

Many rightfully lament the misleading term “data governance” because it appears to put the emphasis on “governing data.”

Data governance actually governs the interactions among business processes, data, technology and, most important—people.  It is the organization’s people, empowered by high quality data and enabled by technology, who optimize business processes for superior corporate performance.  Data governance reveals how truly interconnected and interdependent the organization is, showing how everything that happens within the enterprise happens as a result of the interactions occurring among its people.

Data governance provides the framework for the communication and collaboration of business, data, and technical stakeholders, and establishes an enterprise-wide understanding of the roles and responsibilities involved, and the accountability required to support the organization’s business activities and to materialize the value of the enterprise’s data as positive business impacts.

Enforcing data governance policies with command and control is the quick and easy path—to failure.  Principles, not policies, are what truly give a data governance program its power.  Communication and collaboration are the two most powerful principles.

“May the Collaboration be with your Data Governance program.  Always.”

Always in Motion is the Future

“Be mindful of the future, but not at the expense of the moment.  Keep your concentration here and now, where it belongs.”

Perhaps the strongest case against bureaucracy in data governance is the business agility that is necessary for an organization to survive and thrive in today’s highly competitive and rapidly evolving marketplace.  The organization must follow what works for as long as it works, but without being afraid to adjust as necessary when circumstances inevitably change.

Change is the only galactic constant, which is why data governance policies can never be cast in stone (or frozen in carbonite).

Will a well-implemented data governance strategy continue to be successful?  Difficult to see.  Always in motion is the future.  And this is why, when it comes to deliberately designing a data governance program for agility: “Do or do not.  There is no try.”

Click here to read Rob “Darth” Karel’s blog post in this data governance debate

Please feel free to also post a comment below and explain your vote or simply share your opinions and experiences.

Listen to Data Governance Star Wars on OCDQ Radio — In Part 1, Rob Karel and I discuss our blog mock debate, which is followed by a brief Star Wars themed intermission, and then in Part 2, Gwen Thomas joins us to provide her excellent insights.

Data Quality Pro

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode, I am joined by special guest Dylan Jones, the community leader of Data Quality Pro, the largest membership resource dedicated entirely to the data quality profession.

Dylan is currently overseeing the rebuild and relaunch of Data Quality Pro into a next-generation membership platform, and during our podcast discussion, Dylan describes some of the great new features that will be coming soon to Data Quality Pro.


Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

The IT Prime Directive of Business First Contact

This blog post is sponsored by the Enterprise CIO Forum and HP.

Every enterprise requires, as Ralph Loura explains, “end to end business insight to generate competitive advantage, and it’s hard to gain insight if the business is arm’s length away from the data and the systems and the processes that support business insight.”

Loura explains that one of the historical challenges with technology has been that most IT systems have traditionally taken years to deploy and are supported on timelines and lifecycles that are inconsistent with the dynamic business needs of the organization.  In some cases, this has caused technology to become a business disabler instead of a business enabler.

The change-averse nature of most legacy applications is the antithesis of the agile nature of most modern applications.

“It wasn’t too long ago,” explains John Dodge, “when speed didn’t matter, or was considered an enemy of a carefully laid out IT strategy based largely on lowest cost.”  However, speed and agility are now “a competitive imperative.  You have to be fast in today’s marketplace and no department feels the heat more than IT, according to the Enterprise CIO Forum Council members.”

“If you think in terms of speed and the dynamic nature of business,” explains Joseph Spagnoletti, “clearly the organization couldn’t operate at that pace or make the necessary changes without IT woven very deeply into the work that the business does.”

Spagnoletti believes that cloud computing, mobility, and analytics are the three technology enablers for the timely delivery of the information that the organization requires to support its constantly evolving business needs.

“Embedding IT into an organization optimizes a business’s competitive edge,” explains Bill Laberis, “because it empowers the people right at the front lines of the enterprise to make better, faster and more informed decisions — right at the point of contact with customers, partners and clients.”

Historically, IT had a technology-first mindset.  However, the new IT prime directive must become business first contact, embedding advanced technology right at the point of contact with the organization’s business needs, enabling the enterprise to continue its mission to explore new business opportunities with the agility to boldly go where no competitor has gone before.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

A Sadie Hawkins Dance of Business Transformation

This blog post is sponsored by the Enterprise CIO Forum and HP.

In the United States, a Sadie Hawkins Dance is a school-sponsored semi-formal dance, in which, contrary to the usual custom, female students invite male students.  In the world of information technology (IT), a Sadie Hawkins Dance is an enterprise-wide initiative, in which, contrary to the usual custom, a strategic business transformation is driven by IT.

Although IT-driven business transformation might seem like an oxymoron, the reality is that a centralized IT department is one of the few organizational functions that regularly interacts with the entire enterprise.  Therefore, IT is strategically positioned to influence enterprise-wide business transformation—and CIOs might be able to take a business leadership role in those activities.

Wayne Shurts, the CIO of Supervalu, recently discussed how CIOs can make the transition to business leader by “approaching things from a business point of view, as opposed to a technology point of view.  IT must become intensely business driven.”

One thing Shurts emphasized as necessary for this shift in the perception of the CIO is that other C-level executives must realize “technology can be transformative for the organization, especially since it is transforming the consumer behavior of customers.”

 

Business Transformation through IT

David Steiner and Puneet Bhasin, the CEO and CIO of Waste Management, recently recorded a great two-part video interview called Business Transformation through IT, which you can check out using the following links: Part 1, Part 2, Transcript

“From day one,” explained Steiner, “I knew that the one way we could transform our company was through technology.”  Steiner then set out to find a CIO that could help him realize this vision of technology being transformative for the organization.

“If you’re going to be a true business partner,” explained Steiner, “which is what every CEO is looking for from their CIO, you have to go understand the business.”  Steiner explained that one of the first things Bhasin did after he was hired as CIO was to go out into the field and live the life of a customer service rep, a driver, a dispatcher, and a route manager—so that before trying to do anything with technology, Bhasin first sought to understand the business and become a true business partner.

“So the best advice I could give to any CIO would be,” concluded Steiner, “be a business partner, not a technologist.  Know the technology.  You’ve got to know how to apply the technology.  But be a business partner.”

“My advice to CEOs,” explained Bhasin, “would be look for a business person first and a technologist second.  And make sure that your CIO is a part of the decision-making strategic body within the organization.  If you are looking at IT purely as an area to reduce cost, that’s probably the wrong thing.  To me the value of IT is certainly in the area of efficiencies and cost reduction.  I think it has a huge role to play in that.  But I think it has an even greater role to play in product design, and growing customers, and expanding segments, and driving profitability.”

John Dodge recently blogged that business transformation is the CIO’s responsibility and opportunity.  Even though CIOs will eventually need their business partners to take the lead once they get out on the dance floor, CIOs may need to initiate things by inviting their business partners to A Sadie Hawkins Dance of Business Transformation.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Got Data Quality?

I have written many blog posts about how it’s neither a realistic nor a required data management goal to achieve data perfection, i.e., 100% data quality or zero defects.

Of course, this admonition logically invites the questions:

If achieving 100% data quality isn’t the goal, then what is?

99%?

98%?

As I was pondering these questions while grocery shopping, I walked down the dairy aisle casually perusing the wide variety of milk options, when the thought occurred to me that data quality issues have a lot in common with the fat content of milk.

The classification of the percentage of fat (more specifically butterfat) in milk varies slightly by country.  In the United States, whole milk is approximately 3.25% fat, whereas reduced fat milk is 2% fat, low fat milk is 1% fat, and skim milk is 0.5% fat.

Reducing the total amount of fat (especially saturated and trans fat) is a common recommendation for a healthy diet.  Likewise, reducing the total amount of defects (i.e., data quality issues) is a common recommendation for a healthy data management strategy.  However, just like it would be unhealthy to remove all of the fat from your diet (because some fatty acids are essential nutrients that can’t be derived from other sources), it would be unhealthy to attempt to remove all of the defects from your data.

So maybe your organization is currently drinking whole data (i.e., 3.25% defects or 96.75% data quality) and needs to consider switching to reduced defect data (i.e., 2% defects or 98% data quality), low defect data (i.e., 1% defects or 99% data quality), or possibly even skim data (i.e., 0.5% defects or 99.5% data quality).
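
To make the analogy concrete, here is a minimal sketch (in Python, with thresholds that simply reuse the butterfat percentages above as playful illustrations, not recommended targets) of how a simple defect count could be translated into one of these “milk grades”:

```python
def data_quality_grade(defective_records, total_records):
    """Map a simple defect rate onto the playful 'milk' labels above.

    Illustration only: real data quality metrics need a tangible business
    context, and these thresholds just reuse the butterfat percentages."""
    defect_rate = 100.0 * defective_records / total_records
    quality = 100.0 - defect_rate
    if defect_rate > 3.25:
        label = "worse than whole data"
    elif defect_rate > 2.0:
        label = "whole data"
    elif defect_rate > 1.0:
        label = "reduced defect data"
    elif defect_rate > 0.5:
        label = "low defect data"
    else:
        label = "skim data"
    return defect_rate, quality, label

# Example: 2,000 defective records out of 100,000
rate, quality, label = data_quality_grade(2_000, 100_000)
print(f"{rate:.2f}% defects, {quality:.2f}% data quality -> {label}")
# 2.00% defects, 98.00% data quality -> reduced defect data
```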

No matter what your perspective is regarding the appropriate data quality goal for your organization, at the very least, I think that we can all agree that all of our enterprise data management initiatives have to ask the question: “Got Quality?”

 

Related Posts

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Thaler’s Apples and Data Quality Oranges

Data Quality and The Middle Way

Missed It By That Much

The Data Quality Goldilocks Zone

You Can’t Always Get the Data You Want

Data Quality Practices—Activate!

This is a screen capture of the results of last month’s unscientific poll about proactive data quality versus reactive data quality, alongside one of my favorite graphics (this is the third post I’ve used it in) of the Wonder Twins (Zan and Jayna) with Gleek.

Although reactive (15 combined votes) easily defeated proactive (6 combined votes) in the poll, proactive versus reactive is one debate that will likely never end.  However, the debate makes it seem as if we are forced to choose one approach over the other.

Generally speaking, most recommended data quality practices advocate implementing proactive defect prevention and avoiding reactive data cleansing.  But as Graham Rhind commented, data quality is neither exclusively proactive nor exclusively reactive.

“And if you need proof, start looking at the data,” Graham explained.  “For example, gender.  To produce quality data, a gender must be collected and assigned proactively, i.e., at the data collection stage.  Gender coding reactively on the basis of, for example, name, only works correctly and with certainty in a certain percentage of cases (that percentage always being less than 100).  Reactive data quality in that case can never be the best practice because it can never produce the best data quality, and, depending on what you do with your data, can be very damaging.”

“On the other hand,” Graham continued, “the real world to which the data is referring changes.  People move, change names, grow old, die.  Postal code systems and telephone number systems change.  Place names change, countries come and go.  In all of those cases, a reactive process is the one that will improve data quality.”

“Data quality is a continuous process,” Graham concluded.  From his perspective, a realistic data quality practice advocates being “proactive as much as possible, and reactive to keep up with a dynamic world.  Works for me, and has done well for decades.”

I agree with Graham because, just like any complex problem, data quality has no fast and easy solution.  In my experience, a hybrid discipline is always required, combining proactive and reactive approaches into one continuous data quality practice.
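
As a small, hypothetical illustration of Graham’s gender example (the name lookup table, confidence figures, and function names below are invented for this sketch, not drawn from any real matching product), reactive inference from a name is always probabilistic, whereas proactive collection captures the value at the source:

```python
# Hypothetical illustration of proactive vs. reactive gender coding.
# The lookup table and confidence values are invented for this sketch.
NAME_GENDER_GUESS = {
    "mary": ("F", 0.99),
    "john": ("M", 0.99),
    "alex": ("M", 0.70),  # ambiguous: Alexandra, Alexis, Alexander, ...
    "kim":  ("F", 0.60),  # ambiguous across names and cultures
}

def proactive_gender(collected_value):
    """Proactive: the value was captured at the data collection stage."""
    return collected_value, (1.0 if collected_value in ("F", "M", "X") else 0.0)

def reactive_gender(first_name):
    """Reactive: infer gender from a name after the fact.

    Correct only in some percentage of cases (always less than 100),
    which is Graham's point about why it cannot be the best practice
    on its own."""
    return NAME_GENDER_GUESS.get(first_name.lower(), ("U", 0.0))

print(proactive_gender("F"))    # ('F', 1.0)  -- collected, certain
print(reactive_gender("Alex"))  # ('M', 0.7)  -- inferred, uncertain
```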

Or as Zan (representing Proactive) and Jayna (representing Reactive) would say: “Data Quality Practices—Activate!”

And as Gleek would remind us: “The best data quality practices remain continuously active.”

 

Related Posts

How active is your data quality practice?

The Data Quality Wager

The Dichotomy Paradox, Data Quality and Zero Defects

Retroactive Data Quality

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

What Data Quality Technology Wants

MacGyver: Data Governance and Duct Tape

To Our Data Perfectionists

Finding Data Quality

Data Quality and #FollowFriday the 13th

As Alice Hardy arrived at her desk at Crystal Lake Insurance, it seemed like a normal Friday morning.  Her thoughts about her weekend camping trip were interrupted by an eerie sound emanating from one of the adjacent cubicles:

Da da da, ta ta ta.  Da da da, ta ta ta.

“What’s that sound?” Alice wondered out loud.

“Sorry, am I typing too loud again?” responded Tommy Jarvis from another adjacent cubicle.  “Can you come take a look at something for me?”

“Sure, I’ll be right over,” Alice replied as she quickly circumnavigated their cluster of cubicles, puzzled and unsettled to find the other desks unoccupied with their computers turned off, wondering, to herself this time, where did that eerie sound come from?  Where are the other data counselors today?

“What’s up?” she casually asked upon entering Tommy’s cubicle, trying, as always, to conceal her discomfort about being alone in the office with the one colleague that always gave her the creeps.  Visiting his cubicle required a constant vigilance in order to avoid making prolonged eye contact, not only with Tommy Jarvis, but also with the horrifying hockey mask hanging above his computer screen like some possessed demon spawn from a horror movie.

“I’m analyzing the Date of Death in the life insurance database,” Tommy explained.  “And I’m receiving really strange results.  First of all, there are no NULLs, which indicates all of our policyholders are dead, right?  And if that wasn’t weird enough, there are only 12 unique values: January 13, 1978, February 13, 1981, March 13, 1987, April 13, 1990, May 13, 2011, June 13, 1997, July 13, 2001, August 13, 1971, September 13, 2002, October 13, 2006, November 13, 2009, and December 13, 1985.”

“That is strange,” said Alice.  “All of our policyholders can’t be dead.  And why is Date of Death always the 13th of the month?”

“It’s not just always the 13th of the month,” Tommy responded, almost cheerily.  “It’s always a Friday the 13th.”

“Well,” Alice slowly, and nervously, replied.  “I have a life insurance policy with Crystal Lake Insurance.  Pull up my policy.”

After a few, quick, loud pounding keystrokes, Tommy ominously read aloud the results now displaying on his computer screen, just below the hockey mask that Alice could swear was staring at her.  “Date of Death: May 13, 2011 . . . Wait, isn’t that today?”

Da da da, ta ta ta.  Da da da, ta ta ta.

“Did you hear that?” asked Alice.  “Hear what?” responded Tommy with a devilish grin.

“Never mind,” replied Alice quickly while trying to focus her attention on only the computer screen.  “Are you sure you pulled up the right policy?  I don’t recognize the name of the Primary Beneficiary . . . Who the hell is Jason Voorhees?”

“How the hell could you not know who Jason Voorhees is?” asked Tommy, with anger sharply crackling throughout his words.  “Jason Voorhees is now rightfully the sole beneficiary of every life insurance policy ever issued by Crystal Lake Insurance.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“What?  That’s impossible!” Alice screamed.  “This has to be some kind of sick data quality joke.”

“It’s a data quality masterpiece!” Tommy retorted with rage.  “I just finished implementing my data machete, er I mean, my data matching solution.  From now on, Crystal Lake Insurance will never experience another data quality issue.”

“There’s just one last thing that I need to take care of.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“And what’s that?” Alice asked, smiling nervously while quickly backing away into the hallway—and preparing to run for her life.

Da da da, ta ta ta.  Da da da, ta ta ta.

“Real-world alignment,” replied Tommy.  Rising to his feet, he put on the hockey mask, and pulled an actual machete out of the bottom drawer of his desk.  “Your Date of Death is entered as May 13, 2011.  Therefore, I must ensure real-world alignment.”

Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Data Quality.

The End.

(Note — You can also listen to the OCDQ Radio Theater production of this DQ-Tale in the Scary Calendar Effects episode.)

#FollowFriday Recommendations

#FollowFriday is when Twitter users recommend other users you should follow, so here are some great tweeps who provide tweets mostly about Data Quality, Data Governance, Master Data Management, Business Intelligence, and Big Data Analytics:

(Please Note: This is by no means a comprehensive list, is listed in no particular order whatsoever, and no offense is intended to any of my tweeps not listed below.  I hope that everyone has a great #FollowFriday and an even greater weekend.)

Are Applications the La Brea Tar Pits for Data?

This blog post is sponsored by the Enterprise CIO Forum and HP.

In a previous post, I explained that application modernization must become the information technology (IT) prime directive in order for IT departments to satisfy the speed and agility business requirements of their organizations.  An excellent point raised in the comments of that post was that continued access to legacy data is often a business driver for not sunsetting legacy applications.

“I find many legacy applications are kept alive in read-only mode, i.e., purely for occasional query/reporting purposes,” explained Beth Breidenbach.  “Stated differently, the end users often just want to be able to look at the legacy data from time to time.”

Gordon Hamilton commented that data is often stuck in the “La Brea Tar Pits of legacy” applications.  Even when the data is migrated during the implementation of a new application (its new tar pit, so to speak), the legacy data, as Breidenbach said, is often still accessed via the legacy application.  As Hamilton noted, this can be dangerous because the legacy data diverges from the version migrated to the new application (i.e., after migration, the legacy data could be updated, or possibly deleted).

The actual La Brea Tar Pits were often covered with water, causing animals that came to drink to fall in and get stuck in the tar, thus preserving their fossils for centuries—much to the delight of future paleontologists and natural history museum enthusiasts.

Although they are often cited as the bane of data management, most data silos are actually application silos because historically data and applications have been so tightly coupled.  Data is often covered with an application layer, causing users that enter, access, and use the data to get stuck with the functionality provided by its application, thus preserving their use of the application even after it has become outdated (i.e., legacy)—much to the dismay of IT departments and emerging technology enthusiasts.

When so tightly coupled with data, applications—not just legacy applications—truly can be the La Brea Tar Pits for data, since once data needed to support business activities gets stuck in an application, that application will stick around for a very long time.

If applications and data were not so tightly coupled, we could both modernize our applications and optimize our data usage in order to better satisfy the speed and agility business requirements of our organizations.  Therefore, not only should we sunset our legacy applications, we should also approach data management with the mindset of decoupling our data from its applications.
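
As a rough sketch of that decoupling mindset (the repository class, table, and sample record below are hypothetical, and the point is the shape of the design rather than any particular technology), the data could be exposed through a thin, application-independent access layer that legacy reports and modern applications consume alike:

```python
import sqlite3

class CustomerRepository:
    """A thin, application-independent data access layer (hypothetical).

    Legacy reporting tools and new applications both go through this
    interface, so the data is no longer stuck behind any one application."""

    def __init__(self, conn):
        self.conn = conn

    def find_by_id(self, customer_id):
        cur = self.conn.execute(
            "SELECT id, name, postal_code FROM customer WHERE id = ?",
            (customer_id,),
        )
        return cur.fetchone()

# Tiny in-memory example so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, postal_code TEXT)")
conn.execute("INSERT INTO customer VALUES (42, 'Jane Doe', '02134')")

# Any consumer (a legacy report, a new web app, an analytics job) uses the
# same repository instead of going through a retired application's screens.
repo = CustomerRepository(conn)
print(repo.find_by_id(42))  # (42, 'Jane Doe', '02134')
```

The specific technology is beside the point; what matters is that consumers depend on the data itself rather than on whichever application first captured it.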

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

A Sadie Hawkins Dance of Business Transformation

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

DQ-BE: Invitation to Duplication

Data Quality By Example (DQ-BE) is an OCDQ regular segment that provides examples of data quality key concepts.

I recently received my invitation to the Data Governance and Information Quality Conference, which will be held June 27-30 in San Diego, California at the Catamaran Resort Hotel and Spa.  Well, as shown above, I actually received both of my invitations.

Although my postal address is complete, accurate, and exactly the same on both of the invitations, my name is slightly different (“James” vs. “Jim”), and my title (“Data Quality Journalist” vs. “Blogger-in-Chief”) and company (“IAIDQ” vs. “OCDQ Blog”) are both completely different.  I wonder how many of the data quality software vendors sponsoring this conference would consider my invitations to be duplicates.  (Maybe I’ll use the invitations to perform a vendor evaluation on the exhibit floor.)

So it would seem that even “The Premier Event in Data Governance and Data Quality” can experience data quality problems.

No worries, I doubt the invitation system will be one of the “Practical Approaches and Success Stories” presented—unless it’s used as a practical approach to a success story about demonstrating how embarrassing it might be to send duplicate invitations to a data quality journalist and blogger-in-chief.  (I wonder if this blog post will affect the approval of my Press Pass for the event.)
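
For illustration only, here is a minimal sketch of the kind of weighted matching rule a vendor might apply to my two invitations (the field weights, the threshold, and the placeholder address are all invented for this sketch; real matching engines use far more sophisticated standardization and comparators):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for real comparators."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# The two invitations described above; the address is a placeholder,
# and the weights and threshold are invented for this sketch.
record_a = {"name": "James Harris", "title": "Data Quality Journalist",
            "company": "IAIDQ", "address": "1 Placeholder Lane, Anytown"}
record_b = {"name": "Jim Harris", "title": "Blogger-in-Chief",
            "company": "OCDQ Blog", "address": "1 Placeholder Lane, Anytown"}

weights = {"address": 0.5, "name": 0.4, "title": 0.05, "company": 0.05}
score = sum(w * similarity(record_a[f], record_b[f]) for f, w in weights.items())

# With an identical address and very similar names weighted heavily, the
# score clears the invented threshold despite the different title and company.
print(f"match score = {score:.2f} -> "
      f"{'duplicate' if score >= 0.75 else 'distinct'}")
```

Whether the merged record then survives as “James” the “Data Quality Journalist” or “Jim” the “Blogger-in-Chief” is a separate survivorship decision, which is its own data quality key concept.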

 

Okay, on a far more serious note, you should really consider attending this event.  As the conference agenda shows, there will be great keynote presentations, case studies, tutorials, and other sessions conducted by experts in data governance and data quality, including (among many others) Larry English, Danette McGilvray, Mike Ferguson, David Loshin, and Thomas Redman.

 

Related Posts

DQ-BE: Dear Valued Customer

Customer Incognita

Identifying Duplicate Customers

Adventures in Data Profiling (Part 7) – Customer Name

The Quest for the Golden Copy (Part 3) – Defining “Customer”

‘Tis the Season for Data Quality

The Seven Year Glitch

DQ-IRL (Data Quality in Real Life)

Data Quality, 50023

Once Upon a Time in the Data

The Semantic Future of MDM

The Dichotomy Paradox, Data Quality and Zero Defects

As Joseph Mazur explains in Zeno’s Paradox, the ancient Greek philosopher Zeno constructed a series of logical paradoxes to prove that motion is impossible, which today remain on the cutting edge of our investigations into the fabric of space and time.

One of the paradoxes is known as the Dichotomy:

“A moving object will never reach any given point, because however near it may be, it must always first accomplish a halfway stage, and then the halfway stage of what is left and so on, and this series has no end.  Therefore, the object can never reach the end of any given distance.”

Of course, this paradox sounds silly.  After all, a given point like the finish line in a race is clearly reachable in real life, since people win races all the time.  However, in theory, the mathematics is maddeningly sound, since it creates an infinite series of steps between the starting point and the finish line—and an infinite number of steps creates a journey that can never end.
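
Written out for a unit distance, the halfway stages are the familiar geometric series shown below; the series sums to the whole distance, yet it still contains infinitely many terms, which is exactly the tension the paradox exploits:

```latex
\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = \sum_{n=1}^{\infty} \frac{1}{2^n} = 1
```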

Furthermore, this theoretical race cannot even begin, since the recursive nature of the paradox proves that we could never complete even the first step (that step has a halfway stage of its own, and so on).  Hence, the paradoxical conclusion is that any travel over any finite distance can neither be completed nor begun, and so all motion must be an illusion.  Some of the greatest minds in history (from Galileo to Einstein to Stephen Hawking) have tackled the Dichotomy Paradox—but without being able to disprove it.

Data Quality and Zero Defects

The given point that many enterprise initiatives attempt to reach with data quality is 100% on a metric such as data accuracy.  Leaving aside (in this post) the fact that any data quality metric without a tangible business context provides no business value, 100% data quality (aka Zero Defects) is an unreachable destination—no matter how close you get or how long you try to reach it.

Zero Defects is a laudable goal—but its theory and practice come from manufacturing quality.  However, I have always been of the opinion, unpopular among some of my peers, that manufacturing quality and data quality are very different disciplines, and although there is much to be learned from studying the theories of manufacturing quality, I believe that brute-forcing those theories onto data quality is impractical and fundamentally flawed (and I’ve even said so in verse: To Our Data Perfectionists).

The given point that enterprise initiatives should actually be attempting to reach is data-driven solutions for business problems.

Advocates of Zero Defects argue that, in theory, defect-free data should be fit to serve as the basis for every possible business use, enabling a data-driven solution for any business problem.  However, in practice, business uses for data, as well as business itself, are always evolving.  Therefore, business problems are dynamic problems that do not have—nor do they require—perfect solutions.

Although the Dichotomy Paradox proves motion is theoretically impossible, our physical motion practically proves otherwise.  Has your data quality practice become motionless by trying to prove that Zero Defects is more than just theoretically possible?