Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.



Saturday, January 26, 2013

MDM, Assets, Locations, and the TARDIS

Henrik Liliendahl Sørensen, as usual, is facilitating excellent discussion around master data management (MDM) concepts via his blog.  Two of his recent posts, Multi-Entity MDM vs. Multi-Domain MDM and The Real Estate Domain, have both received great commentary.  So, in case you missed them, be sure to read those posts, and join in their comment discussions/debates.

A few of the concepts discussed and debated reminded me of the OCDQ Radio episode Demystifying Master Data Management, during which guest John Owens explained the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), as well as, and perhaps the most important concept of all, the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
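To make the Party-Role Relationship a little more concrete, here is a minimal Python sketch, assuming a simple in-memory store. The class and method names (`MasterEntity`, `Party`, `PartyRoles`, `assign`, `roles_of`, `parties_in_role`) and the sample parties are hypothetical illustrations, not John's model or any MDM product's API. What it shows is that one Party record can play several roles, so Customer, Supplier, and Employee are attached to a Party rather than modeled as separate master data entities.

```python
from dataclasses import dataclass, field
from enum import Enum


class MasterEntity(Enum):
    """The four master data entities named in the episode."""
    PARTY = "Party"
    PRODUCT = "Product"
    LOCATION = "Location"
    ASSET = "Asset"


@dataclass(frozen=True)
class Party:
    """One master record per real-world person or organization."""
    party_id: str
    name: str


@dataclass
class PartyRoles:
    """Hypothetical Party-Role store: roles are attached to a Party
    instead of being modeled as separate master data entities."""
    roles: dict[str, set[str]] = field(default_factory=dict)

    def assign(self, party: Party, role: str) -> None:
        self.roles.setdefault(party.party_id, set()).add(role)

    def roles_of(self, party: Party) -> set[str]:
        return set(self.roles.get(party.party_id, set()))

    def parties_in_role(self, role: str) -> set[str]:
        return {pid for pid, rs in self.roles.items() if role in rs}


# Customer, Supplier, and Employee are roles played by Parties.
acme = Party("P-001", "Acme Ltd")
jane = Party("P-002", "Jane Doe")
store = PartyRoles()
store.assign(acme, "Customer")
store.assign(acme, "Supplier")
store.assign(jane, "Employee")
store.assign(jane, "Customer")

print(sorted(store.roles_of(acme)))               # ['Customer', 'Supplier']
print(sorted(store.parties_in_role("Customer")))  # ['P-001', 'P-002']
```

Because roles hang off a single Party record, a supplier who becomes a customer does not spawn a second, duplicate master record.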

Henrik’s second post touched on Location and Asset, which come up far less often in MDM discussions than Party and Product do, and arguably for good reason.  This reminded me of the science fiction metaphor I used during my podcast with John to help explain the difference and relationship between an Asset and a Location.

Location is often over-identified with postal address, which is actually just one means of referring to a location.  A location can also be referred to by its geographic coordinates, either absolute (e.g., latitude and longitude) or relative (e.g., 7 miles northeast of the intersection of Route 66 and Route 54).

Asset refers to a resource owned or controlled by an enterprise and capable of producing business value.  Assets are often over-identified with their location, especially real estate assets such as a manufacturing plant or an office building, since they are essentially immovable assets always at a particular location.

However, many assets are movable, such as the equipment used to manufacture products, or the technology used to support employee activities.  These assets are not always at a particular location (e.g., laptops and smartphones used by employees) and can also be dependent on other, non-co-located, sub-assets (e.g., replacement parts needed to repair broken equipment).

In Doctor Who, a brilliant British science fiction television program celebrating its 50th anniversary this year, the TARDIS, which stands for Time and Relative Dimension in Space, is the time machine and spaceship the Doctor and his companions travel in.

The TARDIS is arguably the Doctor’s most important asset, but its location changes frequently, both during and across episodes.

So, in MDM, we could say that Location is a time and relative dimension in space where we would currently find an Asset.
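One way to sketch this distinction in code, assuming Python 3.9 or later, is to treat Location as a value that can be expressed as a postal address, absolute coordinates, or a relative position, and to give each Asset a time-stamped location history instead of a single fixed location. All names here (`Asset`, `move_to`, `location_at`, and the three reference types) are hypothetical, and the coordinates and dates are made-up sample values.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional, Union


@dataclass(frozen=True)
class PostalAddress:
    """One way of referring to a location."""
    text: str


@dataclass(frozen=True)
class Coordinates:
    """Absolute reference: latitude and longitude in decimal degrees."""
    latitude: float
    longitude: float


@dataclass(frozen=True)
class RelativePosition:
    """Relative reference, e.g. 7 miles northeast of a landmark."""
    distance_miles: float
    bearing: str
    landmark: str


LocationRef = Union[PostalAddress, Coordinates, RelativePosition]


@dataclass
class Asset:
    """An asset whose location is tracked over time, not stored as a
    single fixed attribute."""
    asset_id: str
    # (effective_from, location) pairs, kept sorted by effective_from.
    history: list[tuple[datetime, LocationRef]] = field(default_factory=list)

    def move_to(self, when: datetime, where: LocationRef) -> None:
        self.history.append((when, where))
        # Sort on the timestamp only, so locations are never compared.
        self.history.sort(key=lambda entry: entry[0])

    def location_at(self, when: datetime) -> Optional[LocationRef]:
        """Return the asset's location at `when`, or None if unknown."""
        times = [t for t, _ in self.history]
        i = bisect_right(times, when)
        return self.history[i - 1][1] if i else None


tardis = Asset("TARDIS")
tardis.move_to(datetime(2013, 1, 21), Coordinates(51.5074, -0.1278))
# Recorded out of order on purpose; move_to keeps the history sorted.
tardis.move_to(
    datetime(2013, 1, 7),
    RelativePosition(7.0, "northeast", "the intersection of Route 66 and Route 54"),
)

print(tardis.location_at(datetime(2013, 1, 1)))   # None
print(tardis.location_at(datetime(2013, 1, 26)))  # Coordinates(latitude=51.5074, longitude=-0.1278)
```

Keeping the history rather than overwriting a location attribute lets us ask where we would find an Asset at any point in time; an immovable asset such as an office building would simply have a single entry.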

 

Related Posts

OCDQ Radio - Demystifying Master Data Management

OCDQ Radio - Master Data Management in Practice

OCDQ Radio - The Art of Data Matching

Plato’s Data

Once Upon a Time in the Data

The Data Cold War

DQ-BE: Single Version of the Time

The Data Outhouse

Fantasy League Data Quality

OCDQ Radio - The Blue Box of Information Quality

Choosing Your First Master Data Domain

Lycanthropy, Silver Bullets, and Master Data Management

Voyage of the Golden Records

The Quest for the Golden Copy

How Social can MDM get?

Will Social MDM be the New Spam?

More Thoughts about Social MDM

Is Social MDM going the Wrong Way?

The Semantic Future of MDM

Small Data and VRM

Thursday, January 3, 2013

Best OCDQ Blog Posts of 2012

Welcome to my roundup of the best blog posts published on the Obsessive-Compulsive Data Quality (OCDQ) blog during 2012.

My selections were based on a pseudo-scientific, quasi-statistical combination of page views, comments, and re-tweets, as well as a few of my personal favorites, and I have organized them into four sections of ten best posts by topic or type.

 

Ten Best Posts on Big Data

  • Dot Collectors and Dot Connectors — The multifaceted challenges of big data require the dot collectors of data management and the dot connectors of business intelligence to overcome their attention blindness and work together more collaboratively.
  • HoardaBytes and the Big Data Lebowski — Don’t hoard Data, dude.  The Data must abide.  The Data must abide both the Business, by proving useful to our business activities, and the Individual, by protecting the privacy of our personal activities.
  • Our Increasingly Data-Constructed World — What we now call Big Data is in fact a long-running macro trend underlying the many recent trends and innovations making our world, not just more data-driven, but increasingly data-constructed.
  • Will Big Data be Blinded by Data Science? — With apologies to Thomas Dolby, will the business leaders being told to hire data scientists to derive business value from big data analytics be blind to what data science tries to show them?
  • The Graystone Effects of Big Data — Using a metaphor based on the science fiction television show Caprica, I refer to the positive aspects of Big Data as the Zoe Graystone Effect, and the negative aspects of Big Data as the Daniel Graystone Effect.
  • Exercise Better Data Management — Big Data may be followed by MOData (i.e., MOre Data or Morbidly Obese Data), but that doesn’t necessarily mean we require more data management; instead, we just need to exercise better data management.
  • A Tale of Two Datas — Inspired by Malcolm Chisholm and Charles Dickens, there are two types of data (i.e., representation and observation, not big and not-so-big) with different data uses that will require different data management approaches.
  • Data Silence — Not only do we need to adopt a mindset that embraces the principles of data science, but we also have to acknowledge that the biases and preconceptions in our minds could silence the signal and amplify the noise in big data.
  • The Wisdom of Crowds, Friends, and Experts — The future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three sources often contributing to data-driven decision making.

 

Ten Best Posts on Data Governance and Data Quality

  • Data Quality: Quo Vadimus? — With lots of help from Henrik Liliendahl Sørensen, Garry Ure, Bryan Larkin, and many others via the comments, I ponder where data quality is going, and whether data quality is a journey or a destination.
  • Data Quality and Miracle Exceptions — Battling the dark forces of poor data quality doesn’t require any superpowers, and data quality doesn’t have any miracle exceptions, so for the love of high-quality data everywhere, stop trying to sell us one.
  • Data Myopia and Business Relativity — Examines the two most prevalent definitions for data quality, real-world alignment and fitness for the purpose of use, otherwise known as the danger of data myopia and the challenge of business relativity.
  • How Data Cleansing Saves Lives — Although proactive defect prevention is far superior to reactive data cleansing, the history of the Hubble Space Telescope proves that data cleansing can be not just a necessary evil, but also a necessary good.
  • Data Quality and the Bystander Effect — The most common reason data quality issues are neither reported nor corrected is the Bystander Effect, which makes people less likely to interpret bad data as a problem or, at the very least, more likely to consider it not their responsibility.
  • Data Quality and Chicken Little Syndrome — A chicken-metaphor-based post about the far-too-common and fowl folly of emphasizing the negative aspects of not investing in data quality instead of trying to sell its business benefits.
  • Data and its Relationships with Quality — The metadata linking the data management industry to what it manages suffers from the one-to-many relationships created by never agreeing on how data, information, and quality should be defined.
  • Cooks, Chefs, and Data Governance — Implementing policies requires cooks who are adept at carrying out a recipe, as well as chefs who are trusted to figure out how to best combine policies with the organizational ingredients available to them.
  • Availability Bias and Data Quality Improvement — The availability heuristic explains why a reactive data cleansing project is easily approved, and availability bias explains why initiating a proactive data quality program is usually resisted.

 

Ten Best Podcasts

  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.
  • Saving Private Data — In this episode, recorded in December 2011, guest Daragh O Brien discusses the data privacy and data protection implications of social media, cloud computing, and big data.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Defining Big Data — This episode of the Open MIKE Podcast, with assistance from Robert Hillard, discusses how big data refers to big complexity, not big volume, even though complex datasets tend to grow rapidly, thus making them voluminous.
  • Getting to Know NoSQL — This episode of the Open MIKE Podcast discusses how NoSQL does not mean AntiSQL (i.e., NoSQL is not a Relational replacement), and that business-driven big data needs will often require “Not Only SQL.”

 

Ten Best of the Rest

  • DQ-View: Data Is as Data Does — In this short video, I explain that data’s value comes from data’s usefulness, exemplifying the potential value of unstructured data based on whether or not you put what you read in data management books to use.
  • DQ-View: The Five Stages of Data Quality — In this short video, using my superb acting skills, I demonstrate how coming to terms with the daunting challenge of data quality is somewhat similar to experiencing the Five Stages of Grief.
  • DQ-View: MetaData makes BettahMusic — In this short video, I demonstrate how better metadata makes data better using the metadata automatically and manually created after importing my CD collection into my iTunes library.
  • Metadata, Data Quality, and the Stroop Test — In this colorful (and perhaps too colorful) post, I use the Stroop Test, where colors do not match their names, to discuss the relationship between metadata and data quality.
  • Quality is the Higgs Field of Data — Using one of the biggest science stories of 2012, the potential discovery of the elusive Higgs Boson (which I also try to explain), I attempt an analogy for data quality based on the Higgs Field.
  • The Family Circus and Data Quality — Thanks to The Family Circus comic strip created by cartoonist Bil Keane, I explain how Ida Know owns the data, Not Me is accountable for data governance, and Nobody takes responsibility for data quality.
  • Data Love Song Mashup — Since your data needs love too, on Valentine’s Day I wrote this post providing a mashup of love songs for your data (and Rob DuMoulin added a few more in the comments) — Happy Data Quality to you and your data!
  • The Algebra of Collaboration — The trick of algebra equates collaboration with data quality and data governance success when collaboration is viewed not just as a guiding principle, but also as a call to action in your daily practices.
  • The Return of the Dumb Terminal — With help from author Kevin Kelly and my old green machine, I ponder how the mobile-app-portal-to-the-cloud computing model means mobile devices are bringing about the return of the dumb terminal.
  • An Enterprise Carol — Jacob Marley raises the ghosts of a few ideas to consider about how to keep the Enterprise well in the new year via the Ghosts of Enterprise Past (Legacy Applications), Present (IT Consumerization), and Future (Big Data).

 

Thank You for Reading OCDQ Blog in 2012

In 2012, the Obsessive-Compulsive Data Quality (OCDQ) blog published 92 posts, which received 160,000 total page views, while averaging over 400 page views and 200 unique visitors a day.

Thank you for reading OCDQ Blog in 2012.  Your readership was deeply appreciated.

 

Related Posts

Best OCDQ Blog Posts of 2011

So Long 2011, and Thanks for All the . . . – The OCDQ Radio 2011 Year in Review

2012 Quarterly Review of the Data Roundtable (Part 4)

2012 Quarterly Review of the Data Roundtable (Part 3)

2012 Quarterly Review of the Data Roundtable (Part 2)

2012 Quarterly Review of the Data Roundtable (Part 1)

2011 Quarterly Review of the Data Roundtable (Part 4)

2011 Quarterly Review of the Data Roundtable (Part 3)

2011 Quarterly Review of the Data Roundtable (Part 2)

2011 Quarterly Review of the Data Roundtable (Part 1)

Thursday, October 4, 2012

A Tale of Two Datas

Is big data more than just lots and lots of data?  Is big data unstructured and not-so-big data structured?  Malcolm Chisholm explored these questions in his recent Information Management column, where he posited that there are, in fact, two datas.

“One type of data,” Chisholm explained,  “represents non-material entities in vast computerized ecosystems that humans create and manage.  The other data consists of observations of events, which may concern material or non-material entities.”

Providing an example of the first type, Chisholm explained, “my bank account is not a physical thing at all; it is essentially an agreed upon idea between myself, the bank, the legal system, and the regulatory authorities.  It only exists insofar as it is represented, and it is represented in data.  The balance in my bank account is not some estimate with a positive and negative tolerance; it is exact.  The non-material entities of the financial sector are orderly human constructs.  Because they are orderly, we can more easily manage them in computerized environments.”

The orderly human constructs represented in data, and the stories told by data (including the stories data tell about us and the stories we tell data), are among my favorite topics.  In our increasingly data-constructed world, it’s important to occasionally remind ourselves that data and the real world are not the same thing, especially when data represents non-material entities since, with the possible exception of Makers using 3-D printers, data-represented entities do not re-materialize into the real world.

Describing the second type, Chisholm explained, “a measurement is usually a comparison of a characteristic using some criteria, a count of certain instances, or the comparison of two characteristics.  A measurement can generally be quantified, although sometimes it’s expressed in a qualitative manner.  I think that big data goes beyond mere measurement, to observations.”

Chisholm called the first type the Data of Representation, and the second type the Data of Observation.

The data of representation tends to be structured, in the relational sense, but doesn’t need to be (e.g., graph databases), and the data of observation tends to be unstructured, but it can also be structured (e.g., the structured observations generated either by a data profiling tool analyzing structured relational tables or flat files, or by a word-counting algorithm analyzing unstructured text).
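As an illustration of how an algorithm can turn unstructured text into structured observations, here is a minimal word-counting sketch in Python. The function name, the document identifier, and the tokenization rule (lowercased runs of letters and apostrophes) are assumptions chosen for brevity; real text analysis would need more careful tokenization.

```python
import re
from collections import Counter


def word_count_observations(doc_id: str, text: str) -> list[tuple[str, str, int]]:
    """Turn unstructured text into structured (doc_id, word, count) rows."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Order by descending count, then alphabetically, for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(doc_id, word, n) for word, n in ordered]


rows = word_count_observations(
    "dickens-1",
    "It was the best of times, it was the worst of times.",
)
for row in rows[:3]:
    print(row)
# ('dickens-1', 'it', 2)
# ('dickens-1', 'of', 2)
# ('dickens-1', 'the', 2)
```

The input has no structure beyond sentences, yet every output row has the same three typed columns and could be loaded straight into a relational table.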

“Structured and unstructured,” Chisholm concluded, “describe form, not essence, and I suggest that representation and observation describe the essences of the two datas.  I would also submit that both datas need different data management approaches.  We have a good idea what these are for the data of representation, but much less so for the data of observation.”

I agree that there are two types of data (i.e., representation and observation, not big and not-so-big) and that different data uses will require different data management approaches.  Although data modeling is still important and data quality still matters, how much data modeling and data quality is needed before data can be effectively used for specific business purposes will vary.

In order to move our discussions forward regarding “big data” and its data management and business intelligence challenges, we have to stop fiercely defending our traditional perspectives about structure and quality in order to effectively manage both the form and essence of the two datas.  We also have to stop fiercely defending our traditional perspectives about data analytics, since there will be some data use cases where depth and detailed analysis may not be necessary to provide business insight.

 

A Tale of Two Datas

In conclusion, and with apologies to Charles Dickens and his A Tale of Two Cities, I offer the following A Tale of Two Datas:

It was the best of times, it was the worst of times.
It was the age of Structured Data, it was the age of Unstructured Data.
It was the epoch of SQL, it was the epoch of NoSQL.
It was the season of Representation, it was the season of Observation.
It was the spring of Big Data Myth, it was the winter of Big Data Reality.
We had everything before us, we had nothing before us,
We were all going direct to hoarding data, we were all going direct the other way.
In short, the period was so far like the present period, that some of its noisiest authorities insisted on its being signaled, for Big Data or for not-so-big data, in the superlative degree of comparison only.

Related Posts

HoardaBytes and the Big Data Lebowski

The Idea of Order in Data

The Most August Imagination

Song of My Data

The Lies We Tell Data

Our Increasingly Data-Constructed World

Plato’s Data

OCDQ Radio - Demystifying Master Data Management

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

Swimming in Big Data

Sometimes it’s Okay to be Shallow

Darth Vader, Big Data, and Predictive Analytics

The Big Data Theory

Finding a Needle in a Needle Stack

Exercise Better Data Management

Magic Elephants, Data Psychics, and Invisible Gorillas

Why Can’t We Predict the Weather?

Data and its Relationships with Quality

A Tale of Two Q’s

A Tale of Two G’s

Tuesday, September 18, 2012

Turning the M Upside Down

I am often asked about the critical success factors for enterprise initiatives, such as data quality, master data management, and data governance.

Although there is no one thing that can guarantee success, if forced to choose one critical success factor to rule them all, I would choose collaboration.

But, of course, when I say this everyone rolls their eyes at me (yes, I can see you doing it now through the computer) since it sounds like I’m avoiding the complex concepts underlying enterprise initiatives by choosing collaboration.

The importance of collaboration is a very simple concept but, as Amy Ray and Emily Saliers taught me, “the hardest to learn was the least complicated.”

 

The Pronoun Test

Although all organizations must define the success of enterprise initiatives in business terms (e.g., mitigated risks, reduced costs, or increased revenue), collaborative organizations understand that the most important factor for enduring business success is the willingness of people all across the enterprise to mutually pledge to each other their communication, cooperation, and trust.

These organizations pass what Robert Reich calls the Pronoun Test.  When their employees make references to the company, it’s done with the pronoun We and not They.  The latter suggests at least some amount of disengagement, and perhaps even alienation, whereas the former suggests the opposite — employees feel like part of something significant and meaningful.

An even more basic form of the Pronoun Test is whether or not people can look beyond their too often self-centered motivations and selflessly include themselves in a collaborative effort.  “It’s amazing how much can be accomplished if no one cares who gets the credit” is an old quote for which, with an appropriate irony, it is rather difficult to identify the original source.

Collaboration requires a simple, but powerful, paradigm shift that I call Turning the M Upside Down — turning Me into We.

 

Related Posts

The Algebra of Collaboration

The Business versus IT—Tear down this wall!

The Road of Collaboration

Dot Collectors and Dot Connectors

No Datum is an Island of Serendip

The Three Most Important Letters in Data Governance

The Stakeholder’s Dilemma

Shining a Social Light on Data Quality

Data Quality and the Bystander Effect

The Family Circus and Data Quality

The Year of the Datechnibus

Being Horizontally Vertical

The Collaborative Culture of Data Governance

Collaboration isn’t Brain Surgery

Are you Building Bridges or Digging Moats?

Tuesday, August 21, 2012

Data and its Relationships with Quality

The title of this blog post is an allusion to the graphic (shown above) that accompanied the recent Information Management column by Malcolm Chisholm, in which he argued against defining data quality as fitness for use, its most common definition, stating he thinks “a strong case can be made that the definition is indeed inappropriate and should be replaced with a better one.”

“Before we get into the definition of data quality, let us take a brief look at what data is related to,” Chisholm opened, explaining that “data represents something — a thing, event, or concept.”

As I blogged in my post Plato’s Data, whether it’s an abstract description of real-world entities (i.e., “master data”) or an abstract description of real-world interactions (i.e., “transaction data”) among entities, data is an abstract description of reality.  Although data shapes our perception of the real world, sometimes we forget that data is only a partial reflection of reality.

“Data is understood,” Chisholm continued, “by something, for which the best term I can find is the interpretant.”

“The interpretant applies the data to one or more uses, which achieve objectives the interpretant has.  The interpretant is independent of the data.  It understands the data and can put it to use.  But if the interpretant misunderstands the data, or puts it to an inappropriate use, that is hardly the fault of the data, and cannot constitute a data quality problem.”

As I blogged in my post Quality is the Higgs Field of Data, independent from use, data is as carefree as the mass-less photon whizzing around at the speed of light.  But once we interact with it, data begins to feel the effects of our use. We give data mass so that it can become the basic building blocks of what matters to us.  Some data is affected more by our use than others.  The more subjective our use, the more we weigh data down.  The more objective our use, the less we weigh data down.

“A more fundamental problem is that data can have many uses,” Chisholm continued.  “If we think data quality is fitness for use, then data quality must be assessed independently for each use we put it to.”  Instead, Chisholm contends that data quality is “an expression of the relationship between the thing, event, or concept and the data that represents it.  This is a one-to-one relationship, unlike the one-to-many relationship between data and uses.”

Therefore, Chisholm proposes that a better definition of data quality is “the extent to which the data actually represents what it purports to represent.  This definition can be used to think of data quality as a property of the data itself, and then our diagnosis and remediation efforts will focus on the special problems of the relationship between data and what it represents.”
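The difference between the one-to-one and one-to-many relationships can be sketched in a few lines of Python. This is a toy illustration of the distinction, not Chisholm’s method: it assumes a known reference (`reality`) to compare against, which in practice is rarely available, and it treats exact equality as the test of representation. The record, the reference, and the two uses are all hypothetical.

```python
from typing import Callable

# Hypothetical record and the real-world facts it purports to represent.
record = {"customer_id": "C-17", "postal_code": "02139", "email": None}
reality = {"customer_id": "C-17", "postal_code": "02139", "email": "pat@example.com"}


def representation_quality(data: dict, truth: dict) -> float:
    """One-to-one view: share of attributes whose value matches what the
    record purports to represent. Computed once, independent of any use."""
    matches = sum(1 for key in truth if data.get(key) == truth[key])
    return matches / len(truth)


# One-to-many view: each use brings its own acceptance test (hypothetical uses).
uses: dict[str, Callable[[dict], bool]] = {
    "postal mailing": lambda r: r.get("postal_code") is not None,
    "email campaign": lambda r: r.get("email") is not None,
}


def fitness_for_use(data: dict) -> dict[str, bool]:
    """Assess the same record separately for every registered use."""
    return {use: check(data) for use, check in uses.items()}


print(round(representation_quality(record, reality), 2))  # 0.67
print(fitness_for_use(record))  # {'postal mailing': True, 'email campaign': False}
```

The representation score is a single property of the record, while the fitness-for-use result changes with every use added to `uses`, which mirrors the one-to-many relationship between data and uses described above.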

But, of course, although Chisholm doesn’t like it as a definition for data quality, he is not denying that fitness for use describes “a set of valid concepts that deal with types of problems around the use of data.”  Two examples he cites are when the interpretant misunderstands the data, or when the interpretant uses data for a purpose that is incompatible with the data.

In his conclusion, Chisholm states that “the special problems of the relationships between data and what it is used for requires a different set of approaches and should be called something other than data quality.”

And this is exactly why, as I blogged in my post Data Myopia and Business Relativity, many data professionals prefer to define data quality as real-world alignment and information quality as fitness for the purpose of use.  However, I have found that adding the nuance of data versus information only further complicates data quality discussions with business professionals.

Chisholm also suggests that his proposed definition of data quality is not only better, but that “it also alludes to the existence of metadata that links the data to what it is representing.”  The important role that metadata plays in supporting data and its relationships with information and quality is something I blogged about in my post You Say Potato and I Say Tater Tot.

The irony is that the metadata linking the data management industry to what it manages suffers from the one-to-many relationships we’ve created by seemingly never agreeing on how data, information, and quality should be defined.

 

Related Posts

Plato’s Data

Quality is the Higgs Field of Data

Data Myopia and Business Relativity

Data, Information, and Knowledge Management

You Say Potato and I Say Tater Tot

Metadata, Data Quality, and the Stroop Test

Data Quality and the Q Test

Data Quality and Miracle Exceptions

Data Quality and Chicken Little Syndrome

Data Quality: Quo Vadimus?

DQ-View: The Five Stages of Data Quality

Exercise Better Data Management

 

Related OCDQ Radio Episodes


  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Wednesday, August 1, 2012

Exercise Better Data Management

Recently on Twitter, Daragh O Brien and I discussed a concept he proposed.  “After Big Data,” Daragh tweeted, “we will inevitably begin to see the rise of MOData as organizations seek to grab larger chunks of data and digest it.  What is MOData?  It’s MO’Data, as in MOre Data. Or Morbidly Obese Data.  Only good data quality and data governance will determine which.”

Daragh asked if MO’Data will be the Big Data Killer.  I said only if MO’Data doesn’t include MO’BusinessInsight, MO’DataQuality, and MO’DataPrivacy (i.e., more business insight, more data quality, and more data privacy).

“But MO’Data is about more than just More Data,” Daragh replied.  “It’s about avoiding Morbidly Obese Data that clogs data insight and data quality, etc.”

I responded that More Data becomes Morbidly Obese Data only if we don’t exercise better data management practices.

Agreeing with that point, Daragh replied, “Bring on MOData and the Pilates of Data Quality and Data Governance.”

To slightly paraphrase lines from one of my favorite movies — Airplane! — the Cloud is getting thicker and the Data is getting laaaaarrrrrger.  Surely I know well that growing data volumes are a serious issue — but don’t call me Shirley.

Whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bites, the truth is we were consuming way more than our recommended daily allowance of data long before the data management industry took a tip from McDonald’s and put the word “big” in front of its signature sandwich.  (Oh great . . . now I’m actually hungry for a Big Mac.)

But nowadays, with silos replicating data, and with new data and new types of data being created and stored on a daily basis, our data has come to resemble the size of Bob Parr in retirement, making it seem like not even Mr. Incredible in his prime possessed the super strength needed to manage all of our data.  Those were references to the movie The Incredibles, in which Mr. Incredible is a superhero who, after retiring into civilian life under the alias of Bob Parr, elicits this observation from his superhero costume designer: “My God, you’ve gotten fat.”  Yes, I admit not even Helen Parr (aka Elastigirl) could stretch that far for a big data joke.

 

A Healthier Approach to Big Data

Although Daragh’s concerns about morbidly obese data are valid, no superpowers (or other miracle exceptions) are needed to manage all of our data.  In fact, it’s precisely when we are so busy trying to manage all of our data that we hoard countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival.  It’s like we are trying to lose weight by eating more and exercising less, i.e., consuming more data and exercising less data quality and data governance.  As Daragh said, only good data quality and data governance will determine whether we get more data or morbidly obese data.

Losing weight requires a healthy approach to both diet and exercise.  A healthy approach to diet includes carefully choosing the food you consume and carefully controlling your portion size.  A healthy approach to exercise includes a commitment to exercise on a regular basis at a sufficient intensity level without going overboard by spending several hours a day, every day, at the gym.

Swimming is a great form of exercise, but swimming in big data without having a clear business objective before you jump into the pool is like telling your boss that you didn’t get any work done because you decided to spend all day working out at the gym.

Carefully choosing the data you consume and carefully controlling your data portion size is becoming increasingly important since big data is forcing us to revisit information overload.  However, the main reason that traditional data management practices often become overwhelmed by big data is that they are not always the right approach.

We need to acknowledge that some big data use cases differ considerably from traditional ones.  Data modeling is still important and data quality still matters, but how much data modeling and data quality is needed before big data can be effectively used for business purposes will vary.  In order to move the big data discussion forward, we have to stop fiercely defending our traditional perspectives about structure and quality.  We also have to stop fiercely defending our traditional perspectives about analytics, since there will be some big data use cases where depth and detailed analysis may not be necessary to provide business insight.

 

Better than Big or More

Jim Ericson explained that your data is big enough.  Rich Murnane explained that bigger isn’t better, better is better.  Although big data may indeed be followed by more data, that doesn’t necessarily mean we require more data management in order to prevent more data from becoming morbidly obese data.  I think that we just need to exercise better data management.

 

Related Posts

OCDQ Radio - Saving Private Data

OCDQ Radio - The Blue Box of Information Quality

Quality is the Higgs Field of Data

Are you turning Ugly Data into Cute Information?

Big Data Lessons from Orbitz

The Graystone Effects of Big Data

Will Big Data be Blinded by Data Science?

Our Increasingly Data-Constructed World

OCDQ Radio - Data Quality and Big Data

Magic Elephants, Data Psychics, and Invisible Gorillas

HoardaBytes and the Big Data Lebowski

Big Data el Memorioso

Information Overload Revisited

The Big Data Collider

OCDQ Radio - Big Data and Big Analytics

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Sometimes it’s Okay to be Shallow

Big Data: Structure and Quality

The Big Data Theory

Swimming in Big Data

Why Can’t We Predict the Weather?

Tuesday, July 24, 2012

Demystifying Master Data Management

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, special guest John Owens and I attempt to demystify master data management (MDM) by explaining the three types of data (Transaction, Domain, Master) and the four master data entities (Party, Product, Location, Asset), as well as what is perhaps the most important concept of all, the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).

John Owens is a thought leader, consultant, mentor, and writer in the worlds of business and data modelling, data quality, and master data management (MDM).  He has built an international reputation as a highly innovative specialist in these areas and has worked in and led multi-million dollar projects in a wide range of industries around the world.

John Owens has a gift for identifying the underlying simplicity in any enterprise, even when shrouded in complexity, and bringing it to the surface.  He is the creator of the Integrated Modelling Method (IMM), which is used by business and data analysts around the world.  Later this year, John Owens will be formally launching the IMM Academy, which will provide high quality resources, training, and mentoring for business and data analysts at all levels.

You can also follow John Owens on Twitter and connect with John Owens on LinkedIn.  And if you’re looking for an MDM course, consider the online course from John Owens, which you can find by clicking on this link: MDM Online Course (Affiliate Link).

 


 

Related Posts

Choosing Your First Master Data Domain

Lycanthropy, Silver Bullets, and Master Data Management

Voyage of the Golden Records

The Quest for the Golden Copy (Part 1)

The Quest for the Golden Copy (Part 2)

The Quest for the Golden Copy (Part 3)

The Quest for the Golden Copy (Part 4)

How Social can MDM get?

Will Social MDM be the New Spam?

More Thoughts about Social MDM

Is Social MDM going the Wrong Way?

The Semantic Future of MDM

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

Tuesday, March 20, 2012

Data Myopia and Business Relativity

Since how data quality is defined has a significant impact on how data quality is perceived, measured, and managed, in this post I examine the two most prevalent perspectives on defining data quality, real-world alignment and fitness for the purpose of use, which respectively represent what I refer to as the danger of data myopia and the challenge of business relativity.

 

Real-World Alignment: The Danger of Data Myopia

Whether it’s an abstract description of real-world entities (i.e., master data) or an abstract description of real-world interactions (i.e., transaction data) among entities, data is an abstract description of reality.  The creation and maintenance of these abstract descriptions shapes the organization’s perception of the real world, which I philosophically pondered in my post Plato’s Data.

The inconvenient truth is that the real world is not the same thing as the digital worlds captured within our databases.

And, of course, creating and maintaining these digital worlds is no easy task, which is exactly the danger inherent in the real-world alignment definition of data quality: when the organization’s data quality efforts are focused on minimizing the digital distance between data and the constantly changing real world that data attempts to describe, it can lead to a hyper-focus on the data in isolation, otherwise known as data myopia.

Even if we create and maintain perfect real-world alignment, what value does high-quality data possess independent of its use?

Real-world alignment reflects the perspective of the data provider, and its advocates argue that a trusted source of data provided to the organization will be able to satisfy any and all business requirements, i.e., high-quality data should be fit to serve as the basis for every possible use.  Therefore, in theory, real-world alignment provides an objective data foundation independent of the subjective uses defined by the organization’s many data consumers.

However, providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the rallying cry and siren song of enterprise data warehousing (EDW), and more recently, of master data management (MDM).  Although these initiatives can provide significant business value, it is usually poor data quality that undermines the long-term success and sustainability of EDW and MDM implementations.

Perhaps the enterprise needs a Ulysses pact to protect it from believing in EDW or MDM as a miracle exception for data quality?

A significant challenge for the data provider perspective on data quality is that it is difficult to make a compelling business case on the basis of trusted data without direct connections to the specific business needs of data consumers, whose business, data, and technical requirements are often in conflict with one another.

In other words, real-world alignment does not necessarily guarantee business-world alignment.

So, if using real-world alignment as the definition of data quality has inherent dangers, we might be tempted to conclude that the fitness for the purpose of use definition of data quality is the better choice.  Unfortunately, that is not necessarily the case.

 

Fitness for the Purpose of Use: The Challenge of Business Relativity

In M. C. Escher’s famous 1953 lithograph Relativity, although we observe multiple, and conflicting, perspectives of reality, from the individual perspective of each person, everything must appear normal, since they are all casually going about their daily activities.

I have always thought this is an apt analogy for the multiple business perspectives on data quality that exist within every organization.

Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder, or when data quality is defined as fitness for the purpose of use — the eyes of the user.

Most data has both multiple uses and users.  Data of sufficient quality for one use or user may not be of sufficient quality for other uses and users.  These multiple, and often conflicting, perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data to support their own business activities.

Therefore, the user (i.e., data consumer) perspective establishes a relative business context for data quality.
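The business relativity described above can be sketched in a few lines of Python.  This is a minimal illustration, not a method from this post: the `Customer` records and the two fitness rules (one for a hypothetical email campaign, one for hypothetical shipping) are assumptions invented for the example.  The same four records score differently depending on whose purpose defines quality.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Customer:
    name: str
    email: Optional[str]
    postal_address: Optional[str]


# Each data consumer encodes its own notion of "fit" as a predicate (hypothetical rules).
FitnessRule = Callable[[Customer], bool]


def fit_for_email_campaign(c: Customer) -> bool:
    # Marketing only cares whether there is a plausible email address.
    return c.email is not None and "@" in c.email


def fit_for_shipping(c: Customer) -> bool:
    # Shipping only cares whether there is a non-blank postal address.
    return bool(c.postal_address and c.postal_address.strip())


def fitness_rate(records: list[Customer], rule: FitnessRule) -> float:
    """Return the share of records that are fit for the purpose encoded by `rule`."""
    if not records:
        return 0.0
    return sum(rule(r) for r in records) / len(records)


customers = [
    Customer("Ann", "ann@example.com", None),
    Customer("Bob", "bob@example.com", "12 Main St"),
    Customer("Cy", None, "9 Elm Rd"),
    Customer("Di", "di@example.com", ""),
]

print(f"email campaign: {fitness_rate(customers, fit_for_email_campaign):.0%}")
print(f"shipping:       {fitness_rate(customers, fit_for_shipping):.0%}")
```

Running it prints 75% for the email campaign and 50% for shipping: the same data, judged fit or unfit depending on the use.  Keeping each consumer's rules separate, rather than computing one global score, mirrors the fitness for the purpose of use definition.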

Whereas the real-world alignment definition of data quality can cause a data-myopic focus, the business-world alignment goal of the fitness for the purpose of use definition must contend with the daunting challenge of business relativity.  Most data has multiple data consumers, each with their own relative business context for data quality, making it difficult to balance the diverse data needs and divergent data quality perspectives within the conflicting, and rather Escher-like, reality of the organization.

The data consumer perspective on data quality is often the root cause of the data silo problem, which is prevalent in most organizations and is the bane of successful enterprise data management: each data consumer maintains their own data silo, customized to be fit for the purpose of their own use.  Organizational culture and politics also play significant roles, since data consumers legitimately fear that losing their data silos would revert the organization to a one-size-fits-all data provider perspective on data quality.

So, clearly the fitness for the purpose of use definition of data quality is not without its own considerable challenges to overcome.

 

How does your organization define data quality?

As I stated at the beginning of this post, how data quality is defined has a significant impact on how data quality is perceived, measured, and managed.  I have witnessed the data quality efforts of an organization struggle with, and at times fail because of, either the danger of data myopia or the challenge of business relativity — or, more often than not, some combination of both.

Although some would define real-world alignment as data quality and fitness for the purpose of use as information quality, I have found adding the nuance of data versus information only further complicates an organization’s data quality discussions.

But for now, I will just conclude a rather long (sorry about that) post by asking for reader feedback on this perennial debate.

How does your organization define data quality?  Please share your thoughts and experiences by posting a comment below.

 

Related Posts

Data Quality: Quo Vadimus?

Data Quality and Miracle Exceptions

Plato’s Data

Once Upon a Time in the Data

The Most August Imagination

Data in the (Oscar) Wilde

The Idea of Order in Data

Hell is other people’s data

Song of My Data

You Say Potato and I Say Tater Tot

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Monday, March 5, 2012

Data Driven

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This is Part 1 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss concepts from one of my favorite data quality books, which is his most recent book: Data Driven: Profiting from Your Most Important Business Asset.

Our discussion includes viewing data as an asset, an organization’s hierarchy of data needs, a simple model for culture change, and attempting to achieve the “single version of the truth” being marketed as a goal of master data management (MDM).

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information in the late 1980s.  Since then he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995. Tom holds a Ph.D. in statistics from Florida State University.  He holds two patents.

 


 

Win a copy of the Book

Tom Redman wants to give one OCDQ Radio listener a free copy of Data Driven: Profiting from Your Most Important Business Asset.

Here is how the book contest will work:

(1) Book Contest Question — Name at least one of the five aspects of the hierarchy of data and information needs that was described by Tom Redman during this OCDQ Radio episode.

 

(2) Book Contest Deadline — By or before March 31, 2012, email Jim Harris with your answer to the book contest question.

 

(3) Book Contest Winner — In April 2012, one winner will be randomly selected from the emails containing the correct answer to the contest question, and Tom Redman (or his publisher) will email the winner requesting a shipping address for the book.

 

 

Related Posts

A Farscape Analogy for Data Quality

The Data Quality Wager

DQ-View: Data Is as Data Does

Common Change

DQ-View: Talking about Data

Hailing Frequencies Open

Beyond a “Single Version of the Truth”

DQ-Tip: “Don't pass bad data on to the next person...”

Hyperactive Data Quality (Second Edition)

Data Quality: Quo Vadimus?

Data Quality and Miracle Exceptions

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Thursday, January 26, 2012

The Johari Window of Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

The Johari Window is a term from psychology for a technique used to help people better understand their personality and behavior by combining a self-assessment with assessments from their peers.  In relation to data, the Johari Window is a metaphor for helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

During this episode, I discuss the Johari Window of Data Quality with Martin Doyle.  Our discussion, inspired by our blog comment banter on my post There is No Such Thing as a Root Cause, includes root cause analysis, the pursuit of data perfection, metadata, communication, Business-IT collaboration, change management, defect prevention, and continuous improvement.

Martin Doyle is a Data Quality Improvement Evangelist and the CEO of DQ Global, which is a UK-based data quality software and services vendor providing data cleansing, international address and email verification, data deduplication, and data matching solutions for Customer Relationship Management, Single Customer View, and Master Data Management.  DQ Global has worked with over 500 businesses worldwide on a variety of projects, providing their clients with improved data quality, making their data fit for business use, and enabling them to trust their data and make decisions based on a foundation of fact.

 


 

Related Posts

There is No Such Thing as a Root Cause

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

DQ-View: The Cassandra Effect

The Data Quality Wager

DQ-View: Data Is as Data Does

Selling the Business Benefits of Data Quality

 


Friday, January 13, 2012

Scary Calendar Effects

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, recorded on the first of three occurrences of Friday the 13th in 2012, I discuss scary calendar effects.

In other words, I discuss how schedules, deadlines, and other date-related aspects can negatively affect enterprise initiatives such as data quality, master data management, and data governance.

Please Beware: This episode concludes with the OCDQ Radio Theater production of Data Quality and Friday the 13th.

 


 

Related Posts

Data Quality and #FollowFriday the 13th

The Moirae, Deadlines and Working within Limits

The Fiscal Calendar Effect

Eternal September and Tacit Knowledge

“What is is the was of what shall be”

 


 

Tuesday, January 3, 2012

Best OCDQ Blog Posts of 2011

Welcome to my roundup of the best blog posts published on the Obsessive-Compulsive Data Quality (OCDQ) blog during 2011.

My selections were based on a pseudo-scientific, quasi-statistical combination of page views, comments, and re-tweets (as well as choosing a few of my personal favorites).  Instead of ordering the posts chronologically, I decided to organize them by theme.

 

The Metadata Trilogy

Although it has an incredibly important role to play in data quality and its related disciplines, I don’t write about metadata very often.  But the reader feedback that I received led me to write three blog posts about metadata in the span of a few weeks:

  • The Metadata Crisis — There is a running debate within many organizations over the meaning of commonly used terms, which complicates what on the surface seem like straightforward business questions.
  • The Metadata Continuum — There is a continuum, where at one end we have the uniformity of controlled vocabularies, and at the other end we have the flexibility of chaotic folksonomies.  However, both flexibility and uniformity provide value.
  • You Say Potato and I Say Tater Tot — The demarcations of the borders between metadata, data, and information are important, but sometimes difficult to discern.  In this post, I offer an explanation about these demarcations using potatoes.

 

The Data Governance Star Wars (one less than a) Trilogy

In June, Rob Karel of Forrester Research and I used a Star Wars themed blog mock debate to take on one of data governance’s biggest challenges: how to balance bureaucracy and business agility.  Gwen Thomas of the Data Governance Institute joined Rob and me to continue the discussion during a special, extended, and Star Wars themed episode of OCDQ Radio:

  • Data Governance Star Wars on OCDQ Radio — In Part 1, Rob Karel and I discuss our blog mock debate, which is followed by a brief Star Wars themed intermission, and then in Part 2, Gwen Thomas joins us to provide her excellent insights.

 

Although not Star Wars themed, here are some additional Best OCDQ Blog Posts of 2011 on the topic of data governance:

  • Data Governance and the Adjacent Possible — It’s important to demonstrate that some data governance policies reflect existing best practices, which helps reduce resistance to change, and therefore I advise: “If it ain’t broke, bricolage it.”
  • Aristotle, Data Governance, and Lead Rulers — Well-constructed data governance policies are like lead rulers — flexible rules that empower us with an understanding of the principle of the policy, and how to enforce it in a particular context.
  • The Stakeholder’s Dilemma — There will be times when sacrifices for the long-term greater good will require that stakeholders either contribute more resources during the current phase, or receive fewer benefits from its deliverables.
  • Beware the Data Governance Ides of March — My dramatized warning about relying too much on the top-down approach to implementing data governance — and especially if your organization has any data stewards named Brutus or Cassius.

 

OCDQ Radio

In June, I launched OCDQ Radio, which is a vendor-neutral podcast about data quality and the audio complement to this blog, providing me with a platform for recorded discussions with the great folks working in the data management industry.  So far, there have been 21 episodes of OCDQ Radio, including 22 guests from 7 countries.  Here are a few of the most popular episodes:

  • The Fall Back Recap Show — A look back at the Best of OCDQ Radio, including discussions about Data, Information, Business-IT Collaboration, Change Management, Big Analytics, Data Governance, and the Data Revolution.
  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.
  • Social Media Strategy — Guest Crysta Anderson of IBM Initiate explains social media strategy and content marketing, including three recommended practices: (1) Listen intently, (2) Communicate succinctly, and (3) Have fun.

 

The Best of the Rest

  • DQ-View: Talking about Data, a DQ-View video discussion about how data professionals should talk about data when invited to participate in business discussions within their organizations.
  • The Speed of Decision — Examines the constraints that time puts on data-driven decision making, pondering whether decision speed is more important than data quality and decision quality.
  • The Data Cold War — Examines how Google and Facebook have performed the Master Data Management Magic Trick and socialized data (“Information wants to be free!”) in order to capitalize data as a true corporate asset.
  • A Farscape Analogy for Data Quality — Ponders whether data is not viewed as an asset because data has so thoroughly pervaded the enterprise that data has become invisible to those who are so dependent upon its quality.
  • No Datum is an Island of Serendip — Our organizations need to create collaborative environments that foster serendipitous connections bringing all of our business units and people together around our shared data assets.

 

Thank You for Reading OCDQ Blog in 2011

In 2011, the Obsessive-Compulsive Data Quality (OCDQ) blog published 112 posts, which received 130,000 total page views, averaging 350 page views and 150 unique visitors a day.

Thank you for reading OCDQ Blog in 2011.  Your readership was deeply appreciated.

 

Related Posts

So Long 2011, and Thanks for All the . . . – The OCDQ Radio 2011 Year in Review

2011 Quarterly Review of the Data Roundtable (Part 3)

2011 Quarterly Review of the Data Roundtable (Part 2)

2011 Quarterly Review of the Data Roundtable (Part 1)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

The Best Data Quality Blog Posts of 2010

Thursday, December 29, 2011

So Long 2011, and Thanks for All the . . .

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Don’t Panic!  Welcome to the mostly harmless OCDQ Radio 2011 Year in Review episode.  During this approximately 42-minute episode, I recap the data-related highlights of 2011 in a series of sometimes serious, sometimes funny, segments, as well as make wacky and wildly inaccurate data-related predictions about 2012.

Special thanks to my guests Jarrett Goldfedder, who discusses Big Data, Nicola Askham, who discusses Data Governance, and Daragh O Brien, who discusses Data Privacy.  Additional thanks to Rich Murnane and Dylan Jones.  And Deep Thanks to that frood Douglas Adams, who always knew where his towel was, and who wrote The Hitchhiker’s Guide to the Galaxy.

 


 


Monday, December 19, 2011

Redefining Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I have an occasionally spirited discussion about data quality with Peter Perera, partially precipitated by his provocative post from this past summer, The End of Data Quality...as we know it, which included his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.

Peter Perera is a recognized consultant and thought leader with significant experience in Master Data Management, Customer Relationship Management, Data Quality, and Customer Data Integration.  For over 20 years, he has been advising and working with Global 5000 organizations and mid-size enterprises to increase the usability and value of their customer information.

 


 

Related Posts

You Say Potato and I Say Tater Tot

You only get a Return from something you actually Invest in

Listen to John Ladley discuss why Data and Information are Enterprise Assets on OCDQ Radio

Listen to Daragh O Brien discuss Data and Information Quality on OCDQ Radio

Listen to Gordon Hamilton discuss the Information Product on OCDQ Radio

Listen to Peter Benson discuss Metadata, Data, and Information on the Knights of the Data Roundtable

Plato’s Data

Data, Information, and Knowledge Management

The Data-Information Continuum

The First Law of Data Quality

Friday, November 11, 2011

You Say Potato and I Say Tater Tot

One thread of the comment discussion on my blog post The Metadata Continuum raised the excellent point that the demarcation of the border between data and metadata is important, but sometimes difficult to discern.  By extension, we can say the same thing about the demarcation of the border between data and information.

So, in this blog post, I thought I would try to offer an explanation about the importance of these demarcations using potatoes.

 

You Say Potato and I Say Potahto

Let’s Call the Whole Thing Off was a song written by George Gershwin and Ira Gershwin, which became famous for its playful lyrics that poked fun at the differences in the pronunciation of words, such as “you say potato and I say potahto.”

Spelling and pronunciation are included in the dictionary definition of a word, which is a good example of one of the many uses of metadata, namely as a label that provides a definition, description, and context for data.  Essentially, metadata describes data, and since data is attempting to describe a real-world object, such as a potato, metadata is a further abstraction from reality.

And as we saw with the example of white horses in my blog post The Metadata Crisis, these abstract definitions can also include additional classifications (e.g., there are over 4,000 different varieties of potato), which also have to be well defined in order to facilitate clear communication and effective discussion.  These levels of abstraction, definitions, and classifications are essential to our attempts to understand, and do business with, the real world.  And this challenge continues even further with information.

 

You Say Potato and I Say Tater Tot

The difference, and relationship, between data and information is a common debate.  Not only do these two terms have varying definitions, but they are often used interchangeably.  Just a few examples include comparing and contrasting data quality with information quality, data management with information management, and data governance with information governance.

Some consider this an esoteric debate between data geeks and information nerds, but what is not debated is the importance of understanding how organizations use data and/or information to support their business activities.

Extending my analogy, data is like a potato and information is like a tater tot.  In other words, information is one of the many specific things that we can make using data, which is why information quality professionals often speak about the information product.

So it’s important to remember that we can’t have a tater tot (information) without a potato (data), and that we can’t have either a tater tot or a potato without having a working definition (metadata) of what a potato is.
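The potato analogy maps onto a small Python sketch, assuming a hypothetical controlled vocabulary of potato varieties: a `FieldDefinition` plays the role of metadata, the raw records are the data, and a per-variety average weight is one piece of information made from them.  All names and values are illustrative only.

```python
from dataclasses import dataclass


# Metadata: a working definition of what a valid "variety" value is (hypothetical vocabulary).
@dataclass(frozen=True)
class FieldDefinition:
    name: str
    description: str
    allowed_values: frozenset[str]


VARIETY = FieldDefinition(
    name="variety",
    description="Potato variety, drawn from a controlled vocabulary",
    allowed_values=frozenset({"Russet", "Yukon Gold", "Red Bliss"}),
)

# Data: raw records describing real-world potatoes (weights in grams).
potatoes = [
    {"variety": "Russet", "weight_g": 300},
    {"variety": "Yukon Gold", "weight_g": 210},
    {"variety": "Russet", "weight_g": 280},
    {"variety": "Tater", "weight_g": 150},  # does not conform to the definition above
]


def conforms(record: dict, definition: FieldDefinition) -> bool:
    """Check one data record against its metadata definition."""
    return record.get(definition.name) in definition.allowed_values


# Information: one specific product made from the data (average weight per variety),
# built only from records that conform to the metadata.
def average_weight_by_variety(records: list[dict]) -> dict[str, float]:
    weights: dict[str, list[int]] = {}
    for r in records:
        if conforms(r, VARIETY):
            weights.setdefault(r["variety"], []).append(r["weight_g"])
    return {v: sum(ws) / len(ws) for v, ws in weights.items()}


print(average_weight_by_variety(potatoes))
```

Here it prints `{'Russet': 290.0, 'Yukon Gold': 210.0}`.  The record labeled "Tater" is excluded because it does not match the working definition: without the metadata, there is no reliable way to decide which data can become information.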

 

Let’s Not Call the Whole Thing Data

David Corrigan recently blogged about the importance of the metadata that tracks the lineage of information presented to an end user, and how the root causes of data quality and data governance issues are impossible to discover without this metadata.

Therefore, the lines of demarcation separating metadata, data, and information are not just an esoteric technical debate.  These demarcations are foundational to the efficiency and effectiveness of business operations.  So, let’s not call the whole thing data.

Let’s acknowledge the separate, but deeply interrelated, continuum formed by the disciplines of metadata, data, and information.
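David Corrigan’s point about lineage metadata can be illustrated with a minimal Python sketch.  The `Value` structure and the `derive` and `trace` helpers below are hypothetical, not the API of any particular metadata tool: each derived value keeps references to its inputs, so a questionable number presented to an end user can be traced back to the source that produced it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Value:
    """A value plus lineage metadata recording where it came from (hypothetical structure)."""
    value: float
    source: str
    inputs: list["Value"] = field(default_factory=list)


def derive(source: str, fn: Callable[..., float], *inputs: Value) -> Value:
    """Compute a new value from its inputs and record those inputs as lineage."""
    return Value(fn(*(i.value for i in inputs)), source, list(inputs))


def trace(v: Value, depth: int = 0) -> list[str]:
    """Walk the lineage back to the original sources, indenting one level per step."""
    lines = ["  " * depth + f"{v.source}: {v.value}"]
    for parent in v.inputs:
        lines.extend(trace(parent, depth + 1))
    return lines


crm_revenue = Value(1200.0, "CRM extract")
erp_revenue = Value(-50.0, "ERP extract")  # a suspicious negative figure
total = derive("quarterly revenue report", lambda a, b: a + b, crm_revenue, erp_revenue)

print("\n".join(trace(total)))
```

In this sketch, the trace shows that the report total of 1150.0 depends on a negative ERP figure, which is where a root cause investigation would start.  Without the `inputs` list (the lineage metadata), the report value alone offers no way to find that source.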

 

Related Posts

The Metadata Continuum

The Metadata Crisis

What’s the Meta with your Data?

Let’s Meta a Data

Listen to Peter Benson discuss Metadata, Data, and Information on the Knights of the Data Roundtable

Listen to Daragh O Brien discuss Data and Information Quality on OCDQ Radio

Listen to Gordon Hamilton discuss the Information Product on OCDQ Radio

Plato’s Data

Data, Information, and Knowledge Management

The Data-Information Continuum

The First Law of Data Quality

OCDQ Radio - The Fall Back Recap Show