i blog of Data glad and big

I recently blogged about the need to balance the hype of big data with some anti-hype.  My hope was that, like a collision of matter and anti-matter, the hype and anti-hype would cancel each other out, transitioning our energy into a more productive discussion about big data.  But, of course, few things in human discourse ever reach such an equilibrium, or can maintain it for very long.

For example, Quentin Hardy recently blogged about six big data myths based on a conference presentation by Kate Crawford, who herself also recently blogged about the hidden biases in big data.  “I call B.S. on all of it,” Derrick Harris blogged in his response to the backlash against big data.  “It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair.  That’s because how companies and people benefit from big data, data science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in.  Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple — because no one should think it’s magic to begin with.”

In their new book Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier explained that “like so many new technologies, big data will surely become a victim of Silicon Valley’s notorious hype cycle: after being feted on the cover of magazines and at industry conferences, the trend will be dismissed and many of the data-smitten startups will flounder.  But both the infatuation and the damnation profoundly misunderstand the importance of what is taking place.  Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.  The real revolution is not in the machines that calculate data, but in data itself and how we use it.”

Although numerous critical technology factors have made the era of big data possible, such as increases in computing power, decreases in the cost of data storage, increased network bandwidth, parallel processing frameworks (e.g., Hadoop), scalable and distributed models (e.g., cloud computing), and other techniques (e.g., in-memory computing), Mayer-Schönberger and Cukier argued that “something more important changed too, something subtle.  There was a shift in mindset about how data could be used.  Data was no longer regarded as static and stale, whose usefulness was finished once the purpose for which it was collected was achieved.  Rather, data became a raw material of business, a vital economic input, used to create a new form of economic value.”

“In fact, with the right mindset, data can be cleverly used to become a fountain of innovation and new services.  The data can reveal secrets to those with the humility, the willingness, and the tools to listen.”

Pondering this big data war of words reminded me of the E. E. Cummings poem i sing of Olaf glad and big, which sings of Olaf, a conscientious objector forced into military service, who passively endures brutal torture inflicted upon him by training officers, while calmly responding (pardon the profanity): “I will not kiss your fucking flag” and “there is some shit I will not eat.”

Without question, big data has both positive and negative aspects, but the seeming unwillingness of either side in the big data war of words to “kiss each other’s flag,” so to speak, is not as concerning to me as is the conscientious objection to big data and data science expanding into realms where people and businesses were not used to enduring their influence.  For example, some will feel that data-driven audits of their decision-making are like brutal torture inflicted upon their less-than-data-driven intuition.

E. E. Cummings sang the praises of Olaf “because unless statistics lie, he was more brave than me.”  i blog of Data glad and big, but I fear that, regardless of how big it is, “there is some data I will not believe” will be a common refrain among people who lack the humility and willingness to listen to data, and who will not be brave enough to admit that statistics don’t always lie.

 

Related Posts

The Need for Data Philosophers

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

The Laugh-In Effect of Big Data

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Will Big Data be Blinded by Data Science?

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

Our Increasingly Data-Constructed World

The Wisdom of Crowds, Friends, and Experts

Data Separates Science from Superstition

Headaches, Data Analysis, and Negativity Bias

Why Data Science Storytelling Needs a Good Editor

Predictive Analytics, the Data Effect, and Jed Clampett

Rage against the Machines Learning

The Flying Monkeys of Big Data

Cargo Cult Data Science

Speed Up Your Data to Slow Down Your Decisions

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Big Data and the Infinite Inbox

Occasionally it’s necessary to temper the unchecked enthusiasm accompanying the peak of inflated expectations associated with any hype cycle.  This may be especially true for big data, and especially now since, as Svetlana Sicular of Gartner recently blogged, big data is falling into the trough of disillusionment and “to minimize the depth of the fall, companies must be at a high enough level of analytical and enterprise information management maturity combined with organizational support of innovation.”

I fear the fall may feel bottomless for those who fell hard for the hype and believe the Big Data Psychic capable of making better, if not clairvoyant, predictions.  In fact, “our predictions may be more prone to failure in the era of big data,” explained Nate Silver in his book The Signal and the Noise: Why Most Predictions Fail but Some Don't.  “There isn’t any more truth in the world than there was before the Internet.  Most of the data is just noise, as most of the universe is filled with empty space.”

Proposing the 3Ss (Small, Slow, Sure) as a counterpoint to the 3Vs (Volume, Velocity, Variety), Stephen Few recently blogged about the slow data movement.  “Data is growing in volume, as it always has, but only a small amount of it is useful.  Data is being generated and transmitted at an increasing velocity, but the race is not necessarily for the swift; slow and steady will win the information race.  Data is branching out in ever-greater variety, but only a few of these new choices are sure.”

Big data requires us to revisit information overload, a term that was originally not about the increasing amount of information, but about the increasing access to information.  As Clay Shirky stated, “It’s not information overload, it’s filter failure.”

As Silver noted, the Internet (like the printing press before it) was a watershed moment in our increased access to information, but its data deluge didn’t increase the amount of truth in the world.  And in today’s world, where many of us strive on a daily basis to prevent email filter failure and achieve what Merlin Mann called Inbox Zero, I find unfiltered enthusiasm about big data to be rather ironic, since big data is essentially enabling the data-driven decision-making equivalent of the Infinite Inbox.

Imagine logging into your email every morning and discovering: You currently have (∞) Unread Messages.

However, I’m sure most of it would be spam, which you obviously wouldn’t have any trouble quickly filtering (after all, infinity minus spam must be a back-of-the-napkin calculation), allowing you to read only the truly useful messages.  Right?

 

Related Posts

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Open MIKE Podcast — Episode 05: Defining Big Data

Will Big Data be Blinded by Data Science?

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

A Statistically Significant Resolution for 2013

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Swimming in Big Data

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

MDM, Assets, Locations, and the TARDIS

Henrik Liliendahl Sørensen, as usual, is facilitating excellent discussion around master data management (MDM) concepts via his blog.  Two of his recent posts, Multi-Entity MDM vs. Multi-Domain MDM and The Real Estate Domain, have both received great commentary.  So, in case you missed them, be sure to read those posts, and join in their comment discussions/debates.

A few of the concepts discussed and debated reminded me of the OCDQ Radio episode Demystifying Master Data Management, during which guest John Owens explained the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and, perhaps the most important concept of all, the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
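To make the Party-Role Relationship a little more concrete, here is a minimal Python sketch.  It is only an illustration under my own assumptions, not the model John Owens presented: the class names, field names, and the sample party are all hypothetical.  The idea it shows is that Customer, Supplier, and Employee are roles attached to a single Party record rather than separate master data entities.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Roles a party can play (hypothetical, illustrative list)."""
    CUSTOMER = "Customer"
    SUPPLIER = "Supplier"
    EMPLOYEE = "Employee"


@dataclass
class Party:
    """One master record per real-world person or organization."""
    party_id: str
    name: str
    roles: set = field(default_factory=set)

    def add_role(self, role: Role) -> None:
        # A role is a relationship the party has with the enterprise,
        # so adding one does not create a second master record.
        self.roles.add(role)

    def has_role(self, role: Role) -> bool:
        return role in self.roles


# The same party can be both a customer and a supplier.
acme = Party(party_id="P-001", name="Acme Ltd")
acme.add_role(Role.CUSTOMER)
acme.add_role(Role.SUPPLIER)
```

Keeping roles on the party, instead of keeping separate customer and supplier records, is what lets the enterprise recognize that it is dealing with the same organization in both relationships.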

Henrik’s second post touched on Location and Asset, which come up far less often in MDM discussions than Party and Product do, arguably with good reason.  This reminded me of the science fiction metaphor I used during my podcast with John, a metaphor I made in an attempt to help explain the difference and relationship between an Asset and a Location.

Location is often over-identified with postal address, which is actually just one means of referring to a location.  A location can also be referred to by its geographic coordinates, either absolute (e.g., latitude and longitude) or relative (e.g., 7 miles northeast of the intersection of Route 66 and Route 54).

Asset refers to a resource owned or controlled by an enterprise and capable of producing business value.  Assets are often over-identified with their location, especially real estate assets such as a manufacturing plant or an office building, since they are essentially immovable assets always at a particular location.

However, many assets are movable, such as the equipment used to manufacture products, or the technology used to support employee activities.  These assets are not always at a particular location (e.g., laptops and smartphones used by employees) and can also be dependent on other, non-co-located, sub-assets (e.g., replacement parts needed to repair broken equipment).

In Doctor Who, a brilliant British science fiction television program celebrating its 50th anniversary this year, the TARDIS, which stands for Time and Relative Dimension in Space, is the time machine and spaceship the Doctor and his companions travel in.

The TARDIS is arguably the Doctor’s most important asset, but its location changes frequently, both during and across episodes.

So, in MDM, we could say that Location is a time and relative dimension in space where we would currently find an Asset.
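The relationship between Asset and Location described above can be sketched in a few lines of Python.  This is a rough illustration under my own assumptions, not a reference MDM model: the class names, fields, sample addresses, and coordinates are all hypothetical.  It shows a Location that can be referred to by postal address or by coordinates, and an Asset whose location depends on the point in time you ask about.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Location:
    """A place; a postal address is only one way of referring to it."""
    name: str
    postal_address: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


@dataclass
class Asset:
    """A resource whose location may change over time."""
    asset_id: str
    description: str
    moves: list = field(default_factory=list)  # (moved_at, Location) pairs

    def move_to(self, location: Location, moved_at: datetime) -> None:
        self.moves.append((moved_at, location))
        self.moves.sort(key=lambda move: move[0])  # keep history in time order

    def location_at(self, when: datetime) -> Optional[Location]:
        """Return where the asset was at the given time, or None if unknown."""
        found = None
        for moved_at, location in self.moves:
            if moved_at <= when:
                found = location
            else:
                break
        return found


office = Location("Head office", postal_address="1 Example Street")
depot = Location("Field depot", latitude=51.5, longitude=-0.12)

laptop = Asset("A-42", "Employee laptop")
laptop.move_to(office, datetime(2013, 1, 7))
laptop.move_to(depot, datetime(2013, 3, 1))
```

An immovable asset, such as an office building, would simply have a single entry in its history, which is why it is so easy to over-identify that kind of asset with its location.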

 

Related Posts

OCDQ Radio - Demystifying Master Data Management

OCDQ Radio - Master Data Management in Practice

OCDQ Radio - The Art of Data Matching

Plato’s Data

Once Upon a Time in the Data

The Data Cold War

DQ-BE: Single Version of the Time

The Data Outhouse

Fantasy League Data Quality

OCDQ Radio - The Blue Box of Information Quality

Choosing Your First Master Data Domain

Lycanthropy, Silver Bullets, and Master Data Management

Voyage of the Golden Records

The Quest for the Golden Copy

How Social can MDM get?

Will Social MDM be the New Spam?

More Thoughts about Social MDM

Is Social MDM going the Wrong Way?

The Semantic Future of MDM

Small Data and VRM

A Tale of Two Datas

Is big data more than just lots and lots of data?  Is big data unstructured and not-so-big data structured?  Malcolm Chisholm explored these questions in his recent Information Management column, where he posited that there are, in fact, two datas.

“One type of data,” Chisholm explained, “represents non-material entities in vast computerized ecosystems that humans create and manage.  The other data consists of observations of events, which may concern material or non-material entities.”

Providing an example of the first type, Chisholm explained, “my bank account is not a physical thing at all; it is essentially an agreed upon idea between myself, the bank, the legal system, and the regulatory authorities.  It only exists insofar as it is represented, and it is represented in data.  The balance in my bank account is not some estimate with a positive and negative tolerance; it is exact.  The non-material entities of the financial sector are orderly human constructs.  Because they are orderly, we can more easily manage them in computerized environments.”

The orderly human constructs that are represented in data, in the stories told by data (including the stories data tell about us and the stories we tell data), are one of my favorite topics.  In our increasingly data-constructed world, it’s important to occasionally remind ourselves that data and the real world are not the same thing, especially when data represents non-material entities since, with the possible exception of Makers using 3-D printers, data-represented entities do not re-materialize into the real world.

Describing the second type, Chisholm explained, “a measurement is usually a comparison of a characteristic using some criteria, a count of certain instances, or the comparison of two characteristics.  A measurement can generally be quantified, although sometimes it’s expressed in a qualitative manner.  I think that big data goes beyond mere measurement, to observations.”

Chisholm called the first type the Data of Representation, and the second type the Data of Observation.

The data of representation tends to be structured, in the relational sense, but doesn’t need to be (e.g., graph databases) and the data of observation tends to be unstructured, but it can also be structured (e.g., the structured observations generated by either a data profiling tool analyzing structured relational tables or flat files, or a word-counting algorithm analyzing unstructured text).
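As a small illustration of the last example, here is a sketch of a word-counting routine in Python.  The function name and the sample sentence are my own choices, and the tokenization rule (lowercase letters and apostrophes) is a simplifying assumption.  It shows unstructured text going in and structured observations, rows of word and count, coming out.

```python
import re
from collections import Counter


def word_observations(text: str) -> list:
    """Turn free text into (word, count) rows, most frequent first."""
    # Simplified tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically, for a stable layout.
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))


rows = word_observations("It was the best of times, it was the worst of times.")
for word, count in rows:
    print(f"{word}\t{count}")
```

The rows could be loaded straight into a relational table, even though the text they observe has no such structure.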

“Structured and unstructured,” Chisholm concluded, “describe form, not essence, and I suggest that representation and observation describe the essences of the two datas.  I would also submit that both datas need different data management approaches.  We have a good idea what these are for the data of representation, but much less so for the data of observation.”

I agree that there are two types of data (i.e., representation and observation, not big and not-so-big) and that different data uses will require different data management approaches.  Although data modeling is still important and data quality still matters, how much data modeling and data quality is needed before data can be effectively used for specific business purposes will vary.

To move our discussions forward regarding “big data” and its data management and business intelligence challenges, we have to stop fiercely defending our traditional perspectives about structure and quality so that we can effectively manage both the form and essence of the two datas.  We also have to stop fiercely defending our traditional perspectives about data analytics, since there will be some data use cases where depth and detailed analysis may not be necessary to provide business insight.

 

A Tale of Two Datas

In conclusion, and with apologies to Charles Dickens and his A Tale of Two Cities, I offer the following A Tale of Two Datas:

It was the best of times, it was the worst of times.
It was the age of Structured Data, it was the age of Unstructured Data.
It was the epoch of SQL, it was the epoch of NoSQL.
It was the season of Representation, it was the season of Observation.
It was the spring of Big Data Myth, it was the winter of Big Data Reality.
We had everything before us, we had nothing before us,
We were all going direct to hoarding data, we were all going direct the other way.
In short, the period was so far like the present period, that some of its noisiest authorities insisted on its being signaled, for Big Data or for not-so-big data, in the superlative degree of comparison only.

Related Posts

HoardaBytes and the Big Data Lebowski

The Idea of Order in Data

The Most August Imagination

Song of My Data

The Lies We Tell Data

Our Increasingly Data-Constructed World

Plato’s Data

OCDQ Radio - Demystifying Master Data Management

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

Swimming in Big Data

Sometimes it’s Okay to be Shallow

Darth Vader, Big Data, and Predictive Analytics

The Big Data Theory

Finding a Needle in a Needle Stack

Exercise Better Data Management

Magic Elephants, Data Psychics, and Invisible Gorillas

Why Can’t We Predict the Weather?

Data and its Relationships with Quality

A Tale of Two Q’s

A Tale of Two G’s

Commendable Comments (Part 13)

Welcome to the 400th Obsessive-Compulsive Data Quality (OCDQ) blog post!  I am commemorating this milestone with the 13th entry in my ongoing series for expressing gratitude to my readers for their truly commendable comments on my blog posts.

 

Commendable Comments

On Will Big Data be Blinded by Data Science?, Meta Brown commented:

“Your concern is well-founded. Knowing how few businesses make really good use of the small data they’ve had around all along, it’s easy to imagine that they won’t do any better with bigger data sets.

I wrote some hints for those wallowing into the big data mire in my post, Better than Brute Force: Big Data Analytics Tips. But the truth is that many organizations won’t take advantage of the ideas that you are presenting, or my tips, especially as the datasets grow larger. That’s partly because they have no history in scientific methods, and partly because the data science movement is driving employers to search for individuals with heroically large skill sets.

Since few, if any, people truly meet these expectations, those hired will have real human limitations, and most often they will be people who know much more about data storage and manipulation than data analysis and applications.”

On Will Big Data be Blinded by Data Science?, Mike Urbonas commented:

“The comparison between scientific inquiry and business decision making is a very interesting and important one. Successfully serving a customer and boosting competitiveness and revenue does require some (hopefully unique) insights into customer needs. Where do those insights come from?

Additionally, scientists also never stop questioning and improving upon fundamental truths, which I also interpret as not accepting conventional wisdom — obviously an important trait of business managers.

I recently read commentary that gave high praise to the manager utilizing the scientific method in his or her decision-making process. The author was not a technologist, but rather none other than Peter Drucker, in writings from decades ago.

I blogged about Drucker’s commentary, data science, the scientific method vs. business decision making, and I’d value your and others’ input: Business Managers Can Learn a Lot from Data Scientists.”

On Word of Mouth has become Word of Data, Vish Agashe commented:

“I would argue that listening to not only customers but also business partners is very important (and not only in retail but in any business). I always say that, even if as an organization you are not active in the social world, assume that your customers, suppliers, employees, competitors are active in the social world and they will talk about you (as a company), your people, products, etc.

So it is extremely important to tune in to those conversations and evaluate its impact on your business. A dear friend of mine ventured into the restaurant business a few years back. He experienced a little bit of a slowdown in his business after a great start. He started surveying his customers, brought in food critiques to evaluate if the food was a problem, but he could not figure out what was going on. I accidentally stumbled upon Yelp.com and noticed that his restaurant’s rating had dropped and there were some complaints recently about services and cleanliness (nothing major though).

This happened because he had turnover in his front desk staff. He was able to address those issues and was able to reach out to customers who had bad experience (some of them were frequent visitors). They were able to go back and comment and give newer ratings to his business. This helped him with turning the corner and helped with the situation.

This was a big learning moment for me about the power of social media and the need for monitoring it.”

On Data Quality and the Bystander Effect, Jill Wanless commented:

“Our organization is starting to develop data governance processes and one of the processes we have deliberately designed is to get to the root cause of data quality issues.

We’ve designed it so that the errors that are reported also include the userid and the system where the data was generated. Errors are then filtered by function and the business steward responsible for that function is the one who is responsible for determining and addressing the root cause (which of course may require escalation to solve).

The business steward for the functional area has the most at stake in the data and is typically the most knowledgeable as to the process or system that may be triggering the error. We have yet to test this as we are currently in the process of deploying a pilot stewardship program.

However, we are very confident that it will help us uncover many of the causes of the data quality problems and with lots of PLAN, DO, CHECK, and ACT, our goal is to continuously improve so that our need for stewardship eventually (many years away no doubt) is reduced.”
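The routing Jill describes could be sketched roughly as follows in Python.  This is only my illustration of the idea, not her organization’s process: the field names, the steward directory, and the fallback owner used for escalation are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ErrorReport:
    """A reported data quality error (hypothetical fields)."""
    description: str
    user_id: str   # who generated the data
    system: str    # where the data was generated
    function: str  # business function the data belongs to


# Hypothetical directory of business stewards by function.
STEWARDS = {"billing": "steward.billing", "claims": "steward.claims"}


def route_errors(reports, stewards, fallback="data.governance.office"):
    """Group error reports by the steward responsible for each function."""
    queues = defaultdict(list)
    for report in reports:
        # Functions without a steward go to the fallback owner (an assumption).
        owner = stewards.get(report.function, fallback)
        queues[owner].append(report)
    return dict(queues)


reports = [
    ErrorReport("Missing postal code", "u101", "CRM", "billing"),
    ErrorReport("Duplicate claim number", "u202", "ClaimsApp", "claims"),
    ErrorReport("Invalid birth date", "u101", "CRM", "underwriting"),
]
queues = route_errors(reports, STEWARDS)
```

Because each report carries the user ID and the originating system, the steward who receives it has a starting point for finding the root cause.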

On The Return of the Dumb Terminal, Prashanta Chandramohan commented:

“I can’t even imagine what it’s like to use this iPad I own now if I am out of network for an hour. Supposedly the coolest thing to own and a breakthrough innovation of this decade as some put it, it’s nothing but a dumb terminal if I do not have 3G or Wi-Fi connectivity.

Putting most of my documents, notes, to-do’s, and bookmarked blogs for reading later (e.g., Instapaper) in the cloud, I am sure to avoid duplicating data and eliminate installing redundant applications.

(Oops! I mean the apps! :) )

With cloud-based MDM and Data Quality tools starting to linger, I can’t wait to explore and utilize the advantages these return of dumb terminals bring to our enterprise information management field.”

On Big Data Lessons from Orbitz, Dylan Jones commented:

“The fact is that companies have always done predictive marketing, they’re just getting smarter at it.

I remember living as a student in a fairly downtrodden area that because of post code analytics meant I was bombarded with letterbox mail advertising crisis loans to consolidate debts and so on. When I got my first job and moved to a new area all of a sudden I was getting loans to buy a bigger car. The companies were clearly analyzing my wealth based on post code lifestyle data.

Fast forward and companies can do way more as you say.

Teresa Cottam (Global Telecoms Analyst) has cited the big telcos as a major driver in all this, they now consider themselves data companies so will start to offer more services to vendors to track our engagement across the entire communications infrastructure (Read more here: http://bit.ly/xKkuX6).

I’ve just picked up a shiny new Mac this weekend after retiring my long suffering relationship with Windows so it will be interesting to see what ads I get served!”

And please check out all of the commendable comments received on the blog post: Data Quality and Chicken Little Syndrome.

 

Thank You for Your Comments and Your Readership

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last 400 posts.  Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between April 2012 and June 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog.  Your readership is deeply appreciated.

 

Related Posts

Commendable Comments (Part 12) – The Third Blogiversary of OCDQ Blog

Commendable Comments (Part 11)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

Data Quality and the Bystander Effect

In his recent Harvard Business Review blog post Break the Bad Data Habit, Tom Redman cautioned against correcting data quality issues without providing feedback to where the data originated.  “At a minimum,” Redman explained, “others using the erred data may not spot the error.  There is no telling where it might turn up or who might be victimized.”  And correcting bad data without providing feedback to its source also denies the organization an opportunity to get to the bottom of the problem.

“And failure to provide feedback,” Redman continued, “is but the proximate cause.  The deeper root issue is misplaced accountability — or failure to recognize that accountability for data is needed at all.  People and departments must continue to seek out and correct errors.  They must also provide feedback and communicate requirements to their data sources.”

In his blog post The Secret to an Effective Data Quality Feedback Loop, Dylan Jones responded to Redman’s blog post with some excellent insights regarding data quality feedback loops and how they can help improve your data quality initiatives.

I definitely agree with Redman and Jones about the need for feedback loops, but I have found, more often than not, that no feedback at all is provided on data quality issues because of the assumption that data quality is someone else’s responsibility.

This general lack of accountability for data quality issues is similar to what is known in psychology as the Bystander Effect, which refers to people often not offering assistance to the victim in an emergency situation when other people are present.  Apparently, the mere presence of other bystanders greatly decreases intervention, and the greater the number of bystanders, the less likely it is that any one of them will help.  Psychologists believe that the reason this happens is that as the number of bystanders increases, any given bystander is less likely to interpret the incident as a problem, and less likely to assume responsibility for taking action.

In my experience, the most common reason that data quality issues are neither reported nor corrected is that most people throughout the enterprise act like data quality bystanders, making them less likely to interpret bad data as a problem or, at the very least, less likely to consider it their responsibility.  The enterprise’s data quality is perhaps most negatively affected by this bystander effect, which may make it the worst bad data habit that the enterprise needs to break.

 

Related Posts

DQ-Tip: “Don't pass bad data on to the next person...”

Hyperactive Data Quality (Second Edition)

A Farscape Analogy for Data Quality

There is No Such Thing as a Root Cause

Data Quality and the Q Test

The Data Quality Wager

The Third Law of Data Quality

The Data Governance Oratorio

Shared Responsibility

The Algebra of Collaboration

Collaboration isn’t Brain Surgery

The Three Most Important Letters in Data Governance

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Solvency II and Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Ken O’Connor and I discuss the Solvency II standards for data quality, and how its European insurance regulatory requirement of “complete, appropriate, and accurate” data represents common sense standards for all businesses.

Ken O’Connor is an independent data consultant with over 30 years of hands-on experience in the field, specializing in helping organizations meet the data quality management challenges presented by data-intensive programs such as data conversions, data migrations, data population, and regulatory compliance such as Solvency II, Basel II / III, Anti-Money Laundering, the Foreign Account Tax Compliance Act (FATCA), and the Dodd–Frank Wall Street Reform and Consumer Protection Act.

Ken O’Connor also provides practical data quality and data governance advice on his popular blog at: kenoconnordata.com

 


 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.
  • The Fall Back Recap Show — A look back at the Best of OCDQ Radio, including discussions about Data, Information, Business-IT Collaboration, Change Management, Big Analytics, Data Governance, and the Data Revolution.

The Data Governance Imperative

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Steve Sarsfield and I discuss how data governance is about changing the hearts and minds of your company to see the value of data quality, the characteristics of a data champion, and creating effective data quality scorecards.

Steve Sarsfield is a leading author and expert in data quality and data governance.  His book The Data Governance Imperative is a comprehensive exploration of data governance focusing on the business perspectives that are important to data champions, front-office employees, and executives.  He runs the Data Governance and Data Quality Insider, which is an award-winning and world-recognized blog.  Steve Sarsfield is the Product Marketing Manager for Data Governance and Data Quality at Talend.

Win a copy of the Book

Steve Sarsfield wants to give one OCDQ Radio listener a free copy of The Data Governance Imperative.

Here is how the book contest will work:

(1) Book Contest Question — Name at least one of the characteristics of a data champion that Steve Sarsfield described during this OCDQ Radio episode.

(2) Book Contest Deadline — On or before April 30, 2012, email Jim Harris with your answer to the book contest question.

(3) Book Contest Winner — In May 2012, one winner will be randomly selected from the emails containing the correct answer to the contest question, and Steve Sarsfield (or his publisher) will email the winner requesting a shipping address for the book.

 

Related Posts

Data Governance and Data Quality

MacGyver: Data Governance and Duct Tape

Data Governance Frameworks are like Jigsaw Puzzles

The Three Most Important Letters in Data Governance

Data Governance and the Adjacent Possible

Data Governance Star Wars: Balancing Bureaucracy and Agility

Beware the Data Governance Ides of March

Aristotle, Data Governance, and Lead Rulers

Data Governance and the Buttered Cat Paradox

The Data Governance Oratorio

Video: Declaration of Data Governance

The Collaborative Culture of Data Governance

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

Commendable Comments (Part 12)

I officially launched this blog on March 13, 2009, which makes today the Third Blogiversary of OCDQ Blog!

So, absolutely without question, there is no better way to commemorate this milestone than to make this the 12th entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts.

 

Commendable Comments

On Big Data el Memorioso, Mark Troester commented:

“I think this helps illustrate that one size does not fit all.

You can’t take a singular approach to how you design for big data.  It’s all about identifying relevance and understanding that relevance can change over time.

There are certain situations where it makes sense to leverage all of the data, and now with high performance computing capabilities that include in-memory, in-DB and grid, it's possible to build and deploy rich models using all data in a short amount of time. Not only can you leverage rich models, but you can deploy a large number of models that leverage many variables so that you get optimal results.

On the other hand, there are situations where you need to filter out the extraneous information and the more intelligent you can be about identifying the relevant information the better.

The traditional approach is to grab the data, cleanse it, and land it somewhere before processing or analyzing the data.  We suggest that you leverage analytics up front to determine what data is relevant as it streams in, with relevance based on your organizational knowledge or context.  That helps you determine what data should be acted upon immediately, where it should be stored, etc.

And, of course, there are considerations about using visual analytic techniques to help you determine relevance and guide your analysis, but that’s an entire subject just on its own!”

On Data Governance Frameworks are like Jigsaw Puzzles, Gabriel Marcan commented:

“I agree (and like) the jigsaw puzzles metaphor.  I would like to make an observation though:

Can you really construct Data Governance one piece at a time?

I would argue you need to put together sets of pieces simultaneously, and to ensure early value, you might want to piece together the interesting / easy pieces first.

Hold on, that sounds like the typical jigsaw strategy anyway . . . :-)”

On Data Governance Frameworks are like Jigsaw Puzzles, Doug Newdick commented:

“I think that there are a number of more general lessons here.

In particular, the description of the issues with data governance sounds very like the issues with enterprise architecture.  In general, there are very few eureka moments in solving the business and IT issues plaguing enterprises.  These solutions are usually 10% inspiration, 90% perspiration in my experience.  What looks like genius or a sudden breakthrough is usually the result of a lot of hard work.

I also think that there is a wider Myth of the Framework at play too.

The myth is that if we just select the right framework then everything else will fall into place.  In reality, the selection of the framework is just the start of the real work that produces the results.  Frameworks don’t solve your problems, people solve your problems by the application of brain-power and sweat.

All frameworks do is take care of some of the heavy-lifting, i.e., the mundane foundational research and thinking activity that is not specific to your situation.

Unfortunately the myth of the framework is why many organizations think that choosing TOGAF will immediately solve their IT issues and are then disappointed when this doesn’t happen, when a more sensible approach might have garnered better long-term success.”

On Data Quality: Quo Vadimus?, Richard Jarvis commented:

“I agree with everything you’ve said, but there’s a much uglier truth about data quality that should also be discussed — the business benefit of NOT having a data quality program.

The unfortunate reality is that in a tight market, the last thing many decision makers want to be made public (internally or externally) is the truth.

In a company with data quality principles ingrained in day-to-day processes, and reporting handled independently, it becomes much harder to hide or reinterpret your falling market share.  Without these principles though, you’ll probably be able to pick your version of the truth from a stack of half a dozen, then spend your strategy meeting discussing which one is right instead of what you’re going to do about it.

What we’re talking about here is the difference between a Politician — who will smile at the camera and proudly announce 0.1% growth was a fantastic result given X, Y, and Z factors — and a Statistician who will endeavor to describe reality with minimal personal bias.

And the larger the organization, the more internal politics plays a part.  I believe a lot of the reluctance in investing in data quality initiatives could be traced back to this fear of being held truly accountable, regardless of it being in the best interests of the organization.  To build a data quality-centric culture, the change must be driven from the CEO down if it’s to succeed.”

On Data Quality: Quo Vadimus?, Peter Perera commented:

“The question: ‘Is Data Quality a Journey or a Destination?’ suggests that it is one or the other.

I agree with another comment that data quality is neither . . . or, I suppose, it could be both (the journey is the destination and the destination is the journey. They are one and the same.)

The quality of data (or anything for that matter) is something we experience.

Quality only radiates when someone is in the act of experiencing the data, and usually only when it is someone that matters.  This radiation decays over time, ranging from seconds or less to years or more.

The only problem with viewing data quality as radiation is that radiation can be measured by an instrument, but there is no such instrument to measure data quality.

We tend to confuse data qualities (which can be measured) and data quality (which cannot).

In the words of someone whose name I cannot recall: Quality is not job one. Being totally %@^#&$*% amazing is job one.  The only thing I disagree with here is that being amazing is characterized as a job.

Data quality is not something we do to data.  It’s not a business initiative or project or job.  It’s not a discipline.  We need to distinguish between the pursuit (journey) of being amazing and actually being amazing (destination — but certainly not a final one).  To be amazing requires someone to be amazed.  We want data to be continuously amazing . . . to someone that matters, i.e., someone who uses and values the data a whole lot for an end that makes a material difference.

Come to think of it, the only prerequisite for data quality is being alive because that is the only way to experience it.  If you come across some data and have an amazed reaction to it and can make a difference using it, you cannot help but experience great data quality.  So if you are amazing people all the time with your data, then you are doing your data quality job very well.”

On Data Quality and Miracle Exceptions, Gordon Hamilton commented:

“Nicely delineated argument, Jim.  Successfully starting a data quality program seems to be a balance between getting started somewhere and determining where best to start.  The data quality problem is like a two-edged sword without a handle that is inflicting the death of a thousand cuts.

Data quality is indeed difficult to get a handle on.”

And since they generated so much great banter, please check out all of the commendable comments received by the blog posts There is No Such Thing as a Root Cause and You only get a Return from something you actually Invest in.

 

Thank You for Three Awesome Years

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last three years.  Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between December 2011 and March 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog for the last three years. Your readership is deeply appreciated.

 

Related Posts

Commendable Comments (Part 11)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

Data Quality: Quo Vadimus?

Over the past week, an excellent meme has been making its way around the data quality blogosphere.  It all started, as many of the best data quality blogging memes do, with a post written by Henrik Liliendahl Sørensen.

In Turning a Blind Eye to Data Quality, Henrik blogged about how data quality practitioners are often amazed by an inconvenient truth: organizations can grow into successful businesses even though they often turn a blind eye to data quality, ignoring data quality issues and not following the best practices that we advocate.

“The evidence about how poor data quality is costing enterprises huge sums of money has been out there for a long time,” Henrik explained.  “But business successes are made over and over again despite bad data.  There may be casualties, but the business goals are met anyway.  So, poor data quality is just something that makes the fight harder, not impossible.”

As data quality practitioners, we often fail to sell the business benefits of data quality effectively and instead talk only about the negative consequences of not investing in it, which, as Henrik explained, is usually why business leaders turn a blind eye to data quality challenges.  Henrik concluded with the recommendation that when we are talking with business leaders, we need to focus on “smaller, but tangible, wins where data quality improvement and business efficiency goes hand in hand.”

 

Is Data Quality a Journey or a Destination?

Henrik’s blog post received excellent comments, which included a debate about whether data quality is a journey or a destination.

Garry Ure responded with his blog post Destination Unknown, in which he explained how “historically the quest for data quality was likened to a journey to convey the concept that you need to continue to work in order to maintain quality.”  But Garry also noted that sometimes when an organization does successfully ingrain data quality practices into day-to-day business operations, it can make it seem like data quality is a destination that the organization has finally reached.

Garry concluded data quality is “just one destination of many on a long and somewhat recursive journey.  I think the point is that there is no final destination, instead the journey becomes smoother, quicker, and more pleasant for those traveling.”

Bryan Larkin responded to Garry with the blog post Data Quality: Destinations Known, in which Bryan explained, “data quality should be a series of destinations where short journeys occur on the way to those destinations.  The reason is simple.  If we make it about one big destination or one big journey, we are not aligning our efforts with business goals.”

In order to do this, Bryan recommends that “we must identify specific projects that have tangible business benefits (directly to the bottom line — at least to begin with) that are quickly realized.  This means we are looking at less of a smooth journey and more of a sprint to a destination — to tackle a specific problem and show results in a short amount of time.  Most likely we’ll have a series of these sprints to destinations with little time to enjoy the journey.”

“While comprehensive data quality initiatives,” Bryan concluded, “are things we as practitioners want to see — in fact we build our world view around such — most enterprises (not all, mind you) are less interested in big initiatives and more interested in finite, specific, short projects that show results.  If we can get a series of these lined up, we can think of them more in terms of an overall comprehensive plan if we like — even a journey.  But most functional business staff will think of them in terms of the specific projects that affect them.”

The Latin phrase Quo Vadimus? translates into English as “Where are we going?”  When I ponder where data quality is going, and whether data quality is a journey or a destination, I am reminded of the words of T.S. Eliot:

“We must not cease from exploration and the end of all our exploring will be to arrive where we began and to know the place for the first time.”

We must not cease from exploring new ways to continuously improve our data quality and continuously put into practice our data governance principles, policies, and procedures, and the end of all our exploring will be to arrive where we began and to know, perhaps for the first time, the value of high-quality data to our enterprise’s continuing journey toward business success.

 

Related Posts

Selling the Business Benefits of Data Quality

DQ-View: The Cassandra Effect

The Data Quality Wager

Data Quality is not an Act, it is a Habit

Data Quality Practices—Activate!

Hyperactive Data Quality (Second Edition)

A Tale of Two Q’s

What going to the dentist taught me about data quality

Groundhog Data Quality Day

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

Finding Data Quality

Dot Collectors and Dot Connectors

The attention blindness inherent in the digital age often leads to a debate about multitasking, which many claim impairs our ability to solve complex problems.  Therefore, we often hear that we need to adopt monotasking, i.e., we need to eliminate all possible distractions and focus our attention on only one task at a time.

However, during the recent Harvard Business Review podcast The Myth of Monotasking, Cathy Davidson, author of the new book Now You See It: How the Brain Science of Attention Will Transform the Way We Live, Work, and Learn, explained how “the moment that you start not paying attention fully to the task at hand, you actually start seeing other things that your attention would have missed.”  Although Davidson acknowledges that attention blindness is a serious problem, she explained that there really is no such thing as monotasking.  Modern neuroscience research has revealed that the human brain is, in fact, always multitasking.  Furthermore, she explained how multitasking can be extremely useful for a new and expansive form of attention.

“We all see selectively, but we don’t select the same things to see,” Davidson explained.  “So if we can learn to work together, we can actually account for, and productively work around, our own individual attention blindness by seeing collaboratively in a way that compensates for that blindness.”

During the podcast, an analogy was made that focusing attention on specific tasks can result in a lot of time spent collecting dots without spending enough time connecting those dots.  This point caused me to ponder the division of organizational labor that has historically existed between the dot collection of data management, which focuses on aspects such as data integrity and data quality, and the dot connection of business intelligence, which focuses on aspects such as data analysis and data visualization.

I think most data management professionals are dot collectors since it often seems like they spend a lot of their time, money, and attention on collecting (and profiling, modeling, cleansing, transforming, matching, and otherwise managing) data dots.

But since data’s value comes from data’s usefulness, merely collecting data dots doesn’t mean anything if you cannot connect those dots into meaningful patterns that enable your organization to take action or otherwise support your business activities.

So I think most business intelligence professionals are dot connectors since it often seems like they spend a lot of their time, money, and attention on connecting (and querying, aggregating, reporting, visualizing, and otherwise analyzing) data dots.

However, the attention blindness of data management and business intelligence professionals means that they see selectively, often intentionally selecting to not see the same things.  But as more of our personal and professional lives become digitized and pixelated, the big picture of the business world is inundated with the multifaceted challenges of big data, where the fast-moving large volumes of varying data are transforming the way we have to view traditional data management and business intelligence.

We need to replace our perspective of data management and business intelligence as separate monotasking activities with an expansive form of organizational multitasking where the dot collectors and dot connectors work together more collaboratively.

 

Related Posts

Channeling My Inner Beagle: The Case for Hyperactivity

Mind the Gap

The Wisdom of the Social Media Crowd

No Datum is an Island of Serendip

DQ-View: Data Is as Data Does

The Real Data Value is Business Insight

Information Overload Revisited

Neither the I Nor the T is Magic

The Big Data Collider

OCDQ Radio - Big Data and Big Analytics

OCDQ Radio - So Long 2011, and Thanks for All the . . .

The Interconnected User Interface

Redefining Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I have an occasionally spirited discussion about data quality with Peter Perera, partially precipitated by his provocative post from this past summer, The End of Data Quality...as we know it, which included his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.

Peter Perera is a recognized consultant and thought leader with significant experience in Master Data Management, Customer Relationship Management, Data Quality, and Customer Data Integration.  For over 20 years, he has been advising and working with Global 5000 organizations and mid-size enterprises to increase the usability and value of their customer information.

Related Posts

You Say Potato and I Say Tater Tot

You only get a Return from something you actually Invest in

Listen to John Ladley discuss why Data and Information are Enterprise Assets on OCDQ Radio

Listen to Daragh O Brien discuss Data and Information Quality on OCDQ Radio

Listen to Gordon Hamilton discuss the Information Product on OCDQ Radio

Listen to Peter Benson discuss Metadata, Data, and Information on the Knights of the Data Roundtable

Plato’s Data

Data, Information, and Knowledge Management

The Data-Information Continuum

The First Law of Data Quality

Commendable Comments (Part 11)

This Thursday is Thanksgiving Day, which in the United States is a holiday with a long, varied, and debated history.  However, the most consistent themes remain family and friends gathering together to share a large meal and express their gratitude.

This is the eleventh entry in my ongoing series for expressing my gratitude to my readers for their commendable comments on my blog posts.  Receiving comments is the most rewarding aspect of my blogging experience because not only do comments greatly improve the quality of my blog, comments also help me better appreciate the difference between what I know and what I only think I know.  Which is why, although I am truly grateful to all of my readers, I am most grateful to my commenting readers.

 

Commendable Comments

On The Stakeholder’s Dilemma, Gwen Thomas commented:

“Recently got to listen in on a ‘cooperate or not’ discussion.  (Not my clients.) What struck me was that the people advocating cooperation were big-picture people (from architecture and process) while those who just wanted what they wanted were more concerned about their own short-term gains than about system health.  No surprise, right?

But what was interesting was that they were clearly looking after their own careers, and not their silos’ interests.  I think we who help focus and frame the Stakeholder’s Dilemma situations need to be better prepared to address the individual people involved, and not just the organizational roles they represent.”

On Data, Information, and Knowledge Management, Frank Harland commented:

“As always, an intriguing post. Especially where you draw a parallel between Data Governance and Knowledge Management (wisdom management?)  We sometimes portray data management (current term) as ‘well managed data administration’ (term from 70s-80s).  As for the debate on ‘data’ and ‘information’ I prefer to see everything written, drawn and / or stored on paper or in digital format as data with various levels of informational value, depending on the amount and quality of metadata surrounding the data item and the accessibility, usefulness (quality) of that item.

For example, 12024561414 is a number with low informational value. I could add metadata, for instance: ‘Phone number’, that makes it potentially known as a phone number.  Rather than let you find out whose number it is we could add more information value and add more metadata like: ‘White House Switchboard’.  Accessibility could be enhanced by improving formatting like: (1) 202-456-1414.

What I am trying to say with this example is that data items should be placed on a rising scale of informational value rather than be put on steps or firm levels of informational value.  So the Information Hierarchy provided by Professor Larson does not work very well for me.  It could work only if for all data items the exact information value was determined for every probable context.  This model is useful for communication purposes.”
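Frank's phone number example can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the `DataItem` class, the `format_nanp` helper, and the idea of counting metadata entries as a rough score of informational value are all hypothetical and are not part of his comment.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A raw value plus whatever metadata has been attached to it so far."""
    value: str
    metadata: dict = field(default_factory=dict)

    def informational_value(self) -> int:
        # Hypothetical scoring rule: one point per piece of metadata.
        return len(self.metadata)


def format_nanp(digits: str) -> str:
    """Format an 11-digit number with country code 1 in the style (1) 202-456-1414."""
    if len(digits) != 11 or not digits.isdigit() or digits[0] != "1":
        raise ValueError("expected 11 digits starting with country code 1")
    return f"({digits[0]}) {digits[1:4]}-{digits[4:7]}-{digits[7:]}"


# Start with a bare number that has low informational value.
item = DataItem("12024561414")

# Each piece of metadata moves the item up the scale.
item.metadata["type"] = "Phone number"
item.metadata["owner"] = "White House Switchboard"

# Better formatting improves accessibility without changing the value.
formatted = format_nanp(item.value)
```

The score rises gradually as metadata is attached, which mirrors the rising scale described in the comment rather than a fixed set of levels.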

On Plato’s Data, Peter Perera commented:

“‘erised stra ehru oyt ube cafru oyt on wohsi.’

To all Harry Potter fans this translates to: ‘I show not your face but your heart’s desire.’

It refers to The Mirror of Erised.  It does not reflect reality but what you desire. (Erised is Desired spelled backwards.)  Often data will cast a reflection of what people want to see.

‘Dumbledore cautions Harry that the mirror gives neither knowledge nor truth and that men have wasted away before it, entranced by what they see.’  How many systems are really Mirrors of Erised?”

On Plato’s Data, Larisa Bedgood commented:

“Because the prisoners in the cave are chained and unable to turn their heads to see what goes on behind them, they perceive the shadows as reality.  They perceive imperfect reflections of truth and reality.

Bringing the allegory to modern times, this serves as a good reminder that companies MUST embrace data quality for an accurate and REAL view of customers, business initiatives, prospects, and so on.  Continuing to view half-truths based on possibly faulty data and information means you are just lost in a dark cave!

I also like the comparison to the Mirror of Erised.  One of my favorite movies is the Matrix, in which there are also a lot of parallelisms to Plato’s Cave Allegory.  As Morpheus says to Neo: ‘That you are a slave, Neo.  Like everyone else you were born into bondage.  Into a prison that you cannot taste or see or touch.  A prison for your mind.’  Once Neo escapes the Matrix, he discovers that his whole life was based on shadows of the truth.

Plato, Harry Potter, and Morpheus — I’d love to hear a discussion between the three of them in a cave!”

On Plato’s Data, John Owens commented:

“It is true that data is only a reflection of reality but that is also true of anything that we perceive with our senses.  When the prisoners in the cave turn around, what they perceive with their eyes in the visible spectrum is only a very narrow slice of what is actually there.  Even the ‘solid’ objects they see, and can indeed touch, are actually composed of 99% empty space.

The questions that need to be asked and answered about the essence of data quality are far less esoteric than many would have us believe.  They can be very simple, without being simplistic.  Indeed simplicity can be seen as a cornerstone of true data quality.  If you cannot identify the underlying simplicity that lies at the heart of data quality you can never achieve it.  Simple questions are the most powerful.  Questions like, ‘In our world (i.e., the enterprise in question) what is it that we need to know about (for example) a Sale that will enable us to operate successfully and meet all of our goals and objectives?’  If the enterprise cannot answer such simple questions then it is in trouble.  Making the questions more complicated will not take the enterprise any closer to where it needs to be.  Rather it will completely obscure the goal.

Data quality is rather like a ‘magic trick’ done by a magician.  Until you know how it is done it appears to be an unfathomable mystery.  Once you find out that it is merely an illusion, the reality is absolutely simple and, in fact, rather mundane.  But perhaps that is why so many practitioners perpetuate the illusion.  It is not for self gain.  They just don’t want to tell the world that, when it comes to data quality, there is no Tooth Fairy, no Easter Bunny, or no Santa Claus.  It’s sad, but true.  Data quality is boringly simple!”

On Plato’s Data, Peter Benson commented:

“Actually I would go substantially further, whereas data was originally no more than a representation of the real world and if validation was required the real world was the ‘authoritative source’ — but that is clearly no longer the case.  Data is in fact the new reality!

Data is now used to track everything, if the data is wrong the real world item disappears.  It may have really been destroyed or it may be simply lost, but it does not matter, if the data does not provide evidence of its existence then it does not exist.  If you doubt this, just think of money, how much you have is not based on any physical object but on data.

By the way the theoretical definition I use for data is as follows:

Datum — a disruption in a continuum.

The practical definition I use for data is as follows:

Data — elements into which information is transformed so that it can be stored or moved.”

On Data Governance and the Adjacent Possible, Paul Erb commented:

“We can see that there’s a trench between those who think adjacent means out of scope and those who think it means opportunity.  Great leaders know that good stories make for better governance for an organization that needs to adapt and evolve, but stay true to its mission. Built from, but not about, real facts, good fictions are broadly true without being specifically true, and therefore they carry well to adjacent business processes where their truths can be applied to making improvements.

On the other hand, if it weren’t for nonfiction — accounts of real markets and processes — there would be nothing for the POSSIBLE to be adjacent TO.  Managers often have trouble with this because they feel called to manage the facts, and call anything else an airy-fairy waste of time.

So a data governance program needs to assert whether its purpose is to fix the status quo only, or to fix the status quo in order to create agility to move into new areas when needed.  Each of these should have its own business case and related budgets and thresholds (tolerances) in the project plan.  And it needs to choose its sponsorship and data quality players accordingly.”

On You Say Potato and I Say Tater Tot, John O’Gorman commented:

“I’ve been working on a definitive solution for the data / information / metadata / attributes / properties knot for a while now and I think I have it figured out.

I read your blog entitled The Semantic Future of MDM and we share the same philosophy even while we differ a bit on the details.  Here goes.  It’s all information.  Good, bad, reliable or not, the argument whether data is information or vice versa is not helpful.  The reason data seems different than information is because it has too much ambiguity when it is out of context.  Data is like a quantum wave: it has many possibilities one of which is ‘collapsed’ into reality when you add context.  Metadata is not a type of data, any more than attributes, properties or associations are a type of information.  These are simply conventions to indicate the role that information is playing in a given circumstance.

Your Michelle Davis example is a good illustration: Without context, that string could be any number of individuals, so I consider it data.  Give it a unique identifier and classify it as a digital representation in the class of Person, however, and we have information.  If I then have Michelle add attributes to her personal record — like sex, age, etc. — and assuming that these are likewise identified and classed — now Michelle is part of a set, or relation. Note that it is bad practice — and consequently the cause of many information management headaches — to use data instead of information.  Ambiguity kills.  Now, if I were to use Michelle’s name in a Subject Matter Expert field as proof of the validity of a digital asset; or in the Author field as an attribute, her information does not *become* metadata or an attribute: it is still information.  It is merely being used differently.

In other words, in my world while the terms ‘data’ and ‘information’ are classified as concepts, the terms ‘metadata’, ‘attribute’ and ‘property’ are classified as roles to which instances of those concepts (well, one of them anyway) can be put, i.e., they are fit for purpose.  This separation of the identity and class of the string from the purpose to which it is being assigned has produced very solid results for me.”
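One way to picture the separation John describes, between what a string is and the role it is playing, is the minimal Python sketch below. It is only an illustration and not John's actual model: the class names (Information, Usage, Role), the add_context helper, and the report.pdf asset are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID, uuid4


class Role(Enum):
    """Roles a piece of information can play (hypothetical names)."""
    ATTRIBUTE = "attribute"
    METADATA = "metadata"


@dataclass(frozen=True)
class Information:
    """A string plus the context that removes its ambiguity."""
    identifier: UUID   # unique identifier
    info_class: str    # the class it belongs to, e.g. "Person"
    value: str         # the original string


@dataclass(frozen=True)
class Usage:
    """Records the role an Information instance plays in one field of an asset."""
    asset: str
    field_name: str
    role: Role
    information: Information


def add_context(data: str, info_class: str) -> Information:
    """Turn an ambiguous string (data) into identified, classed information."""
    return Information(identifier=uuid4(), info_class=info_class, value=data)


# Without context, this string is just data: it could be anyone.
raw = "Michelle Davis"

# With an identifier and a class, it becomes information.
michelle = add_context(raw, "Person")

# The same information is then used in two different roles.
# The asset name is an assumption made up for this sketch.
as_author = Usage("report.pdf", "Author", Role.ATTRIBUTE, michelle)
as_expert = Usage("report.pdf", "Subject Matter Expert", Role.METADATA, michelle)

# The role differs, but the underlying information is the same object.
assert as_author.information is as_expert.information
```

The point of keeping Role on the Usage record instead of on Information is that the information itself never changes type. Only the way it is being used changes.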

Thanks for giving your comments

Thank you very much for giving your comments and sharing your perspectives with our collablogaunity.  This entry in the series highlighted commendable comments on OCDQ Blog posts published between July and November of 2011.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality (OCDQ) blog.  Your readership is deeply appreciated.

Related Posts

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)