Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.


Entries in Best of 2011 (43)

Tuesday, January 3, 2012

Best OCDQ Blog Posts of 2011

Welcome to my roundup of the best blog posts published on the Obsessive-Compulsive Data Quality (OCDQ) blog during 2011.

My selections were based on a pseudo-scientific, quasi-statistical combination of page views, comments, and re-tweets (as well as choosing a few of my personal favorites).  Instead of ordering the posts chronologically, I decided to organize them by theme.

 

The Metadata Trilogy

Although it has an incredibly important role to play in data quality and its related disciplines, I don’t write about metadata very often.  But the reader feedback that I received led me to write three blog posts about metadata in the span of a few weeks:

  • The Metadata Crisis — There is a running debate within many organizations over the meaning of commonly used terms, which complicates what on the surface seem like straightforward business questions.
  • The Metadata Continuum — There is a continuum, where at one end we have the uniformity of controlled vocabularies, and at the other end we have the flexibility of chaotic folksonomies.  However, both flexibility and uniformity provide value.
  • You Say Potato and I Say Tater Tot — The demarcations of the borders between metadata, data, and information are important, but sometimes difficult to discern.  In this post, I offer an explanation about these demarcations using potatoes.

 

The Data Governance Star Wars (one less than a) Trilogy

In June, Rob Karel of Forrester Research and I used a Star Wars themed blog mock debate to take on one of data governance’s biggest challenges — how to balance bureaucracy and business agility.  Gwen Thomas of the Data Governance Institute joined Rob and me to continue the discussion during a special, extended, and Star Wars themed episode of OCDQ Radio:

  • Data Governance Star Wars on OCDQ Radio — In Part 1, Rob Karel and I discuss our blog mock debate, which is followed by a brief Star Wars themed intermission, and then in Part 2, Gwen Thomas joins us to provide her excellent insights.

 

Although not Star Wars themed, here are some additional Best OCDQ Blog Posts of 2011 on the topic of data governance:

  • Data Governance and the Adjacent Possible — It’s important to demonstrate that some data governance policies reflect existing best practices, which helps reduce resistance to change, and therefore I advise: “If it ain’t broke, bricolage it.”
  • Aristotle, Data Governance, and Lead Rulers — Well-constructed data governance policies are like lead rulers — flexible rules that empower us with an understanding of the principle of the policy, and how to enforce it in a particular context.
  • The Stakeholder’s Dilemma — There will be times when sacrifices for the long-term greater good will require that stakeholders either contribute more resources during the current phase, or receive fewer benefits from its deliverables.
  • Beware the Data Governance Ides of March — My dramatized warning about relying too much on the top-down approach to implementing data governance — and especially if your organization has any data stewards named Brutus or Cassius.

 

OCDQ Radio

In June, I launched OCDQ Radio, which is a vendor-neutral podcast about data quality and the audio complement to this blog, providing me with a platform for recorded discussions with the great folks working in the data management industry.  So far, there have been 21 episodes of OCDQ Radio, including 22 guests from 7 countries.  Here are a few of the most popular episodes:

  • The Fall Back Recap Show — A look back at the Best of OCDQ Radio, including discussions about Data, Information, Business-IT Collaboration, Change Management, Big Analytics, Data Governance, and the Data Revolution.
  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.
  • Social Media Strategy — Guest Crysta Anderson of IBM Initiate explains social media strategy and content marketing, including three recommended practices: (1) Listen intently, (2) Communicate succinctly, and (3) Have fun.

 

The Best of the Rest

  • DQ-View: Talking about Data — DQ-View video discussion about how data professionals should talk about data when invited to participate in business discussions within their organizations.
  • The Speed of Decision — Examines the constraints that time puts on data-driven decision making, pondering whether decision speed is more important than data quality and decision quality.
  • The Data Cold War — Examines how Google and Facebook have performed the Master Data Management Magic Trick and socialized data (“Information wants to be free!”) in order to capitalize data as a true corporate asset.
  • A Farscape Analogy for Data Quality — Ponders whether data is not viewed as an asset because data has so thoroughly pervaded the enterprise that data has become invisible to those who are so dependent upon its quality.
  • No Datum is an Island of Serendip — Our organizations need to create collaborative environments that foster serendipitous connections bringing all of our business units and people together around our shared data assets.

 

Thank You for Reading OCDQ Blog in 2011

In 2011, the Obsessive-Compulsive Data Quality (OCDQ) blog published 112 posts, which received 130,000 total page views, averaging 350 page views and 150 unique visitors a day.

Thank you for reading OCDQ Blog in 2011.  Your readership was deeply appreciated.

 

Related Posts

So Long 2011, and Thanks for All the . . . – The OCDQ Radio 2011 Year in Review

2011 Quarterly Review of the Data Roundtable (Part 3)

2011 Quarterly Review of the Data Roundtable (Part 2)

2011 Quarterly Review of the Data Roundtable (Part 1)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

The Best Data Quality Blog Posts of 2010

Thursday, December 29, 2011

So Long 2011, and Thanks for All the . . .

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Don’t Panic!  Welcome to the mostly harmless OCDQ Radio 2011 Year in Review episode.  During this approximately 42-minute episode, I recap the data-related highlights of 2011 in a series of sometimes serious, sometimes funny segments, and make wacky and wildly inaccurate data-related predictions about 2012.

Special thanks to my guests Jarrett Goldfedder, who discusses Big Data, Nicola Askham, who discusses Data Governance, and Daragh O Brien, who discusses Data Privacy.  Additional thanks to Rich Murnane and Dylan Jones.  And Deep Thanks to that frood Douglas Adams, who always knew where his towel was, and who wrote The Hitchhiker’s Guide to the Galaxy.

 


Thursday, December 15, 2011

Information Overload Revisited

This blog post is sponsored by the Enterprise CIO Forum and HP.

Information Overload is a term invoked regularly during discussions about the data deluge of the Information Age, which has created a 24 hours a day, 7 days a week, 365 days a year, world-wide whirlwind of constant information flow, where the very air we breathe is literally teeming with digital data streams — continually inundating us with new, and new types of, information.

Information overload generally refers to how too much information can overwhelm our ability to understand an issue, and can even disable our decision making in regard to that issue (this latter aspect is generally referred to as Analysis Paralysis).

But we often forget that the term is over 40 years old.  It was popularized by Alvin Toffler in his bestselling book Future Shock, which was published in 1970, back when the Internet was still in its infancy, and long before the Internet’s progeny would give birth to the clouds contributing to the present, potentially perpetual, forecast for data precipitation.

A related term that has become big in the data management industry is Big Data.  As Gartner Research explains, although the term acknowledges the exponential growth, availability, and use of information in today’s data-rich landscape, big data is about more than just data volume.  Data variety (i.e., structured, semi-structured, and unstructured data, as well as other types, such as the sensor data emanating from the Internet of Things) and data velocity (i.e., how fast data is being produced and how fast the data must be processed to meet demand) are also key characteristics of the big challenges of big data.

John Dodge and Bob Gourley recently discussed big data on Enterprise CIO Forum Radio, where Gourley explained that big data is essentially “the data that your enterprise is not currently able to do analysis over.”  This point resonates with a similar one made by Bill Laberis, who recently discussed new global research where half of the companies polled responded that they cannot effectively deal with analyzing the rising tide of data available to them.

Most of the big angst about big data comes from this fear that organizations are not tapping the potential business value of all that data not currently being included in their analytics and decision making.  This reminds me of psychologist Herbert Simon, who won the 1978 Nobel Prize in Economics for his pioneering research on decision making, which included comparing and contrasting the decision-making strategies of maximizing and satisficing (a term that combines satisfying with sufficing).

Simon explained that a maximizer is like a perfectionist who considers all the data they can find because they need to be assured that their decision was the best that could be made.  This creates a psychologically daunting task, especially as the amount of available data constantly increases (again, note that this observation was made over 40 years ago).  The alternative is to be a satisficer, someone who attempts to meet criteria for adequacy rather than identify an optimal solution, especially when time is a critical factor, as it is with the real-time decision making demanded by a constantly changing business world.

Big data strategies will also have to compare and contrast maximizing and satisficing.  Maximizers, if driven by their angst about all that data they are not analyzing, might succumb to information overload.  Satisficers, if driven by information optimization, might sufficiently integrate just enough of big data into their business analytics in a way that satisfies specific business needs.
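
As a rough illustration of the contrast, here is a minimal Python sketch of the two strategies.  The function names, the scoring function, and the "good enough" threshold are all hypothetical choices made for this example; they are not taken from Simon's research or from any particular analytics tool.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def maximize(options: Iterable[T], score: Callable[[T], float]) -> Optional[T]:
    """Maximizer: examine every option and return the best-scoring one."""
    best: Optional[T] = None
    best_score = float("-inf")
    for option in options:
        current = score(option)
        if current > best_score:
            best, best_score = option, current
    return best


def satisfice(
    options: Iterable[T], score: Callable[[T], float], good_enough: float
) -> Optional[T]:
    """Satisficer: return the first option that meets the adequacy threshold."""
    for option in options:
        if score(option) >= good_enough:
            return option  # stop looking as soon as one option is adequate
    return None  # nothing met the threshold


# Hypothetical options, scored by their own value.
candidates = [3, 7, 9, 8]
best_choice = maximize(candidates, lambda x: x)          # looks at all four
adequate_choice = satisfice(candidates, lambda x: x, 7)  # stops at the second
```

With these made-up numbers the maximizer returns 9 after scoring every candidate, while the satisficer returns 7 after scoring only two.  The trade-off in the sketch is the one described above: the satisficer gives up the guarantee of the best answer in exchange for stopping early.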

As big data forces us to revisit information overload, it may be useful for us to remember that originally the primary concern was not about the increasing amount of information, but instead the increasing access to information.  As Clay Shirky succinctly stated, “It’s not information overload, it’s filter failure.”  So, to harness the business value of big data, we will need better filters, which may ultimately make for the entire distinction between information overload and information optimization.


 

Related Posts

The Data Encryption Keeper

The Cloud Security Paradox

The Good, the Bad, and the Secure

Securing your Digital Fortress

Shadow IT and the New Prometheus

Are Cloud Providers the Bounty Hunters of IT?

The Diderot Effect of New Technology

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Thursday, December 8, 2011

You only get a Return from something you actually Invest in

In my previous post, I took a slightly controversial stance on a popular three-word phrase — Root Cause Analysis.  In this post, it’s another popular three-word phrase — Return on Investment (most commonly abbreviated as the acronym ROI).

What is the ROI of purchasing a data quality tool or launching a data governance program?

Zero.  Zip.  Zilch.  Intet.  Ingenting.  Rien.  Nada.  Nothing.  Nichts.  Niets.  Null.  Niente.  Bupkis.

There is No Such Thing as the ROI of purchasing a data quality tool or launching a data governance program.

Before you hire “The Butcher” to eliminate me for being The Man Who Knew Too Little about ROI, please allow me to explain.

 

Returns only come from Investments

Although you likely purchased a data quality tool because you have business-critical data quality problems, simply purchasing a tool is not an investment (unless you believe in Magic Beans) since the tool itself is not a solution.

You use tools to build, test, implement, and maintain solutions.  For example, I spent several hundred dollars on new power tools last year for a home improvement project.  However, I haven’t received any return on my home improvement investment for a simple reason — I still haven’t taken most of the tools out of their packaging.  In other words, I have barely even started my home improvement project.  It is precisely because I haven’t invested any time and effort that I haven’t seen any returns.  And it certainly isn’t going to help me (although it would help Home Depot) if I believed buying even more new tools was the answer.

Although you likely launched a data governance program because you have complex issues involving the intersection of data, business processes, technology, and people, simply launching a data governance program is not an investment (unless you believe in the Hedgehog’s framework) since it does not conjure the three most important letters.

 

Data is only an Asset if Data is a Currency

In his book UnMarketing, Scott Stratten discusses this within the context of the ROI of social media (a commonly misunderstood aspect of social media strategy), but his insight is just as applicable to any discussion of ROI.  “Think of it this way: You wouldn’t open a business bank account and ask to withdraw $5,000 before depositing anything. The banker would think you are a loony.”

Yet, as Stratten explained, people do this all the time in social media by failing to build up what is known as social currency.  “You’ve got to invest in something before withdrawing. Investing your social currency means giving your time, your knowledge, and your efforts to that channel before trying to withdraw monetary currency.”

The same logic applies perfectly to data quality and data governance, where we could say it’s the failure to build up what I will call data currency.  You’ve got to invest in data before you could ever consider data an asset to your organization.  Investing your data currency means giving your time, your knowledge, and your efforts to data quality and data governance before trying to withdraw monetary currency (i.e., before trying to calculate the ROI of a data quality tool or a data governance program).

If you actually want to get a return on your investment, then actually invest in your data.  Invest in doing the hard daily work of continuously improving your data quality and putting into practice your data governance principles, policies, and procedures.

Data is only an asset if data is a currency.  Invest in your data currency, and you will eventually get a return on your investment.

You only get a return from something you actually invest in.

 

Related Posts

Can Enterprise-Class Solutions Ever Deliver ROI?

Do you believe in Magic (Quadrants)?

Which came first, the Data Quality Tool or the Business Need?

What Data Quality Technology Wants

The Technology Carousel

Council Data Governance

A Farscape Analogy for Data Quality

The Data Quality Wager

“Some is not a number and soon is not a time”

“What is is the was of what shall be”

The HedgeFoxian Hypothesis

The Dumb and Dumber Guide to Data Quality

Monday, December 5, 2011

There is No Such Thing as a Root Cause

Root cause analysis.  Most people within the industry, myself included, often discuss the importance of determining the root cause of data governance and data quality issues.  However, the complex cause and effect relationships underlying an issue mean that when an issue is encountered, you are often only seeing one of the numerous effects of its root cause (or causes).

In my post The Root! The Root! The Root Cause is on Fire!, I poked fun at those resistant to root cause analysis with the lyrics:

The Root! The Root! The Root Cause is on Fire!
We don’t want to determine why, just let the Root Cause burn.
Burn, Root Cause, Burn!

However, I think that the time is long overdue for even me to admit the truth — There is No Such Thing as a Root Cause.

Before you charge at me with torches and pitchforks for having an Abby Normal brain, please allow me to explain.

 

Defect Prevention, Mouse Traps, and Spam Filters

Some advocates of defect prevention claim that zero defects is not only a useful motivation, but also an attainable goal.  In my post The Asymptote of Data Quality, I quoted Daniel Pink’s book Drive: The Surprising Truth About What Motivates Us:

“Mastery is an asymptote.  You can approach it.  You can home in on it.  You can get really, really, really close to it.  But you can never touch it.  Mastery is impossible to realize fully.

The mastery asymptote is a source of frustration.  Why reach for something you can never fully attain?

But it’s also a source of allure.  Why not reach for it?  The joy is in the pursuit more than the realization.

In the end, mastery attracts precisely because mastery eludes.”

The mastery of defect prevention is sometimes distorted into a belief in data perfection, into a belief that we can not just build a better mousetrap, but we can build a mousetrap that could catch all the mice, or that by placing a mousetrap in our garage, which prevents mice from entering via the garage, we somehow also prevent mice from finding another way into our house.

Obviously, we can’t catch all the mice.  However, that doesn’t mean we should let the mice be like Pinky and the Brain:

Pinky: “Gee, Brain, what do you want to do tonight?”

The Brain: “The same thing we do every night, Pinky — Try to take over the world!”

My point is that defect prevention is not the same thing as defect elimination.  Defects evolve.  An excellent example of this is spam.  Even conservative estimates indicate almost 80% of all e-mail sent world-wide is spam.  A similar percentage of blog comments are spam, and spam generating bots are quite prevalent on Twitter and other micro-blogging and social networking services.  The inconvenient truth is that as we build better and better spam filters, spammers create better and better spam.

Just as mousetraps don’t eliminate mice and spam filters don’t eliminate spam, defect prevention doesn’t eliminate defects.

However, mousetraps, spam filters, and defect prevention are essential proactive best practices.

 

There are No Lines of Causation — Only Loops of Correlation

There are no root causes, only strong correlations.  And correlations are strengthened by continuous monitoring.  Believing there are root causes means believing continuous monitoring, and by extension, continuous improvement, has an end point.  I call this the defect elimination fallacy, which I parodied in song in my post Imagining the Future of Data Quality.

Knowing there are only strong correlations means knowing continuous improvement is an infinite feedback loop.  A practical example of this reality comes from data-driven decision making, where:

  1. Better Business Performance is often correlated with
  2. Better Decisions, which, in turn, are often correlated with
  3. Better Data, which is precisely why Better Decisions with Better Data is foundational to Business Success — however . . .

This does not mean that we can draw straight lines of causation between (3) and (1), (3) and (2), or (2) and (1).

Despite our preference for simplicity over complexity, if bad data were the root cause of bad decisions and/or bad business performance, no organization would ever be profitable, and if good data were the root cause of good decisions and/or good business performance, every organization could always be profitable.  Even if good data were a root cause, not just a correlation, and even when data perfection is temporarily achieved, the effects would still be ephemeral because not only do defects evolve, but so does the business world.  This evolution requires an endless revolution of continuous monitoring and improvement.

Many organizations implement data quality thresholds to close the feedback loop evaluating the effectiveness of their data management and data governance, but few implement decision quality thresholds to close the feedback loop evaluating the effectiveness of their data-driven decision making.
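
One way to picture closing both feedback loops is a periodic check of measured scores against agreed thresholds.  The Python sketch below is only an illustration under stated assumptions: the `Threshold` class, the `breaches` function, the metric names, and the 0.0 to 1.0 scoring scale are all hypothetical, and how an organization would actually score decision quality is left open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    name: str       # hypothetical metric name, e.g. "data_quality"
    minimum: float  # hypothetical minimum acceptable score, from 0.0 to 1.0


def breaches(measurements: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return the names of thresholds that are unmeasured or below their minimum."""
    flagged = []
    for threshold in thresholds:
        value = measurements.get(threshold.name)
        # A metric that was never measured leaves its feedback loop open,
        # so it is flagged just like a score that falls short.
        if value is None or value < threshold.minimum:
            flagged.append(threshold.name)
    return flagged


# Hypothetical thresholds and one monitoring cycle's measurements.
thresholds = [
    Threshold("data_quality", 0.95),
    Threshold("decision_quality", 0.80),
]
latest = {"data_quality": 0.97, "decision_quality": 0.70}
needs_review = breaches(latest, thresholds)
```

With these made-up scores, only the decision quality threshold is flagged for review.  A flag here signals a correlation worth investigating in the next monitoring cycle, not a proven cause.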

The quality of a decision is determined by the business results it produces, not the person who made the decision, the quality of the data used to support the decision, or even the decision-making technique.  Of course, the reality is that business results are often not immediate and may sometimes be contingent upon the complex interplay of multiple decisions.

Even though evaluating decision quality only establishes a correlation, and not a causation, between the decision execution and its business results, it is still essential to continuously monitor data-driven decision making.

Although the business world will never be totally predictable, we cannot turn a blind eye to the need for data-driven decision making best practices, or the reality that no best practice can eliminate the potential for poor data quality and decision quality, nor the potential for poor business results even despite better data quality and decision quality.  Central to continuous improvement is the importance of closing the feedback loops that make data-driven decisions more transparent through better monitoring, allowing the organization to learn from its decision-making mistakes, and make adjustments when necessary.

We need to connect the dots of better business performance, better decisions, and better data by drawing loops of correlation.

 

Decision-Data Feedback Loop

Continuous improvement enables better decisions with better data, which drives better business performance — as long as you never stop looping the Decision-Data Feedback Loop, and start accepting that there is no such thing as a root cause.

I discuss this, and other aspects of data-driven decision making, in my DataFlux white paper, which is available for download (registration required) using the following link: Decision-Driven Data Management

 

Related Posts

The Root! The Root! The Root Cause is on Fire!

Bayesian Data-Driven Decision Making

The Role of Data Quality Monitoring in Data Governance

The Circle of Quality

Oughtn’t you audit?

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Imagining the Future of Data Quality

What going to the Dentist taught me about Data Quality

DQ-Tip: “There is No Such Thing as Data Accuracy...”

The HedgeFoxian Hypothesis

Tuesday, November 29, 2011

No Datum is an Island of Serendip

Continuing a series of blog posts inspired by the highly recommended book Where Good Ideas Come From by Steven Johnson, in this blog post I want to discuss the important role that serendipity plays in data — and, by extension, business success.

Let’s start with a brief etymology lesson.  The origin of the word serendipity, which is commonly defined as a “happy accident” or “pleasant surprise,” can be traced to the Persian fairy tale The Three Princes of Serendip, whose heroes were always making discoveries of things they were not in quest of either by accident or by sagacity (i.e., the ability to link together apparently innocuous facts to come to a valuable conclusion).  Serendip was an old name for the island nation now known as Sri Lanka.

“Serendipity,” Johnson explained, “is not just about embracing random encounters for the sheer exhilaration of it.  Serendipity is built out of happy accidents, to be sure, but what makes them happy is the fact that the discovery you’ve made is meaningful to you.  It completes a hunch, or opens up a door in the adjacent possible that you had overlooked.  Serendipitous discoveries often involve exchanges across traditional disciplines.  Serendipity needs unlikely collisions and discoveries, but it also needs something to anchor those discoveries.  The challenge, of course, is how to create environments that foster these serendipitous connections.”

 

No Datum is an Island of Serendip

“No man is an island, entire of itself; every man is a piece of the continent, a part of the main.”

These famous words were written by the poet John Donne, and their meaning is generally regarded to be that human beings do not thrive when isolated from others.  Likewise, data does not thrive in isolation.  However, many organizations persist in data isolation, in data silos created when separate business units see power in the hoarding of data, not in the sharing of data.

But no business unit is an island, entire of itself; every business unit is a piece of the organization, a part of the enterprise.

Likewise, no datum is an Island of Serendip.  Data thrives through the connections, collisions, and combinations that collectively unleash serendipity.  When data is exchanged across organizational boundaries, and shared with the entire enterprise, it enables the interdisciplinary discoveries required for making business success more than just a happy accident or pleasant surprise.

Our organizations need to create collaborative environments that foster serendipitous connections bringing all of our business units and people together around our shared data assets.  We need to transcend our organizational boundaries, reduce our data silos, and gather our enterprise’s heroes together on the Data Island of Serendip — our United Nation of Business Success.

 

Related Posts

Data Governance and the Adjacent Possible

The Three Most Important Letters in Data Governance

The Stakeholder’s Dilemma

The Data Cold War

Turning Data Silos into Glass Houses

The Good Data

DQ-BE: Single Version of the Time

My Own Private Data

Sharing Data

Are you Building Bridges or Digging Moats?

The Collaborative Culture of Data Governance

The Interconnected User Interface

Thursday, November 17, 2011

The Speed of Decision

In a previous post, I used the Large Hadron Collider as a metaphor for big data and big analytics, where the creative destruction caused by high-velocity collisions of large volumes of varying data attempts to reveal elementary particles of business intelligence.

Since recent scientific experiments have sparked discussion about the possibility of exceeding the speed of light, in this blog post I examine whether it’s possible to exceed the speed of decision (i.e., the constraints that time puts on data-driven decision making).

 

Is Decision Speed more important than Data Quality?

In my blog post Thaler’s Apples and Data Quality Oranges, I explained how time-inconsistent data quality preferences within business intelligence reflect the reality that with the speed at which things change these days, more near-real-time operational business decisions are required, which sometimes makes decision speed more important than data quality.

Even though advancements in computational power, network bandwidth, parallel processing frameworks (e.g., MapReduce), scalable and distributed models (e.g., cloud computing), and other techniques (e.g., in-memory computing) are making real-time data-driven decisions more technologically possible than ever before, as I explained in my blog post Satisficing Data Quality, data-driven decision making often has to contend with the practical trade-offs between correct answers and timely answers.

Although we can’t afford to completely sacrifice data quality for faster business decisions, and obviously high quality data is preferable to poor quality data, less than perfect data quality cannot be used as an excuse to delay making a critical decision.

 

Is Decision Speed more important than Decision Quality?

The increasing demand for real-time data-driven decisions is not only requiring us to re-evaluate our data quality thresholds.  In my blog post The Circle of Quality, I explained the connection between data quality and decision quality, and how result quality trumps them both because an organization’s success is measured by the quality of the business results it produces.

Again, with the speed at which the business world now changes, the reality is that the fear of making a mistake cannot be used as an excuse to delay making a critical decision, which sometimes makes decision speed more important than decision quality.

“Fail faster” has long been hailed as the mantra of business innovation.  It’s not because failure is a laudable business goal, but instead because the faster you can identify your mistakes, the faster you can correct your mistakes.  Of course this requires that you are actually willing to admit you made a mistake.

(As an aside, I often wonder what’s more difficult for an organization to admit: poor data quality or poor decision quality?)

Although good decisions are obviously preferable to bad decisions, we have to acknowledge the fragility of our knowledge and accept that mistake-driven learning is an essential element of efficient and effective data-driven decision making.

Although the speed of decision is not the same type of constant as the speed of light, in our constantly changing business world, the speed of decision represents the constant demand for good-enough data for fast-enough decisions.

 

Related Posts

The Big Data Collider

A Decision Needle in a Data Haystack

The Data-Decision Symphony

Thaler’s Apples and Data Quality Oranges

Satisficing Data Quality

Data Confabulation in Business Intelligence

The Data that Supported the Decision

Data Psychedelicatessen

OCDQ Radio - Big Data and Big Analytics

OCDQ Radio - Good-Enough Data for Fast-Enough Decisions

Data, Information, and Knowledge Management

Data In, Decision Out

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

The Circle of Quality

Friday, November 11, 2011

You Say Potato and I Say Tater Tot

One thread of the comment discussion on my blog post The Metadata Continuum raised the excellent point that the demarcation of the border between data and metadata is important, but sometimes difficult to discern.  By extension, we can say the same thing about the demarcation of the border between data and information.

So, in this blog post, I thought I would try to offer an explanation about the importance of these demarcations using potatoes.

 

You Say Potato and I Say Potahto

Let’s Call the Whole Thing Off was a song written by George Gershwin and Ira Gershwin, which became famous for its playful lyrics that poked fun at the differences in the pronunciation of words, such as “you say potato and I say potahto.”

Spelling and pronunciation are included in the dictionary definition of a word, which is a good example of one of the many uses of metadata, namely as a label that provides a definition, description, and context for data.  Essentially, metadata describes data, and since data is attempting to describe a real world object, such as a potato, metadata is a further abstraction from reality.

And as we saw with the example of white horses in my blog post The Metadata Crisis, these abstract definitions can also include additional classifications (e.g., there are over 4,000 different varieties of potato), which also have to be well defined in order to facilitate clear communication and effective discussion.  These levels of abstraction, definition, and classification are essential to our attempts to understand, and do business with, the real world.  And this challenge continues even further with information.

 

You Say Potato and I Say Tater Tot

The difference, and relationship, between data and information is a common debate.  Not only do these two terms have varying definitions, but they are often used interchangeably.  Just a few examples include comparing and contrasting data quality with information quality, data management with information management, and data governance with information governance.

Some consider this an esoteric debate between data geeks and information nerds, but what is not debated is the importance of understanding how organizations use data and/or information to support their business activities.

Extending my analogy, data is like a potato and information is like a tater tot.  In other words, information is one of the many possible specific things that we can make using data, which is why information quality professionals often speak about the information product.

So it’s important to remember that we can’t have a tater tot (information) without a potato (data), and that we can’t have either a tater tot or a potato without having a working definition (metadata) of what a potato is.

 

Let’s Not Call the Whole Thing Data

David Corrigan recently blogged about the importance of the metadata that tracks the lineage of information presented to an end user, and how the root causes of data quality and data governance issues are impossible to discover without this metadata.
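
As a minimal sketch of why lineage metadata matters for root cause analysis, assume lineage is recorded as a simple mapping from each item to the items it was derived from.  The item names below are hypothetical, and the sketch assumes the lineage contains no cycles.

```python
# Hypothetical lineage metadata: each item maps to the items it was derived from.
derived_from = {
    "revenue_report": ["sales_mart"],
    "sales_mart": ["orders_db", "crm_export"],
    "orders_db": [],
    "crm_export": [],
}


def root_sources(item, derived_from):
    """Walk the lineage back to the original sources of an item."""
    parents = derived_from.get(item, [])
    if not parents:
        # An item with no recorded parents is treated as an original source.
        return {item}
    roots = set()
    for parent in parents:
        roots |= root_sources(parent, derived_from)
    return roots


print(sorted(root_sources("revenue_report", derived_from)))
```

Without the mapping, a data quality issue seen in the report could not be traced back to the sources that might have caused it.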

Therefore, the lines of demarcation separating metadata, data, and information are not just an esoteric technical debate.  These demarcations are foundational to the efficiency and effectiveness of business operations.  So, let’s not call the whole thing data.

Let’s acknowledge the separate, but deeply interrelated, continuum formed by the disciplines of metadata, data, and information.

 

Related Posts

The Metadata Continuum

The Metadata Crisis

What’s the Meta with your Data?

Let’s Meta a Data

Listen to Peter Benson discuss Metadata, Data, and Information on the Knights of the Data Roundtable

Listen to Daragh O Brien discuss Data and Information Quality on OCDQ Radio

Listen to Gordon Hamilton discuss the Information Product on OCDQ Radio

Plato’s Data

Data, Information, and Knowledge Management

The Data-Information Continuum

The First Law of Data Quality

OCDQ Radio - The Fall Back Recap Show

Tuesday
Nov082011

The Three Most Important Letters in Data Governance

In his book I Is an Other: The Secret Life of Metaphor and How It Shapes the Way We See the World, James Geary included several examples of the psychological concept of priming.  “Our metaphors prime how we think and act.  This kind of associative priming goes on all the time.  In one study, researchers showed participants pictures of objects characteristic of a business setting: briefcases, boardroom tables, a fountain pen, men’s and women’s suits.  Another group saw pictures of objects—a kite, sheet music, a toothbrush, a telephone—not characteristic of any particular setting.”

“Both groups then had to interpret an ambiguous social situation, which could be described in several different ways.  Those primed by pictures of business-related objects consistently interpreted the situation as more competitive than those who looked at pictures of kites and toothbrushes.”

“This group’s competitive frame of mind asserted itself in a word completion task as well.  Asked to complete fragments such as wa_, _ight, and co_p__tive, the business primes produced words like war, fight, and competitive more often than the control group, eschewing equally plausible alternatives like was, light, and cooperative.”

Communication, collaboration, and change management are arguably the three most critical aspects for implementing a new data governance program successfully.  Since all three aspects are people-centric, we should pay careful attention to how we are priming people to think and act within the context of data governance principles, policies, and procedures.  We could simplify this down to whether we are fostering an environment that primes people for cooperation—or primes people for competition.

Since there are only three letters of difference between the words cooperative and competitive, we could say that these are the three most important letters in data governance.

 

Related Posts

Data Governance and the Adjacent Possible

Turning Data Silos into Glass Houses

Aristotle, Data Governance, and Lead Rulers

OCDQ Radio - The Blue Box of Information Quality

The Stakeholder’s Dilemma

The Prince of Data Governance

Beware the Data Governance Ides of March

Jack Bauer and Enforcing Data Governance Policies

Council Data Governance

The Data Governance Oratorio

OCDQ Radio - Data Governance Star Wars

Data Governance Star Wars: Balancing Bureaucracy And Agility

A Tale of Two G’s

The People Platform

The Collaborative Culture of Data Governance

Thursday
Nov032011

The Metadata Continuum

Since my previous post about metadata received excellent commentary, I decided to write a follow-up post to address one of the many great points this discussion and its participants raised, namely the role of controlled vocabularies or metadata dictionaries.

According to an insightful comment from John O’Gorman, “the nature of the medium in which we are trying to solve these problems is multi-dimensional.  Any organization can have—and should manage—multiple dialects.”

“By that I mean,” O’Gorman continued, “in the dialect of accounting, customer means some agent who has contributed to increased sales.  In the dialect of marketing, customer can mean anyone with a pulse that will sit and listen to a pitch.  This insistence on a single version of anything, which is embedded in controlled vocabularies, relational tables, object classes, or a folder structure, is the single largest impediment to cleaning up the digital wasteland.”

One example of this digital wasteland metadata challenge, taken from the crowd-sourced wisdom of social media, is a hashtag, which Twitter users include in their tweets in order to tag them for search engines and trending topics websites.

Since it’s also a common strategy for making any type of unstructured data more usable, tagging is a great example of one of the semantic challenges of metadata.  Users freely choosing tags often creates a so-called folksonomy, as opposed to users being forced to only select terms from a controlled vocabulary.  This is precisely why the metadata resulting from tagging can include homonyms (i.e., the same tags used with different meanings) and synonyms (i.e., multiple tags for the same concept), which may lead to inappropriate data relationships and inefficient searches for data about a particular subject.
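
As a minimal sketch of how homonyms and synonyms could be spotted, assume we have a small, hand-built list pairing each tag with the concept the tagger actually meant.  The tags and concepts below are invented for illustration and are not taken from any real tagging system.

```python
from collections import defaultdict

# Hypothetical observations: (tag, concept the tagger meant).
taggings = [
    ("mdm", "master data management"),
    ("mdm", "mobile device management"),
    ("dq", "data quality"),
    ("dataquality", "data quality"),
    ("metadata", "metadata"),
]

concepts_by_tag = defaultdict(set)
tags_by_concept = defaultdict(set)
for tag, concept in taggings:
    concepts_by_tag[tag].add(concept)
    tags_by_concept[concept].add(tag)

# Homonyms: one tag used with more than one meaning.
homonyms = {t: sorted(c) for t, c in concepts_by_tag.items() if len(c) > 1}

# Synonyms: more than one tag used for the same concept.
synonyms = {c: sorted(t) for c, t in tags_by_concept.items() if len(t) > 1}

print(homonyms)
print(synonyms)
```

In practice the hard part is the second column, since a folksonomy rarely records what the tagger meant.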

 

The Metadata of Babel

Another insightful comment came from Peter Benson, based on his work with the eOTD (ECCMA Open Technical Dictionary).

“Mention the word metadata,” Benson explained, “and you have immediately lost all but the hard core techies and they have neither the authority nor the budget to solve the problem.  If you take a hard look at the financial crisis or cancer research you will indeed find the reason the challenges are so difficult to solve is in large part because of the limitations in our ability to communicate effectively and the lack of transparency that comes from poor data integration.  So, metadata is really important.”

“The Babel approach of a single language to unite them all,” Benson continued, “has a very poor track history and there is good reason for this.  Language is more about power and authority than it is about true communication.  We have tried to come up with a solution that is solely focused on achieving unambiguous communication.  It really does not matter what it is called as long as we agree on what it is.  We do this by using terminology to define concepts and then assigning concept identifiers that are used as metadata.  The separation of the terminology from the concept identifier, or rather linking terminology through a concept identifier, allows everyone to remain comfortably in their own space yet communicate with others.”
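
The idea of linking terminology through a concept identifier can be sketched as follows.  This is only an illustration under my own assumptions: the dialects, terms, and identifiers are hypothetical, and none of them are real eOTD identifiers.

```python
# Hypothetical terminology: each dialect maps its own terms to a shared
# concept identifier. The identifiers are invented for illustration.
terms_by_dialect = {
    "accounting": {"customer": "C-001"},
    "marketing": {"customer": "C-002", "prospect": "C-002"},
    "sales": {"buyer": "C-001"},
}


def translate(term, source, target):
    """Return the target dialect's terms that share the source term's concept."""
    concept_id = terms_by_dialect[source][term]
    return sorted(
        t for t, cid in terms_by_dialect[target].items() if cid == concept_id
    )


# Accounting's "customer" corresponds to sales' "buyer" ...
print(translate("customer", "accounting", "sales"))
# ... but not to marketing's "customer", which names a different concept.
print(translate("customer", "accounting", "marketing"))
```

Each group keeps its own words, and only the identifier has to be agreed upon.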

 

The Metadata Continuum

So it would appear that we face a daunting challenge, which we could call the Metadata Continuum, where at one end we have the uniformity of controlled vocabularies, and at the other end we have the flexibility of chaotic folksonomies.  The daily business operations of most organizations are governed by a metadata strategy that falls somewhere in between, which raises the question: In which direction should the best practices of metadata management flow, toward flexibility or toward uniformity?

Since in my previous post I used an example of the metadata complexities of everyday language, I thought it might be useful to share two perspectives about linguistic flexibility and uniformity.

In his book Final Jeopardy: Man vs. Machine and the Quest to Know Everything, Stephen Baker explained that “flexibility isn’t a weakness of language, but a strength.  Humans need words to be inexact.  If they were too precise, each person would have a unique vocabulary of several billion words, all of them unintelligible to everyone else.  You might have a unique word for the sip of coffee you just took at 7:59 A.M., which was flavored with the anxiety about the traffic in the Lincoln Tunnel or along Paris’s Boulevard Périphérique.  But that single word would be as useless to you as to everyone else.  A word has to be used at least twice to have any purpose.  Each word is a lingua franca, a fragment of a clumsy common language.”

“Yet paradoxically,” explained Kevin Kelly, in his book What Technology Wants, “diversity can be unleashed by a type of uniformity.  The uniformity of a standard writing system (like an alphabet or script) unleashes the unexpected diversity of literature.  Without uniform rules, every word has to be made up, so communication is localized, inefficient, and thwarted.”

“But with a uniform language,” Kelly continued, “sufficient communication transpires in large circles so that a novel word, phrase, or idea can be appreciated, caught, and disseminated.  The rigidity of an alphabet has done more to enable creativity than any unhinged brain-storming exercise ever invented.  The standard 26 letters in English have produced 16 million different books in English.  Words and language will keep evolving, but their evolution rides on basic fundamentals that are conserved and shared; unvarying (over the short term) letters, spelling, and grammar rules enable creativity in ideas.  In a curious way, the homogenization of shared universals allows the transmission of diversity.”

Perhaps since both flexibility and uniformity have linguistic value, metadata will forever remain a continuum between the two.

Where along the Metadata Continuum is your organization?

 

Related Posts

The Metadata Crisis

What’s the Meta with your Data?

Let’s Meta a Data

The First Law of Data Quality

Data Quality and the Cupertino Effect

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

Plato’s Data

Data, Information, and Knowledge Management

The Data Cold War

The Semantic Future of MDM

OCDQ Radio - A Brave New Data World

Thursday
Oct272011

The Metadata Crisis

I am reading the book The Information: A History, a Theory, a Flood by James Gleick, which recounts a dialogue written by the ancient Chinese philosopher Gongsun Long known as When a White Horse is Not a Horse:

“Horses certainly have color.  Hence, there are white horses.  If it were the case that horses had no color, there would simply be horses, and then how could one select a white horse?  And so it follows that a horse and a white horse are different.  Hence, I say that a white horse is not a horse.

Furthermore, a white horse is a horse and white, but horse is that by means of which one names the shape, and white is that by means of which one names the color.  What names the color is not what names the shape.  Hence, I say that a white horse is not a horse.”

“On its face, this is unfathomable,” explained Gleick, “but it begins to come into focus as a statement about language and logic.  Paradoxes like this formed part of what Chinese historians called the language crisis, a running debate over the nature of language.  Names are not the things they name.”

One of my favorite topics is how data is not the real world it describes.  But perhaps a better data management example of how “names are not the things they name” is metadata.  Julie Hunt blogged about this in her post Stumbling Over Metadata, which explored better definitions than the oversimplified “metadata is data about data.”

Metadata can be thought of as a label that provides a definition, description, and context for data.  Common examples include relational table definitions and flat file layouts.  More detailed examples of metadata include conceptual and logical data models.
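
To illustrate metadata as a label for data, here is a minimal sketch that uses a flat file layout to interpret a fixed-width record.  The field names, offsets, and sample record are hypothetical.

```python
# Hypothetical flat file layout (metadata): (field name, start offset, length).
layout = [
    ("customer_id", 0, 6),
    ("name", 6, 12),
    ("postal_code", 18, 5),
]


def parse_record(line, layout):
    """Use the layout (metadata) to label the characters in a record (data)."""
    return {
        name: line[start:start + length].strip()
        for name, start, length in layout
    }


# Without the layout, this record is just 23 characters.
record = "000042Pat Example 02134"
print(parse_record(record, layout))
```

Change the layout and the same characters yield different data, which is one way of seeing how metadata determines data usage.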

Therefore, metadata—among its many other uses—often plays an integral role in determining your data usage.  Although it’s often overlooked, there is a strong relationship between metadata and data quality, and by extension, between metadata and data-driven decision making, since a business intelligence report’s metadata often provides the framing effect for its data.

I have often witnessed what could be called the metadata crisis, a running debate within many organizations over the meaning of commonly used terms like revenue, which complicates what on the surface seem like straightforward business questions, such as how much revenue was generated during a particular fiscal reporting period.

A metadata management version of When a White Horse is Not a Horse might be When Recognized Revenue is Not Revenue.
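
As a deliberately oversimplified sketch of this debate, assume each order carries both a booked date and a recognized date, and that the reporting period is a calendar third quarter.  The orders, field names, and period are invented for illustration, and this is not a treatment of actual revenue recognition rules.

```python
from datetime import date

# Hypothetical orders: amounts and dates are invented for illustration.
orders = [
    {"amount": 1200, "booked": date(2011, 9, 28), "recognized": date(2011, 10, 5)},
    {"amount": 800, "booked": date(2011, 8, 15), "recognized": date(2011, 8, 20)},
    {"amount": 500, "booked": date(2011, 6, 30), "recognized": date(2011, 7, 2)},
]


def revenue(orders, date_field, start, end):
    """Total the amounts whose chosen date falls within the period (inclusive)."""
    return sum(o["amount"] for o in orders if start <= o[date_field] <= end)


q3_start, q3_end = date(2011, 7, 1), date(2011, 9, 30)
booked_revenue = revenue(orders, "booked", q3_start, q3_end)
recognized_revenue = revenue(orders, "recognized", q3_start, q3_end)

print(booked_revenue, recognized_revenue)
```

The data is identical in both calculations.  Only the definition of revenue, which is metadata, changes the answer to the question.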

However, the complexities of revenue recognition probably pale in comparison with the metadata crisis that can be caused by what David Loshin calls the most dangerous question in data management: What is the definition of customer?

What examples of the metadata crisis have you encountered in your organization?

 

Related Posts

What’s the Meta with your Data?

Let’s Meta a Data

The First Law of Data Quality

Data Quality and the Cupertino Effect

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

Plato’s Data

Data, Information, and Knowledge Management

The Data Cold War

The Semantic Future of MDM

OCDQ Radio - Master Data Management in Practice

OCDQ Radio - A Brave New Data World

Thursday
Oct202011

Data Governance and the Adjacent Possible

I am reading the book Where Good Ideas Come From by Steven Johnson, which examines recurring patterns in the history of innovation.  The first pattern Johnson writes about is called the Adjacent Possible, which is a term coined by Stuart Kauffman, and is described as “a kind of shadow future, hovering on the edges of the present state of things, a map of all the ways in which the present can reinvent itself.  Yet it is not an infinite space, or a totally open playing field.  The strange and beautiful truth about the adjacent possible is that its boundaries grow as you explore those boundaries.”

Exploring the adjacent possible is like exploring “a house that magically expands with each door you open.  You begin in a room with four doors, each leading to a new room that you haven’t visited yet.  Those four rooms are the adjacent possible.  But once you open any one of those doors and stroll into that room, three new doors appear, each leading to a brand-new room that you couldn’t have reached from your original starting point.  Keep opening new doors and eventually you’ll have built a palace.”

 

If it ain’t broke, bricolage it

“If it ain’t broke, don’t fix it” is a common defense of the status quo, which often encourages an environment that stifles innovation and the acceptance of new ideas.  The status quo is like staying in the same familiar and comfortable room and choosing to keep all four of its doors closed.

The change management efforts of data governance often don’t talk about opening one of those existing doors.  Instead they often broadcast the counter-productive message that “everything is so broken, we can’t fix it.”  We need to destroy our existing house and rebuild it from scratch with brand new rooms — and probably with one of those open floor plans without any doors.

Should it really be surprising when this approach to change management is so strongly resisted?

The term bricolage can be defined as making creative and resourceful use of whatever materials are at hand regardless of their original purpose, stringing old parts together to form something radically new, transforming the present into the near future.

“Good ideas are not conjured out of thin air,” explains Johnson, “they are built out of a collection of existing parts.”

The primary reason that the change management efforts of data governance are resisted is that they rely almost exclusively on negative methods: they emphasize broken business and technical processes, as well as bad data-related employee behaviors.

Although these problems exist and are the root cause of some of the organization’s failures, there are also unheralded processes and employees that prevented other problems from happening, which are the root cause of some of the organization’s successes.

It’s important to demonstrate that some data governance policies reflect existing best practices, which helps reduce resistance to change, and so a far more productive change management mantra for data governance is: “If it ain’t broke, bricolage it.”

 

Data Governance and the Adjacent Possible

As Johnson explains, “in our work lives, in our creative pursuits, in the organizations that employ us, in the communities we inhabit—in all these different environments, we are surrounded by potential new ways of breaking out of our standard routines.”

“The trick is to figure out ways to explore the edges of possibility that surround you.”

Most data governance maturity models describe an organization’s evolution through a series of stages intended to measure its capability and maturity, tendency toward being reactive or proactive, and inclination to be project-oriented or program-oriented.

Johnson suggests that “one way to think about the path of evolution is as a continual exploration of the adjacent possible.”

Perhaps we need to think about the path of data governance evolution as a continual exploration of the adjacent possible, as a never-ending journey which begins by opening that first door, building a palatial data governance program one room at a time.

 

Related Posts

“What is is the was of what shall be”

Datenvergnügen

Don’t Do Less Bad; Do Better Good

Delivering Data Happiness

Why isn’t our data quality worse?

Data Governance and the Buttered Cat Paradox

Beware the Data Governance Ides of March

Aristotle, Data Governance, and Lead Rulers

Data Governance Star Wars: Balancing Bureaucracy And Agility

OCDQ Radio - Data Governance Star Wars

Monday
Oct172011

The Cloud Security Paradox

This blog post is sponsored by the Enterprise CIO Forum and HP.

Nowadays it seems like any discussion about enterprise security inevitably becomes a discussion about cloud security.  Last week, as I was listening to John Dodge and Bob Gourley discuss recent top cloud security tweets on Enterprise CIO Forum Radio, the story that caught my attention was the Network World article by Christine Burns, part of a six-part series on cloud computing, which had a provocative title declaring that public cloud security remains Mission Impossible.

“Cloud security vendors and cloud services providers have a long way to go,” Burns wrote, “before enterprise customers will be able to find a comfort zone in the public cloud, or even in a public/private hybrid deployment.”  Although I agree with Burns, and I highly recommend reading her entire excellent article, I have always been puzzled by debates over cloud security.

A common opinion is that cloud-based solutions are fundamentally less secure than on-premises solutions.  Some critics even suggest cloud-based solutions can never be secure.  I don’t agree with either opinion because to me it’s all a matter of perspective.

Let’s imagine that I am a cloud-based service provider selling solutions leveraging my own on-premises resources, meaning that I own and operate all of the technology infrastructure within the walls of my one corporate office.  Let’s also imagine that in addition to the public cloud solution that I sell to my customers, I have built a private cloud solution for some of my employees (e.g., salespeople in the field), and that I also have other on-premises systems (e.g., accounting) not connected to any cloud.

Since all of my solutions are leveraging the exact same technology infrastructure, if it is impossible to secure my public cloud, then it logically follows that not only is it impossible to secure my private cloud, but it is also impossible to secure my on-premises systems as well.  Therefore, all of my security must be Mission Impossible.  I refer to this as the Cloud Security Paradox.

Some of you will argue that my scenario was oversimplified, since most cloud-based solutions, whether public or private, may include technology infrastructure that is not under my control, and may be accessed using devices that are not under my control.

Although those are valid security concerns, they are not limited to—nor were they created by—cloud computing, because with the prevalence of smart phones and other mobile devices, those security concerns exist for entirely on-premises solutions as well.

In my opinion, cloud-based versus on-premises, public cloud versus private cloud, and customer access versus employee access, are all oversimplified arguments.  Regardless of the implementation strategy, your technology infrastructure, and especially your data, need to be secured wherever they are, however they are accessed, and with the appropriate levels of control over who can access what.

Fundamentally, the real problem is a lack of well-defined, well-implemented, and well-enforced security practices.  As Burns rightfully points out, a significant challenge with cloud-based solutions is that “public cloud providers are notoriously unwilling to provide good levels of visibility into their underlying security practices.”

However, when the cost savings and convenience of cloud-based solutions are accepted without a detailed security assessment, that is not a fundamental flaw of cloud computing—that is simply a bad business decision.

Let’s stop blaming poor enterprise security practices on the adoption of cloud computing.

 

Related Posts

The Good, the Bad, and the Secure

Securing your Digital Fortress

Shadow IT and the New Prometheus

Are Cloud Providers the Bounty Hunters of IT?

The Diderot Effect of New Technology

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Thursday
Oct132011

Turning Data Silos into Glass Houses

Data silos are often denounced as inherently bad because they complicate the coordination of enterprise-wide business activities.  However, since they are often used to support some of those business activities, whether data silos are good or bad is a matter of perspective.  For example, data silos are bad when different business units are redundantly storing and maintaining their own private copies of the same data, but data silos are good when they are used to protect sensitive data that should not be shared.

Providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the anti-data-silo siren song of enterprise data warehousing (EDW), and more recently, of master data management (MDM).  Although these initiatives can provide significant business value, somewhat ironically, many data silos start with EDW or MDM data that was replicated and customized in order to satisfy the particular needs of an operational project or tactical initiative.  This customized data either becomes obsolete after the conclusion of its project or initiative, or it continues to be used because it is satisfying a business need that EDW and MDM are not.

One of the early goals of a new data governance program should be to provide the organization with a substantially improved view of how it is using its data — including data silos — to support its operational, tactical, and strategic business activities.

Data governance can help the organization catalog existing data sources, build a matrix of data usage and related business processes and technology, identify potential external reference sources to use for data enrichment, as well as help define the metrics that meaningfully measure data quality using business-relevant terminology.
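
A catalog and usage matrix of this kind could be sketched as follows.  This is only an illustration: the sources, business processes, and data entities are hypothetical, and a real catalog would record far more detail.

```python
from collections import defaultdict

# Hypothetical catalog entries: (data source, business process, data entity).
catalog = [
    ("crm", "sales", "customer"),
    ("billing", "invoicing", "customer"),
    ("billing", "invoicing", "invoice"),
    ("warehouse", "reporting", "customer"),
]

# Usage matrix: which sources hold each entity, and which processes use each source.
sources_by_entity = defaultdict(set)
processes_by_source = defaultdict(set)
for source, process, entity in catalog:
    sources_by_entity[entity].add(source)
    processes_by_source[source].add(process)

# Entities stored in more than one source are candidates for a redundancy review.
redundant = {
    entity: sorted(sources)
    for entity, sources in sources_by_entity.items()
    if len(sources) > 1
}

print(redundant)
```

An entity appearing in several sources is not automatically a problem, but it is a question the organization can now see and answer.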

The transparency provided by this combined analysis of the existing data, business, and technology landscape will provide a more comprehensive overview of enterprise data management problems, which will help the organization better evaluate any existing data and technology re-use and redundancies, as well as whether investing in new technology will be necessary.

Data governance can help topple data silos by first turning them into glass houses through transparency, empowering the organization to start throwing stones at those glass houses that must be eliminated.  And when data silos are allowed to persist, they should remain glass houses, clearly illustrating whether or not they have the business-justified reasons for continued use.

 

Related Posts

Data and Process Transparency

The Good Data

The Data Outhouse

Time Silos

Sharing Data

Single Version of the Truth

Beyond a “Single Version of the Truth”

The Quest for the Golden Copy

The Idea of Order in Data

Hell is other people’s data

Thursday
Sep292011

Aristotle, Data Governance, and Lead Rulers

Data governance requires the coordination of a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

But sometimes this emphasis on enforcing policies makes data governance sound like it’s all about rules.

In their book Practical Wisdom, Barry Schwartz and Kenneth Sharpe use the Nicomachean Ethics of Aristotle as a guide to explain that although rules are important, what is more important is “knowing the proper thing to aim at in any practice, wanting to aim at it, having the skill to figure out how to achieve it in a particular context, and then doing it.”

Aristotle observed the practical wisdom of the craftsmen of his day, including carpenters, shoemakers, blacksmiths, and masons, noting how “their work was not governed by systematically applying rules or following rigid procedures.  The materials they worked with were too irregular, and each task posed new problems.”

“Aristotle was particularly fascinated with how masons used rulers.  A normal straight-edge ruler was of little use to the masons who were carving round columns from slabs of stone and needed to measure the circumference of the columns.”

Unless you bend the ruler.

“Which is exactly what the masons did.  They fashioned a flexible ruler out of lead, a forerunner of today’s tape measure.  For Aristotle, knowing how to bend the rule to fit the circumstance was exactly what practical wisdom was all about.”

Although there’s a tendency to ignore the existing practical wisdom of the organization, successful data governance is not about systematically applying rules or following rigid procedures, precisely because the dynamic challenges faced, and overcome daily, by business analysts, data stewards, technical architects, and others, exemplify today’s constantly changing business world.

But this doesn’t mean that effective data governance policies can’t be implemented.  It simply means that instead of focusing on who should lead the way (i.e., top-down or bottom-up), we should focus on what the rules of data governance are made of.

Well-constructed data governance policies are like lead rulers—flexible rules that empower us with an understanding of the principle of the policy, and trust us to figure out how best to enforce the policy in a particular context, how to bend the rule to fit the circumstance.  Aristotle knew this was exactly what practical wisdom was all about—data governance needs practical wisdom.

“Tighter rules and regulations, however necessary, are pale substitutes for wisdom,” concluded Schwartz and Sharpe.  “We need rules to protect us from disaster.  But at the same time, rules without wisdom are blind and at best guarantee mediocrity.”

 

Related Posts

Jack Bauer and Enforcing Data Governance Policies

Don’t Do Less Bad; Do Better Good

Data Governance and the Buttered Cat Paradox

Zig-Zag-Diagonal Data Governance

Stuck in the Middle with Data Governance

Beware the Data Governance Ides of March

Data Governance Star Wars: Balancing Bureaucracy And Agility

OCDQ Radio - Data Governance Star Wars

A Tale of Two G’s

Video: Declaration of Data Governance

The Stakeholder’s Dilemma

The Prince of Data Governance

The Data Governance Oratorio

Plato’s Data