Big Data el Memorioso

This blog post is sponsored by the Enterprise CIO Forum and HP.

Funes el memorioso is a short story by Jorge Luis Borges, which describes a young man named Ireneo Funes who, as a result of a horseback riding accident, has lost his ability to forget.  Although Funes has a tremendous memory, he is so lost in the details of everything he knows that he is unable to convert the information into knowledge and unable, as a result, to achieve wisdom.

In Spanish, the word memorioso means “having a vast memory.”  Without question, Big Data has a vast memory composed of fast-moving large volumes of varying data seemingly providing details about everything your organization could ever want to know about our increasingly digitized and pixelated world.  But what if Big Data is the Ireneo Funes of the Information Age?

What if Big Data el Memorioso is the not-so-short story in which your organization becomes so lost in the details of everything big data delivers that you’re unable to connect enough of the dots to convert the information into knowledge and unable, as a result, to achieve the wisdom necessary to satisfice specific business needs?

Adrian Bridgwater recently compared this challenge to “trying to balance a stack of papers on a moving walkway, in a breeze, without knowing the full length or speed of the walkway itself.  If you want to extend the metaphor one step further — there are other passengers on our walkway and they could bump into us and/or add papers to our stack.  Oh, did I mention that the pieces of paper might not even all be the same size, shape, or color — and some may have tattered edges and coffee stains?”

In other words, as Bridgwater went on to explain, “our information optimization goals will typically include the need to manage information and assess its quantitative and qualitative values.  We will also need to analyze streams of both structured and unstructured data, the latter including video, emails, and other less ‘straight edged’ data.”

While examining some of the technology options that can assist with this challenge, Paul Muller recently remarked “whether it be structured, unstructured, big, small, real-time, or historical — data of all kinds are top-of-mind for business executives.  It may already feel like you’re drowning in data, but it’s important to get to grips with the changing technology landscape to ensure you’re not drowning in an incoherent mess of information management architectures too.”

Edd Dumbill recently wrote an introduction to the big data landscape, which concluded that “big data is no panacea.  You can find patterns and clues in your data, but then what?”  As Dumbill recommends, you need to know where you want to go.  You need to know what problem you want to solve, i.e., you need to pick a real business problem to guide your implementation.

Without this implementation guide, big data will have, as Borges said of Funes, “a certain stammering greatness,” but amount to, as William Shakespeare said in The Tragedy of Macbeth, “a tale told by an idiot, full of sound and fury, signifying nothing.”

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Neither the I Nor the T is Magic

Information Overload Revisited

The Speed of Decision

The Data-Decision Symphony

A Decision Needle in a Data Haystack

The Big Data Collider

Dot Collectors and Dot Connectors

DQ-View: Data Is as Data Does

Data, Information, and Knowledge Management

Is your data complete and accurate, but useless to your business?

The Real Data Value is Business Insight

Data, data everywhere, but where is data quality?

Dot Collectors and Dot Connectors

The attention blindness inherent in the digital age often leads to a debate about multitasking, which many claim impairs our ability to solve complex problems.  Therefore, we often hear that we need to adopt monotasking, i.e., we need to eliminate all possible distractions and focus our attention on only one task at a time.

However, during the recent Harvard Business Review podcast The Myth of Monotasking, Cathy Davidson, author of the new book Now You See It: How the Brain Science of Attention Will Transform the Way We Live, Work, and Learn, explained how “the moment that you start not paying attention fully to the task at hand, you actually start seeing other things that your attention would have missed.”  Although Davidson acknowledged that attention blindness is a serious problem, she explained that there really is no such thing as monotasking.  Modern neuroscience research has revealed that the human brain is, in fact, always multitasking.  Furthermore, she explained how multitasking can be extremely useful for a new and expansive form of attention.

“We all see selectively, but we don’t select the same things to see,” Davidson explained.  “So if we can learn to work together, we can actually account for, and productively work around, our own individual attention blindness by seeing collaboratively in a way that compensates for that blindness.”

During the podcast, an analogy was made that focusing attention on specific tasks can result in a lot of time spent collecting dots without spending enough time connecting those dots.  This point caused me to ponder the division of organizational labor that has historically existed between the dot collection of data management, which focuses on aspects such as data integrity and data quality, and the dot connection of business intelligence, which focuses on aspects such as data analysis and data visualization.

I think most data management professionals are dot collectors since it often seems like they spend a lot of their time, money, and attention on collecting (and profiling, modeling, cleansing, transforming, matching, and otherwise managing) data dots.

But since data’s value comes from data’s usefulness, merely collecting data dots doesn’t mean anything if you cannot connect those dots into meaningful patterns that enable your organization to take action or otherwise support your business activities.

So I think most business intelligence professionals are dot connectors since it often seems like they spend a lot of their time, money, and attention on connecting (and querying, aggregating, reporting, visualizing, and otherwise analyzing) data dots.

However, the attention blindness of data management and business intelligence professionals means that they see selectively, often intentionally choosing not to see the same things.  But as more of our personal and professional lives become digitized and pixelated, the big picture of the business world is inundated with the multifaceted challenges of big data, where fast-moving large volumes of varying data are transforming the way we have to view traditional data management and business intelligence.

We need to replace our perspective of data management and business intelligence as separate monotasking activities with an expansive form of organizational multitasking where the dot collectors and dot connectors work together more collaboratively.

 

Related Posts

Channeling My Inner Beagle: The Case for Hyperactivity

Mind the Gap

The Wisdom of the Social Media Crowd

No Datum is an Island of Serendip

DQ-View: Data Is as Data Does

The Real Data Value is Business Insight

Information Overload Revisited

Neither the I Nor the T is Magic

The Big Data Collider

OCDQ Radio - Big Data and Big Analytics

OCDQ Radio - So Long 2011, and Thanks for All the . . .

The Interconnected User Interface

Scary Calendar Effects

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, recorded on the first of three occurrences of Friday the 13th in 2012, I discuss scary calendar effects.

In other words, I discuss how schedules, deadlines, and other date-related aspects can negatively affect enterprise initiatives such as data quality, master data management, and data governance.

Please Beware: This episode concludes with the OCDQ Radio Theater production of Data Quality and Friday the 13th.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

DQ-View: Data Is as Data Does

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

The following list contains the books shown in the video, simply listed in the order they appeared on my bookshelf:

 

Previous DQ-View Videos

You can also watch a regularly updated page of my videos by clicking on this link: OCDQ Videos

DQ-View: Baseball and Data Quality

DQ-View: Occam’s Razor Burn

DQ-View: Roman Ruts on the Road to Data Governance

DQ-View: Talking about Data

DQ-View: The Poor Data Quality Blizzard

DQ-View: New Data Resolutions

DQ-View: From Data to Decision

DQ-View: Achieving Data Quality Happiness

Data Quality is not a Magic Trick

DQ-View: The Cassandra Effect

DQ-View: Is Data Quality the Sun?

DQ-View: Designated Asker of Stupid Questions

Video: Oh, the Data You’ll Show!

So Long 2011, and Thanks for All the . . .

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Don’t Panic!  Welcome to the mostly harmless OCDQ Radio 2011 Year in Review episode.  During this approximately 42-minute episode, I recap the data-related highlights of 2011 in a series of sometimes serious, sometimes funny segments, as well as make wacky and wildly inaccurate data-related predictions about 2012.

Special thanks to my guests Jarrett Goldfedder, who discusses Big Data, Nicola Askham, who discusses Data Governance, and Daragh O Brien, who discusses Data Privacy.  Additional thanks to Rich Murnane and Dylan Jones.  And Deep Thanks to that frood Douglas Adams, who always knew where his towel was, and who wrote The Hitchhiker’s Guide to the Galaxy.


Neither the I Nor the T is Magic

This blog post is sponsored by the Enterprise CIO Forum and HP.

It’s that time when we reflect on the past year and try to predict the future, such as Paul Muller, Joel Rothman, and Pearl Zhu did with their recent blog posts.  Although I have previously written about why most predictions don’t come true, in this post, I throw my fortune-telling hat into the 2012 prediction ring.

The information technology (IT) trends of 2011 included consumerization and decentralization, application modernization and information optimization, cloud computing and cloud security (and, by extension, enterprise security).  However, perhaps the biggest IT trend of the year is that 2011 is going out with a Big Bang about Big Data in 2012 and beyond.

Since its inception, the IT industry has both benefited from and battled against the principle known as Clarke’s Third Law:

“Any sufficiently advanced technology is indistinguishable from magic.”

This principle often fuels the Diderot Effect of New Technology, enchanting our organizations with the mad desire to stock up on new technologically magic things.  As such, many are predicting 2012 will be the Year of the Magic Elephant named Hadoop because, as Gartner Research predicts about big data, “the size, complexity of formats, and speed of delivery exceeds the capabilities of traditional data management technologies; it requires the use of new or exotic technologies simply to manage the volume alone.  Many new technologies are emerging, with the potential to be disruptive.  Analytics has become a major driving application.”  As a corollary, the potential business value of integrating big data into business analytics seems to be conjuring up an alternative version of Clarke’s Third Law:

“Any sufficiently advanced information is indistinguishable from magic.”

In other words, many big data proponents (especially IT vendors selling Hadoop-based solutions) extol its virtues as if its information is capable of providing clairvoyant business insight, as if big data was the Data Psychic of the Information Age.

Although both sufficiently advanced information and technology will have important business-enabling IT roles to play in 2012, never forget that neither the I nor the T is magic — no matter what the Data Psychics and Magic Elephants may say.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Information Overload Revisited

The Data Encryption Keeper

The Cloud Security Paradox

The Good, the Bad, and the Secure

Securing your Digital Fortress

Shadow IT and the New Prometheus

Are Cloud Providers the Bounty Hunters of IT?

The Diderot Effect of New Technology

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Redefining Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I have an occasionally spirited discussion about data quality with Peter Perera, partially precipitated by his provocative post from this past summer, The End of Data Quality...as we know it, which included his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.

Peter Perera is a recognized consultant and thought leader with significant experience in Master Data Management, Customer Relationship Management, Data Quality, and Customer Data Integration.  For over 20 years, he has been advising and working with Global 5000 organizations and mid-size enterprises to increase the usability and value of their customer information.


Information Overload Revisited

This blog post is sponsored by the Enterprise CIO Forum and HP.

Information Overload is a term invoked regularly during discussions about the data deluge of the Information Age, which has created a 24 hours a day, 7 days a week, 365 days a year, worldwide whirlwind of constant information flow, where the very air we breathe is teeming with digital data streams — continually inundating us with new, and new types of, information.

Information overload generally refers to how too much information can overwhelm our ability to understand an issue, and can even disable our decision making regarding that issue (the latter aspect is generally referred to as Analysis Paralysis).

But we often forget that the term is over 40 years old.  It was popularized by Alvin Toffler in his bestselling book Future Shock, which was published in 1970, back when the Internet was still in its infancy, and long before the Internet’s progeny would give birth to the clouds contributing to the present, potentially perpetual, forecast for data precipitation.

A related term that has become big in the data management industry is Big Data.  As Gartner Research explains, although the term acknowledges the exponential growth, availability, and use of information in today’s data-rich landscape, big data is about more than just data volume.  Data variety (i.e., structured, semi-structured, and unstructured data, as well as other types, such as the sensor data emanating from the Internet of Things) and data velocity (i.e., how fast data is being produced and how fast the data must be processed to meet demand) are also key characteristics of the big challenges of big data.

John Dodge and Bob Gourley recently discussed big data on Enterprise CIO Forum Radio, where Gourley explained that big data is essentially “the data that your enterprise is not currently able to do analysis over.”  This point resonates with a similar one made by Bill Laberis, who recently discussed new global research where half of the companies polled responded that they cannot effectively deal with analyzing the rising tide of data available to them.

Most of the big angst about big data comes from this fear that organizations are not tapping the potential business value of all that data not currently being included in their analytics and decision making.  This reminds me of psychologist Herbert Simon, who won the 1978 Nobel Prize in Economics for his pioneering research on decision making, which included comparing and contrasting the decision-making strategies of maximizing and satisficing (a term that combines satisfying with sufficing).

Simon explained that a maximizer is like a perfectionist who considers all the data they can find because they need to be assured that their decision was the best that could be made.  This creates a psychologically daunting task, especially as the amount of available data constantly increases (again, note that this observation was made over 40 years ago).  The alternative is to be a satisficer, someone who attempts to meet criteria for adequacy rather than identify an optimal solution, an approach that is especially useful when time is a critical factor, as it is with the real-time decision making demanded by a constantly changing business world.

Big data strategies will also have to compare and contrast maximizing and satisficing.  Maximizers, if driven by their angst about all that data they are not analyzing, might succumb to information overload.  Satisficers, if driven by information optimization, might sufficiently integrate just enough of big data into their business analytics in a way that satisfies specific business needs.
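To make the contrast concrete, here is a minimal sketch, with entirely hypothetical records, scoring function, and adequacy threshold, of how a maximizer evaluates every option before choosing, while a satisficer stops at the first option that meets the criteria for adequacy:

```python
# Hypothetical sketch of maximizing versus satisficing decision strategies.

def maximize(options, score):
    """Perfectionist strategy: evaluate every option and return the best one."""
    return max(options, key=score)

def satisfice(options, score, good_enough):
    """Adequacy strategy: return the first option whose score meets the threshold."""
    for option in options:
        if score(option) >= good_enough:
            return option
    return None  # no option was adequate

# Illustrative data: candidate supplier records with a made-up quality score.
suppliers = [
    {"name": "Supplier A", "quality": 0.72},
    {"name": "Supplier C", "quality": 0.85},
    {"name": "Supplier B", "quality": 0.91},
]
score = lambda record: record["quality"]

print(maximize(suppliers, score))         # examines all three records, returns Supplier B
print(satisfice(suppliers, score, 0.80))  # stops at Supplier C, the first adequate record
```

The satisficer’s early exit is precisely what makes it viable under time pressure, which is why real-time decision making tends to favor satisficing over maximizing.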

As big data forces us to revisit information overload, it may be useful for us to remember that originally the primary concern was not about the increasing amount of information, but instead the increasing access to information.  As Clay Shirky succinctly stated, “It’s not information overload, it’s filter failure.”  So, to harness the business value of big data, we will need better filters, which may ultimately make for the entire distinction between information overload and information optimization.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

The Data Encryption Keeper

The Cloud Security Paradox

The Good, the Bad, and the Secure

Securing your Digital Fortress

Shadow IT and the New Prometheus

Are Cloud Providers the Bounty Hunters of IT?

The Diderot Effect of New Technology

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Making EIM Work for Business

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I discuss Enterprise Information Management (EIM) with John Ladley, the author of the excellent book Making EIM Work for Business, exploring what makes information management, not just useful, but valuable to the enterprise.

John Ladley is a business technology thought leader with 30 years of experience in improving organizations through the successful implementation of information systems.  He is a recognized authority in the use and implementation of business intelligence and enterprise information management.  John Ladley frequently writes and speaks on a variety of technology and enterprise information management topics.  His information management experience is balanced between strategic technology planning, project management, and, most important, the practical application of technology to business problems.


Two Weeks Before Christmas

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Season’s Greetings fellow data management enthusiasts and welcome to a special holiday-themed episode of OCDQ Radio.

With the Christmas, Hanukkah, Kwanzaa, and Festivus seasons now upon us, I revisited my ‘Twas Two Weeks Before Christmas blog post from 2009, which is based on the poem A Visit from St. Nicholas.  During this brief podcast, I perform a recital.

The entire OCDQ Blog family wishes you and yours all the best during this holiday season and the coming new year.


You only get a Return from something you actually Invest in

In my previous post, I took a slightly controversial stance on a popular three-word phrase — Root Cause Analysis.  In this post, it’s another popular three-word phrase — Return on Investment (most commonly abbreviated as the acronym ROI).

What is the ROI of purchasing a data quality tool or launching a data governance program?

Zero.  Zip.  Zilch.  Intet.  Ingenting.  Rien.  Nada.  Nothing.  Nichts.  Niets.  Null.  Niente.  Bupkis.

There is No Such Thing as the ROI of purchasing a data quality tool or launching a data governance program.

Before you hire “The Butcher” to eliminate me for being The Man Who Knew Too Little about ROI, please allow me to explain.

Returns only come from Investments

Although the reason that you likely purchased a data quality tool is because you have business-critical data quality problems, simply purchasing a tool is not an investment (unless you believe in Magic Beans) since the tool itself is not a solution.

You use tools to build, test, implement, and maintain solutions.  For example, I spent several hundred dollars on new power tools last year for a home improvement project.  However, I haven’t received any return on my home improvement investment for a simple reason — I still haven’t even taken most of the tools out of their packaging yet.  In other words, I barely even started my home improvement project.  It is precisely because I haven’t invested any time and effort that I haven’t seen any returns.  And it certainly isn’t going to help me (although it would help Home Depot) if I believed buying even more new tools was the answer.

Although the reason that you likely launched a data governance program is because you have complex issues involving the intersection of data, business processes, technology, and people, simply launching a data governance program is not an investment since it does not conjure the three most important letters.

Data is only an Asset if Data is a Currency

In his book UnMarketing, Scott Stratten discusses this within the context of the ROI of social media (a commonly misunderstood aspect of social media strategy), but his insight is just as applicable to any discussion of ROI.  “Think of it this way: You wouldn’t open a business bank account and ask to withdraw $5,000 before depositing anything. The banker would think you are a loony.”

Yet, as Stratten explained, people do this all the time in social media by failing to build up what is known as social currency.  “You’ve got to invest in something before withdrawing. Investing your social currency means giving your time, your knowledge, and your efforts to that channel before trying to withdraw monetary currency.”

The same logic applies perfectly to data quality and data governance, where we could say it’s the failure to build up what I will call data currency.  You’ve got to invest in data before you can ever consider data an asset to your organization.  Investing your data currency means giving your time, your knowledge, and your efforts to data quality and data governance before trying to withdraw monetary currency (i.e., before trying to calculate the ROI of a data quality tool or a data governance program).

If you actually want to get a return on your investment, then actually invest in your data.  Invest in doing the hard daily work of continuously improving your data quality and putting into practice your data governance principles, policies, and procedures.

Data is only an asset if data is a currency.  Invest in your data currency, and you will eventually get a return on your investment.

You only get a return from something you actually invest in.

Related Posts

Can Enterprise-Class Solutions Ever Deliver ROI?

Do you believe in Magic (Quadrants)?

Which came first, the Data Quality Tool or the Business Need?

What Data Quality Technology Wants

A Farscape Analogy for Data Quality

The Data Quality Wager

“Some is not a number and soon is not a time”

The Dumb and Dumber Guide to Data Quality

There is No Such Thing as a Root Cause

Root cause analysis.  Most people within the industry, myself included, often discuss the importance of determining the root cause of data governance and data quality issues.  However, the complex cause and effect relationships underlying an issue mean that when an issue is encountered, you are often only seeing one of the numerous effects of its root cause (or causes).

In my post The Root! The Root! The Root Cause is on Fire!, I poked fun at those resistant to root cause analysis with the lyrics:

The Root! The Root! The Root Cause is on Fire!
We don’t want to determine why, just let the Root Cause burn.
Burn, Root Cause, Burn!

However, I think that the time is long overdue for even me to admit the truth — There is No Such Thing as a Root Cause.

Before you charge at me with torches and pitchforks for having an Abby Normal brain, please allow me to explain.

 

Defect Prevention, Mouse Traps, and Spam Filters

Some advocates of defect prevention claim that zero defects is not only a useful motivation, but also an attainable goal.  In my post The Asymptote of Data Quality, I quoted Daniel Pink’s book Drive: The Surprising Truth About What Motivates Us:

“Mastery is an asymptote.  You can approach it.  You can home in on it.  You can get really, really, really close to it.  But you can never touch it.  Mastery is impossible to realize fully.

The mastery asymptote is a source of frustration.  Why reach for something you can never fully attain?

But it’s also a source of allure.  Why not reach for it?  The joy is in the pursuit more than the realization.

In the end, mastery attracts precisely because mastery eludes.”

The mastery of defect prevention is sometimes distorted into a belief in data perfection, into a belief that we can build not just a better mousetrap, but a mousetrap that could catch all the mice, or that by placing a mousetrap in our garage, which prevents mice from entering via the garage, we somehow also prevent mice from finding another way into our house.

Obviously, we can’t catch all the mice.  However, that doesn’t mean we should let the mice be like Pinky and the Brain:

Pinky: “Gee, Brain, what do you want to do tonight?”

The Brain: “The same thing we do every night, Pinky — Try to take over the world!”

My point is that defect prevention is not the same thing as defect elimination.  Defects evolve.  An excellent example of this is spam.  Even conservative estimates indicate almost 80% of all e-mail sent worldwide is spam.  A similar percentage of blog comments are spam, and spam-generating bots are quite prevalent on Twitter and other micro-blogging and social networking services.  The inconvenient truth is that as we build better and better spam filters, spammers create better and better spam.

Just as mousetraps don’t eliminate mice and spam filters don’t eliminate spam, defect prevention doesn’t eliminate defects.

However, mousetraps, spam filters, and defect prevention are essential proactive best practices.

 

There are No Lines of Causation — Only Loops of Correlation

There are no root causes, only strong correlations.  And correlations are strengthened by continuous monitoring.  Believing there are root causes means believing continuous monitoring, and by extension, continuous improvement, has an end point.  I call this the defect elimination fallacy, which I parodied in song in my post Imagining the Future of Data Quality.

Knowing there are only strong correlations means knowing continuous improvement is an infinite feedback loop.  A practical example of this reality comes from data-driven decision making, where:

  1. Better Business Performance is often correlated with
  2. Better Decisions, which, in turn, are often correlated with
  3. Better Data, which is precisely why Better Decisions with Better Data is foundational to Business Success — however . . .

This does not mean that we can draw straight lines of causation between (3) and (1), (3) and (2), or (2) and (1).

Despite our preference for simplicity over complexity, if bad data were the root cause of bad decisions and/or bad business performance, then no organization would ever be profitable, and if good data were the root cause of good decisions and/or good business performance, then every organization could always be profitable.  Even if good data were a root cause, not just a correlation, and even when data perfection is temporarily achieved, the effects would still be ephemeral because not only do defects evolve, but so does the business world.  This evolution requires an endless revolution of continuous monitoring and improvement.

Many organizations implement data quality thresholds to close the feedback loop evaluating the effectiveness of their data management and data governance, but few implement decision quality thresholds to close the feedback loop evaluating the effectiveness of their data-driven decision making.
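As a minimal sketch of what closing both feedback loops might look like, with the metric definitions, threshold values, and follow-up actions below being purely hypothetical assumptions rather than a prescribed implementation, an organization could monitor data quality and decision quality side by side and trigger a review whenever either falls below its threshold:

```python
# Hypothetical sketch: closing both feedback loops with quality thresholds.

DATA_QUALITY_THRESHOLD = 0.95      # assumed metric: share of records passing validation rules
DECISION_QUALITY_THRESHOLD = 0.80  # assumed metric: share of decisions meeting their business targets

def review_feedback_loops(data_quality_score, decision_quality_score):
    """Return the follow-up actions suggested by each feedback loop."""
    actions = []
    if data_quality_score < DATA_QUALITY_THRESHOLD:
        actions.append("review data management processes (profiling, cleansing, monitoring)")
    if decision_quality_score < DECISION_QUALITY_THRESHOLD:
        actions.append("review decisions against the business results they produced")
    return actions or ["continue monitoring; both loops remain within thresholds"]

# Example: the data looks fine, but decisions are underperforming, a signal that
# the data quality feedback loop alone would never surface.
print(review_feedback_loops(0.97, 0.64))
```

The point of the sketch is simply that the second threshold exists at all; without it, the decision side of the loop is never evaluated, no matter how closely the data side is monitored.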

The quality of a decision is determined by the business results it produces, not the person who made the decision, the quality of the data used to support the decision, or even the decision-making technique.  Of course, the reality is that business results are often not immediate and may sometimes be contingent upon the complex interplay of multiple decisions.

Even though evaluating decision quality only establishes a correlation, and not a causation, between the decision execution and its business results, it is still essential to continuously monitor data-driven decision making.

Although the business world will never be totally predictable, we cannot turn a blind eye to the need for data-driven decision-making best practices, nor to the reality that no best practice can eliminate the potential for poor data quality and decision quality, or the potential for poor business results despite better data quality and decision quality.  Central to continuous improvement is closing the feedback loops that make data-driven decisions more transparent through better monitoring, allowing the organization to learn from its decision-making mistakes and make adjustments when necessary.

We need to connect the dots of better business performance, better decisions, and better data by drawing loops of correlation.

 

Decision-Data Feedback Loop

Continuous improvement enables better decisions with better data, which drives better business performance — as long as you never stop looping the Decision-Data Feedback Loop, and start accepting that there is no such thing as a root cause.

I discuss this, and other aspects of data-driven decision making, in my DataFlux white paper, which is available for download (registration required) using the following link: Decision-Driven Data Management

 

Related Posts

The Root! The Root! The Root Cause is on Fire!

Bayesian Data-Driven Decision Making

The Role of Data Quality Monitoring in Data Governance

The Circle of Quality

Oughtn’t you audit?

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Imagining the Future of Data Quality

What going to the Dentist taught me about Data Quality

DQ-Tip: “There is No Such Thing as Data Accuracy...”

The HedgeFoxian Hypothesis