Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.

My Services Contact Me
Search OCDQ Blog
Recent Comments

Entries in Philosophy (173)

Thursday
May162013

The Need for Data Philosophers

In my post On Philosophy, Science, and Data, I explained that although some argue philosophy only reigns in the absence of data while science reigns in the analysis of data, a conceptual bridge still remains between analysis and insight, the crossing of which is itself a philosophical exercise.  Therefore, I argued that an endless oscillation persists between science and philosophy, which is why, despite the fact that all we hear about is the need for data scientists, there’s also a need for data philosophers.

Of course, the debate between science and philosophy is a very old one, as is the argument we need both.  In my previous post, I slightly paraphrased Immanuel Kant (“perception without conception is blind and conception without perception is empty”) by saying that science without philosophy is blind and philosophy without science is empty.

In his book Cosmic Apprentice: Dispatches from the Edges of Science, Dorion Sagan explained that science and philosophy hang “in a kind of odd balance, watching each other, holding hands.  Science’s eye for detail, buttressed by philosophy’s broad view, makes for a kind of alembic, an antidote to both.  Although philosophy isn’t fiction, it can be more personal, creative and open, a kind of counterbalance for science even as it argues that science, with its emphasis on a kind of impersonal materialism, provides a crucial reality check for philosophy and a tendency to over-theorize that’s inimical to the scientific spirit.  Ideally, in the search for truth, science and philosophy, the impersonal and autobiographical, can keep each other honest in a kind of open circuit.”

“Science’s spirit is philosophical,” Sagan concluded.  “It is the spirit of questioning, of curiosity, of critical inquiry combined with fact-checking.  It is the spirit of being able to admit you’re wrong, of appealing to data, not authority.”

“Science,” as his father Carl Sagan said, “is a way of thinking much more than it is a body of knowledge.”  By extension, we could say that data science is about a way of thinking much more than it is about big data or about being data-driven.

I have previously blogged that science has always been about bigger questions, not bigger data.  As Claude Lévi-Strauss said, “the scientist is not a person who gives the right answers, but one who asks the right questions.”  As far as data science goes, what are the right questions?  Data scientist Melinda Thielbar proposes three key questions (Actionable? Verifiable? Repeatable?).

Here again we see the interdependence of science and philosophy.  “Philosophy,” Marilyn McCord Adams said, “is thinking really hard about the most important questions and trying to bring analytic clarity both to the questions and the answers.”

“Philosophy is critical thinking,” Don Cupitt said. “Trying to become aware of how one’s own thinking works, of all the things one takes for granted, of the way in which one’s own thinking shapes the things one’s thinking about.”  Yes, even a data scientist’s own thinking could shape the things they are thinking scientifically about.  Big data evangelist James Kobielus recently blogged about five biases that may crop up in a data scientist’s work (Cognitive, Selection, Sampling, Modeling, Funding).

“Data science has a bright future ahead,” explained Hilary Mason in a recent interview.  “There will only be more data, and more of a need for people who can find meaning and value in that data.  We’re also starting to see a greater need for data engineers, people to build infrastructure around data and algorithms, and data artists, people who can visualize the data.”

I agree with Mason, and I would add that we are also starting to see a greater need for data philosophers, people who can, borrowing the words that Anthony Kenny used to define philosophy, “think as clearly as possible about the most fundamental concepts that reach through all the disciplines.”

 

Related Posts

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

The Laugh-In Effect of Big Data

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

The Wisdom of Crowds, Friends, and Experts

Why Data Science Storytelling Needs a Good Editor

Predictive Analytics, the Data Effect, and Jed Clampett

Bigger Questions, not Bigger Data

The Flying Monkeys of Big Data

Cargo Cult Data Science

Speed Up Your Data to Slow Down Your Decisions

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Tuesday
May072013

Keep Looking Up Insights in Data

In a previous post, I used the history of the Hubble Space Telescope to explain how data cleansing saves lives, based on a true story I read in the book Space Chronicles: Facing the Ultimate Frontier by Neil deGrasse Tyson.  In this post, Hubble and Tyson once again provide the inspiration for an insightful metaphor about data quality.

Hubble is one of dozens of space telescopes of assorted sizes and shapes orbiting the Earth.  “Each one,” Tyson explained, “provides a view of the cosmos that is unobstructed, unblemished, and undiminished by Earth’s turbulent and murky atmosphere.  They are designed to detect bands of light invisible to the human eye, some of which never penetrate Earth’s atmosphere.  Hubble is the first and only space telescope to observe the universe using primarily visible light.  Its stunningly crisp, colorful, and detailed images of the cosmos make Hubble a kind of supreme version of the human eye in space.”

This is how we’d like the quality of data to be when we’re looking for business insights.  High-quality data provides stunningly crisp, colorful, and detailed images of the business cosmos, acting as a kind of supreme version of the human eye in data.

However, despite their less-than-perfect vision, the limitations of Earth-based telescopes still facilitated significant scientific breakthroughs long before Hubble became the first space telescope in 1990.

In 1609, when the Italian physicist and astronomer Galileo Galilei turned a telescope of his own design to the sky, as Tyson explained, he “heralded a new era of technology-aided discovery, whereby the capacities of the human senses could be extended, revealing the natural world in unprecedented, even heretical ways.  The fact that Galileo revealed the Sun to have spots, the planet Jupiter to have satellites [its four moons: Callisto, Ganymede, Europa, Io], and Earth not to be the center of all celestial motion was enough to unsettle centuries of Aristotelian teachings by the Catholic Church and to put Galileo under house arrest.”

And in 1964, another Earth-based telescope, this one operated by the American astronomers Arno Penzias and Robert Wilson at AT&T Bell Labs, was responsible for what is widely considered the most important single discovery in astrophysics, what’s now known as cosmic microwave background radiation, and for which Penzias and Wilson won the 1978 Nobel Prize in Physics.

Recently, I’ve blogged about how there are times when perfect data quality is necessary, when we need the equivalent of a space telescope, and times when okay data quality is good enough, when the equivalent of an Earth-based telescope will do.

What I would like you to take away from this post is that perfect data quality is not a prerequisite for the discovery of new business insights.  Even when data doesn’t provide a perfect view of the business cosmos, even when it’s partially obstructed, blemished, or diminished by the turbulent and murky atmosphere of poor quality, data can still provide business insights.

This doesn’t mean that you should settle for poor data quality, just that you shouldn’t demand perfection before using data.

Tyson ends each episode of his StarTalk Radio program by saying “keep looking up,” so I will end this blog post by saying, even when its quality isn’t perfect, keep looking up insights in data.

 

Related Posts

Data Quality and the OK Plateau

When Poor Data Quality Kills

How Data Cleansing Saves Lives

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

The Seventh Law of Data Quality

A Tale of Two Q’s

Data Quality and The Middle Way

Stop Poor Data Quality STOP

Freudian Data Quality

Predictably Poor Data Quality

This isn’t Jeopardy

Satisficing Data Quality

Tuesday
Apr232013

The Laugh-In Effect of Big Data

Although I am an advocate for data science and big data done right, lately I have been sounding the Anti-Hype Horn with blog posts offering a contrarian’s view of unstructured data, forewarning you about the flying monkeys of big data, cautioning you against performing Cargo Cult Data Science, and inviting you to ponder the perils of the Infinite Inbox.

The hype of big data has resulted in a lot of people and vendors extolling its virtues with stories about how Internet companies, political campaigns, and new technologies have profited, or otherwise benefited, from big data.  These stories are served up as alleged business cases for investing in big data and data science.  Although some of these stories are fluff pieces, many of them accurately, and in some cases comprehensively, describe a real-world application of big data and data science.  However, these messages most often lack a critically important component — applicability to your specific business.  In Made to Stick: Why Some Ideas Survive and Others Die, Chip Heath and Dan Heath explained that “an accurate but useless idea is still useless.  If a message can’t be used to make predictions or decisions, it is without value, no matter how accurate or comprehensive it is.”

Rowan & Martin’s Laugh-In was an American sketch comedy television series, which aired from 1968 to 1973.  One of the recurring characters portrayed by Arte Johnson was Wolfgang the German soldier, who would often comment on the previous comedy sketch by saying (in a heavy and long-drawn-out German accent): “Very interesting . . . but stupid!”

From now on whenever someone shares another interesting story masquerading as a solid business case for big data that lacks any applicability beyond the specific scenario in the story, a common phenomenon I call The Laugh-In Effect of Big Data, my unapologetic response will resoundingly be: “Very interesting . . . but stupid!”

 

Related Posts

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

HoardaBytes and the Big Data Lebowski

Will Big Data be Blinded by Data Science?

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Big Data el Memorioso

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

Dot Collectors and Dot Connectors

The Wisdom of Crowds, Friends, and Experts

A Contrarian’s View of Unstructured Data

The Flying Monkeys of Big Data

Cargo Cult Data Science

A Statistically Significant Resolution for 2013

Speed Up Your Data to Slow Down Your Decisions

Rage against the Machines Learning

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Swimming in Big Data

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Tuesday
Apr162013

The Costs and Profits of Poor Data Quality

Continuing the theme of my two previous posts, which discussed when it’s okay to call data quality as good as it needs to get and when perfect data quality is necessary, in this post I want to briefly discuss the costs — and profits — of poor data quality.

Loraine Lawson interviewed Ted Friedman of Gartner Research about How to Measure the Cost of Data Quality Problems, such as the costs associated with reduced productivity, redundancies, business processes breaking down because of data quality issues, regulatory compliance risks, and lost business opportunities.  David Loshin blogged about the challenge of estimating the cost of poor data quality, noting that many estimates, upon close examination, seem to rely exclusively on anecdotal evidence.

A recent Mental Floss article recounted 10 Very Costly Typos, including the 1962 $80 million dollar missing hyphen in the programming code that led to the destruction of the Mariner 1 spacecraft, the 2007 Roswell, New Mexico car dealership promotion where instead of 1 out of 50,000 scratch lottery tickets revealing a $1,000 cash grand prize, all of the tickets were printed as grand-prize winners, which would have been a $50 million payout, but $250,000 in Walmart gift certificates were given out instead, and, more recently, the March 2013 typographical error in the price of pay-per-ride cards on 160,000 maps and posters that cost New York City’s Transportation Authority approximately $500,000.

Although we often only think about the costs of poor data quality, the article also shared some 2010 research performed by Harvard University claiming that Google profits an estimated $497 million dollars a year from people mistyping the names of popular websites and landing on typosquatter sites, which just happen to be conveniently littered with Google ads.

Poor data quality has also long played an important role in improving Google Search, where misspellings of search terms entered by users (and not just a spellchecker program) is leveraged by the algorithm providing the Did you mean, Including results for, and Search instead for help text displayed at the top of the first page of Google Search results.

What examples (or calculation methods) can you provide about either the costs or profits associated with poor data quality?

 

Related Posts

Promoting Poor Data Quality

Data Quality and the Cupertino Effect

The Data Quality Wager

Data Quality and the OK Plateau

When Poor Data Quality Kills

How Data Cleansing Saves Lives

The Dichotomy Paradox, Data Quality and Zero Defects

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Data and its Relationships with Quality

The Seventh Law of Data Quality

A Tale of Two Q’s

Paleolithic Rhythm and Data Quality

Groundhog Data Quality Day

Data Quality and The Middle Way

Stop Poor Data Quality STOP

When Poor Data Quality Calls

Freudian Data Quality

Predictably Poor Data Quality

Satisficing Data Quality

Tuesday
Apr022013

Data Quality and the OK Plateau

In his book Moonwalking with Einstein: The Art and Science of Remembering, Joshua Foer explained that “when people first learn to use a keyboard, they improve very quickly from sloppy single-finger pecking to careful two-handed typing, until eventually the fingers move so effortlessly across the keys that the whole process becomes unconscious and the fingers seem to take on a mind of their own.”

“At this point,” Foer continued, “most people’s typing skills stop progressing.  They reach a plateau.  If you think about it, it’s a strange phenomenon.  After all, we’ve always been told that practice makes perfect, and many people sit behind a keyboard for at least several hours a day in essence practicing their typing.  Why don’t they just keep getting better and better?”

Foer then recounted research performed in the 1960s by the psychologists Paul Fitts and Michael Posner, which described the three stages that everyone goes through when acquiring a new skill:

  1. Cognitive — During this stage, you intellectualize the task and discover new strategies to accomplish it more proficiently.
  2. Associative — During this stage, you concentrate less, make fewer major errors, and generally become more efficient.
  3. Autonomous — During this stage, you have gotten as good as you need to get, and are basically running on autopilot.

“During that autonomous stage,” Foer explained, “you lose conscious control over what you are doing.  Most of the time that’s a good thing.  Your mind has one less thing to worry about.  In fact, the autonomous stage seems to be one of those handy features that evolution worked out for our benefit.  The less you have to focus on the repetitive tasks of everyday life, the more you can concentrate on the stuff that really matters, the stuff you haven’t seen before.  And so, once we’re just good enough at typing, we move it to the back of our mind’s filing cabinet and stop paying it any attention.”

“You can see this shift take place in fMRI scans of people learning new skills.  As a task becomes automated, parts of the brain involved in conscious reasoning become less active and other parts of the brain take over.  You could call it the OK plateau, the point at which you decide you’re OK with how good you are at something, turn on autopilot, and stop improving.”

“We all reach OK plateaus in most things we do,” Foer concluded.  “We learn how to drive when we’re in our teens and once we’re good enough to avoid tickets and major accidents, we get only incrementally better.  My father has been playing golf for forty years, and he’s still a duffer.  In four decades his handicap hasn’t fallen even a point.  Why?  He reached an OK plateau.”

I believe that data quality improvement initiatives also eventually reach an OK Plateau, a point just short of data perfection, where the diminishing returns of chasing after zero defects gives way to calling data quality as good as it needs to get.

As long as the autopilot is on, accepting data quality is a journey not a destination, preventing data quality from getting worse, and making sure best practices don’t stop being practiced, then I’m OK with data quality and the OK plateau.  Are you OK?

 

Related Posts

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

The Seventh Law of Data Quality

Data Quality and The Middle Way

Freudian Data Quality

Predictably Poor Data Quality

Satisficing Data Quality

Thursday
Mar212013

Expectation and Data Quality

One of my favorite recently read books is You Are Not So Smart by David McRaney.  Earlier this week, the book’s chapter about expectation was excerpted as an online article on Why We Can’t Tell Good Wine From Bad, which also provided additional examples about how we can be fooled by altering our expectations.

“In one Dutch study,” McRaney explained, “participants were put in a room with posters proclaiming the awesomeness of high-definition, and were told they would be watching a new high-definition program.  Afterward, the subjects said they found the sharper, more colorful television to be a superior experience to standard programming.”

No surprise there, right?  After all, a high-definition television is expected to produce a high-quality image.

“What they didn’t know,” McRaney continued, “was they were actually watching a standard-definition image.  The expectation of seeing a better quality image led them to believe they had.  Recent research shows about 18 percent of people who own high-definition televisions are still watching standard-definition programming on the set, but think they are getting a better picture.”

I couldn’t help but wonder if establishing an expectation of delivering high-quality data could lead business users to believe that, for example, the data quality of the data warehouse met or exceeded their expectations.  Could business users actually be fooled by altering their expectations about data quality?  Wouldn’t their experience of using the data eventually reveal the truth?

Retailers expertly manipulate us with presentation, price, good marketing, and great service in order to create an expectation of quality in the things we buy.  “The actual experience is less important,” McRaney explained.  “As long as it isn’t total crap, your experience will match up with your expectations.  The build up to an experience can completely change how you interpret the information reaching your brain from your otherwise objective senses.  In psychology, true objectivity is pretty much considered to be impossible.  Memories, emotions, conditioning, and all sorts of other mental flotsam taint every new experience you gain.  In addition to all this, your expectations powerfully influence the final vote in your head over what you believe to be reality.”

“Your expectations are the horse,” McRaney concluded, “and your experience is the cart.”  You might think it should be the other way around, but when your expectations determine your direction, you shouldn’t be surprised by the journey you experience.

If you find it difficult to imagine a positive expectation causing people to overlook poor quality in their experience with data, how about the opposite?  I have seen the first impression of a data warehouse initially affected by poor data quality create a negative expectation causing people to overlook the improved data quality in their subsequent experiences with the data warehouse.  Once people expect to experience poor data quality when using it, people stop trusting, and stop using, the data warehouse.

Data warehousing is only one example of how expectation can affect the data quality experience.  How are your organization’s expectations affecting its experiences with data quality?

 

Related Posts

The Data Outhouse

Data Quality and Anton’s Syndrome

Data Quality and Chicken Little Syndrome

Data Quality and Miracle Exceptions

Availability Bias and Data Quality Improvement

DQ-View: The Five Stages of Data Quality

Data Quality and the Bystander Effect

Data Quality and the Q Test

Data Myopia and Business Relativity

Plato’s Data

The Illusion-of-Quality Effect

Perception Filters and Data Quality

WYSIWYG and WYSIATI

Do the Eyes Have It?

Predictably Poor Data Quality

Freudian Data Quality

Data Psychedelicatessen

Data Geeks and Business Blindness

The Data Sharpshooter Fallacy

Data Quality and the Barber’s Paradox

Thursday
Mar142013

On Philosophy, Science, and Data

Ever since Melinda Thielbar helped me demystify data science on OCDQ Radio, I have been pondering my paraphrasing of an old idea: Science without philosophy is blind; Philosophy without science is empty; Data needs both science and philosophy.

“A philosopher’s job is to find out things about the world by thinking rather than observing,” the philosopher Bertrand Russell once said.  One could say a scientist’s job is to find out things about the world by observing and experimenting.  In fact, Russell observed that “the most essential characteristic of scientific technique is that it proceeds from experiment, not from tradition.”

Russell also said that “science is what we know, and philosophy is what we don’t know.”  However, Stuart Firestein, in his book Ignorance: How It Drives Science, explained “there is no surer way to screw up an experiment than to be certain of its outcome.”

Although it seems it would make more sense for science to be driven by what we know, by facts, “working scientists,” according to Firestein, “don’t get bogged down in the factual swamp because they don’t care that much for facts.  It’s not that they discount or ignore them, but rather that they don’t see them as an end in themselves.  They don’t stop at the facts; they begin there, right beyond the facts, where the facts run out.  Facts are selected for the questions they create, for the ignorance they point to.”

In this sense, philosophy and science work together to help us think about and experiment with what we do and don’t know.

Some might argue that while anyone can be a philosopher, being a scientist requires more rigorous training.  A commonly stated requirement in the era of big data is to hire data scientists, but this begs the question: Is data science only for data scientists?

“Clearly what we need,” Firestein explained, “is a crash course in citizen science—a way to humanize science so that it can be both appreciated and judged by an informed citizenry.  Aggregating facts is useless if you don’t have a context to interpret them.”

I would argue that clearly what organizations need is a crash course in data science—a way to humanize data science so that it can be both appreciated and judged by an informed business community.  Big data is useless if you don’t have a business context to interpret it.  Firestein also made great points about science not being exclusionary (i.e., not just for scientists).  Just as you can enjoy watching sports without being a professional athlete and you can appreciate music without being a professional musician, you can—and should—learn the basics of data science (especially statistics) without being a professional data scientist.

In order to truly deliver business value to organizations, data science can not be exclusionary.  This doesn’t mean you shouldn’t hire data scientists.  In many cases, you will need the expertise of professional data scientists.  However, you will not be able to direct them or interpret their findings without understanding the basics, what could be called the philosophy of data science.

Some might argue that philosophy only reigns in the absence of data, while science reigns in the analysis of data.  Although in the era of big data there seems to be fewer areas truly absent of data, a conceptual bridge still remains between analysis and insight, the crossing of which is itself a philosophical exercise.  So, an endless oscillation persists between science and philosophy, which is why science without philosophy is blind, and philosophy without science is empty.  Data needs both science and philosophy.

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, experimentation, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

 

Related Posts

Big Data and the Infinite Inbox

HoardaBytes and the Big Data Lebowski

Will Big Data be Blinded by Data Science?

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Big Data el Memorioso

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

Dot Collectors and Dot Connectors

The Wisdom of Crowds, Friends, and Experts

A Statistically Significant Resolution for 2013

Speed Up Your Data to Slow Down Your Decisions

Rage against the Machines Learning

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Swimming in Big Data

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Tuesday
Mar052013

Data Governance needs Searchers, not Planners

In his book Everything Is Obvious: How Common Sense Fails Us, Duncan Watts explained that “plans fail, not because planners ignore common sense, but rather because they rely on their own common sense to reason about the behavior of people who are different from them.”

As development economist William Easterly explained, “A Planner thinks he already knows the answer; A Searcher admits he doesn’t know the answers in advance.  A Planner believes outsiders know enough to impose solutions; A Searcher believes only insiders have enough knowledge to find solutions, and that most solutions must be homegrown.”

I made a similar point in my post Data Governance and the Adjacent Possible.  Change management efforts are resisted when they impose new methods by emphasizing bad business and technical processes, as well as bad data-related employee behaviors, while ignoring unheralded processes and employees whose existing methods are preventing other problems from happening.

Demonstrating that some data governance policies reflect existing best practices reduces resistance to change by showing that the search for improvement was not limited to only searching for what is currently going wrong.

This is why data governance needs Searchers, not Planners.  A Planner thinks a framework provides all the answers; A Searcher knows a data governance framework is like a jigsaw puzzle.  A Planner believes outsiders (authorized by executive management) know enough to impose data governance solutions; A Searcher believes only insiders (united by collaboration) have enough knowledge to find the ingredients for data governance solutions, and a true commitment to change always comes from within.

 

Related Posts

The Hawthorne Effect, Helter Skelter, and Data Governance

Cooks, Chefs, and Data Governance

Data Governance Frameworks are like Jigsaw Puzzles

Data Governance and the Buttered Cat Paradox

Data Governance Star Wars: Bureaucracy versus Agility

Beware the Data Governance Ides of March

Aristotle, Data Governance, and Lead Rulers

Data Governance and the Adjacent Possible

The Three Most Important Letters in Data Governance

The Data Governance Oratorio

An Unsettling Truth about Data Governance

The Godfather of Data Governance

Over the Data Governance Rainbow

Getting Your Data Governance Stuff Together

Datenvergnügen

Council Data Governance

A Tale of Two G’s

Declaration of Data Governance

The Role Of Data Quality Monitoring In Data Governance

The Collaborative Culture of Data Governance

Tuesday
Feb192013

Demystifying Data Science 

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, special guest, and actual data scientist, Dr. Melinda Thielbar, a Ph.D. Statistician, and I attempt to demystify data science by explaining what a data scientist does, including the requisite skills involved, bridging the communication gap between data scientists and business leaders, delivering data products business users can use on their own, and providing a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, experimentation, and correlation.

Melinda Thielbar is the Senior Mathematician for IAVO Research and Scientific.  Her work there focuses on power system optimization using real-time prediction models.  She has worked as a software developer, an analytic lead for big data implementations, and a statistics and programming teacher.

Melinda Thielbar is a co-founder of Research Triangle Analysts, a professional group for analysts and data scientists located in the Research Triangle of North Carolina.

While Melinda Thielbar doesn’t specialize in a single field, she is particularly interested in power systems because, as she puts it, “A power systems optimizer has to work every time.”

 

Demystifying Data Science

Additional listening options:

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

 

Related Posts

There is No Such Thing as a Root Cause

Big Data and the Infinite Inbox

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Will Big Data be Blinded by Data Science?

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

How Predictable Are You?

A Statistically Significant Resolution for 2013

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Tuesday
Feb122013

The Hawthorne Effect, Helter Skelter, and Data Governance

In his book The Half-life of Facts: Why Everything We Know Has an Expiration Date, Samuel Arbesman introduced me to the Hawthorne Effect, which is “when subjects behave differently if they know they are being studied.  The effect was named after what happened in a factory called Hawthorne Works outside Chicago in the 1920s and 1930s.”

“Scientists wished to measure,” Arbesman explained, “the effects of environmental changes on the productivity of workers.  They discovered whatever they did to change the workers’ behaviors — whether they increased the lighting or altered any other aspect of the environment — resulted in increased productivity.  However, as soon as the study was completed, productivity dropped.  The researchers concluded that the observations themselves were affecting productivity and not the experimental changes.”

I couldn’t help but wonder how the Hawthorne Effect could affect a data governance program.  When data governance policies are first defined, and their associated procedures and processes are initially implemented, after a little while, and usually after a little resistance, productivity often increases and the organization begins to advance its data governance maturity level.

Perhaps during these early stages employees are well-aware that they’re being observed to make sure they’re complying with the new data governance policies, and this observation itself accounts for advancing to the next maturity level.  Especially since after progress stops being studied so closely, it’s not uncommon for an organization to backslide to an earlier maturity level.

You might be tempted to conclude that continuous monitoring, especially of the Orwellian Big Brother variety, might be able to prevent this from happening, but I doubt it.  Data governance maturity is often misperceived in the same way that expertise is misperceived — as a static state that once achieved signifies a comforting conclusion to all the grueling effort that was required, either to become an expert, or reach a particular data governance maturity level.

However, just like the five stages of data quality, oscillating between different levels of data governance maturity, and perhaps even occasionally coming full circle, may be an inevitable part of the ongoing evolution of a data governance program, which can often feel like a top-down/bottom-up amusement park ride of the Beatles “Helter Skelter” variety:

When you get to the bottom, you go back to the top, where you stop and you turn, and you go for a ride until you get to the bottom — and then you do it again.

Come On Tell Me Your Answers

Do you, don’t you . . . think the Hawthorne Effect affects data governance?

Do you, don’t you . . . think data governance is Helter Skelter?

Tell me, tell me, come on tell me your answers — by posting a comment below.

 

Related Posts

Cooks, Chefs, and Data Governance

Data Governance Frameworks are like Jigsaw Puzzles

Data Governance and the Buttered Cat Paradox

Data Governance Star Wars: Bureaucracy versus Agility

Beware the Data Governance Ides of March

Aristotle, Data Governance, and Lead Rulers

Data Governance and the Adjacent Possible

The Three Most Important Letters in Data Governance

The Data Governance Oratorio

Jack Bauer and Enforcing Data Governance Policies

An Unsettling Truth about Data Governance

The Godfather of Data Governance

Over the Data Governance Rainbow

Getting Your Data Governance Stuff Together

Datenvergnügen

Council Data Governance

A Tale of Two G’s

Declaration of Data Governance

The Role Of Data Quality Monitoring In Data Governance

The Collaborative Culture of Data Governance

Tuesday
Feb052013

Big Data and the Infinite Inbox

Occasionally it’s necessary to temper the unchecked enthusiasm accompanying the peak of inflated expectations associated with any hype cycle.  This may be especially true for big data, and especially now since, as Svetlana Sicular of Gartner recently blogged, big data is falling into the trough of disillusionment and “to minimize the depth of the fall, companies must be at a high enough level of analytical and enterprise information management maturity combined with organizational support of innovation.”

I fear the fall may feel bottomless for those who fell hard for the hype and believe the Big Data Psychic capable of making better, if not clairvoyant, predictions.  When, in fact, “our predictions may be more prone to failure in the era of big data,” explained Nate Silver in his book The Signal and the Noise: Why Most Predictions Fail but Some Don't.  “There isn’t any more truth in the world than there was before the Internet.  Most of the data is just noise, as most of the universe is filled with empty space.”

Proposing the 3Ss (Small, Slow, Sure) as a counterpoint to the 3Vs (Volume, Velocity, Variety), Stephen Few recently blogged about the slow data movement.  “Data is growing in volume, as it always has, but only a small amount of it is useful.  Data is being generated and transmitted at an increasing velocity, but the race is not necessarily for the swift; slow and steady will win the information race.  Data is branching out in ever-greater variety, but only a few of these new choices are sure.”

Big data requires us to revisit information overload, a term that was originally about, not the increasing amount of information, but instead the increasing access to information.  As Clay Shirky stated, “It’s not information overload, it’s filter failure.”

As Silver noted, the Internet (like the printing press before it) was a watershed moment in our increased access to information, but its data deluge didn’t increase the amount of truth in the world.  And in today’s world, where many of us strive on a daily basis to prevent email filter failure and achieve what Merlin Mann called Inbox Zero, I find unfiltered enthusiasm about big data to be rather ironic, since big data is essentially enabling the data-driven decision making equivalent of the Infinite Inbox.

Imagine logging into your email every morning and discovering: You currently have () Unread Messages.

However, I’m sure most of it probably would be spam, which you obviously wouldn’t have any trouble quickly filtering (after all, infinity minus spam must be a back of the napkin calculation), allowing you to only read the truly useful messages.  Right?

 

Related Posts

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Open MIKE Podcast — Episode 05: Defining Big Data

Will Big Data be Blinded by Data Science?

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

A Statistically Significant Resolution for 2013

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Swimming in Big Data

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Tuesday
Jan222013

Popeye, Spinach, and Data Quality

As a kid, one of my favorite cartoons was Popeye the Sailor, who was empowered by eating spinach to take on many daunting challenges, such as battling his brawny nemesis Bluto for the affections of his love interest Olive Oyl, often kidnapped by Bluto.

I am reading the book The Half-life of Facts: Why Everything We Know Has an Expiration Date by Samuel Arbesman, who explained, while examining how a novel fact, even a wrong one, spreads and persists, that one of the strangest examples of the spread of an error is related to Popeye the Sailor.  “Popeye, with his odd accent and improbable forearms, used spinach to great effect, a sort of anti-Kryptonite.  It gave him his strength, and perhaps his distinctive speaking style.  But why did Popeye eat so much spinach?  What was the reason for his obsession with such a strange food?”

The truth begins over fifty years before the comic strip made its debut.  “Back in 1870,” Arbesman explained, “Erich von Wolf, a German chemist, examined the amount of iron within spinach, among many other green vegetables.  In recording his findings, von Wolf accidentally misplaced a decimal point when transcribing data from his notebook, changing the iron content in spinach by an order of magnitude.  While there are actually only 3.5 milligrams of iron in a 100-gram serving of spinach, the accepted fact became 35 milligrams.  Once this incorrect number was printed, spinach’s nutritional value became legendary.  So when Popeye was created, studio executives recommended he eat spinach for his strength, due to its vaunted health properties, and apparently Popeye helped increase American consumption of spinach by a third!”

“This error was eventually corrected in 1937,” Arbesman continued, “when someone rechecked the numbers.  But the damage had been done.  It spread and spread, and only recently has gone by the wayside, no doubt helped by Popeye’s relative obscurity today.  But the error was so widespread, that the British Medical Journal published an article discussing this spinach incident in 1981, trying its best to finally debunk the issue.”

“Ultimately, the reason these errors spread,” Arbesman concluded, “is because it’s a lot easier to spread the first thing you find, or the fact that sounds correct, than to delve deeply into the literature in search of the correct fact.”

What “spinach” has your organization been falsely consuming because of a data quality issue that was not immediately obvious, and which may have led to a long, and perhaps ongoing, history of data-driven decisions based on poor quality data?

Popeye said “I yam what I yam!”  Your organization yams what your data yams, so you had better make damn sure it’s correct.

 

Related Posts

The Family Circus and Data Quality

Can Data Quality avoid the Dustbin of History?

Retroactive Data Quality

Spartan Data Quality

Pirates of the Computer: The Curse of the Poor Data Quality

The Tooth Fairy of Data Quality

The Dumb and Dumber Guide to Data Quality

Darth Data

Occurred, a data defect has . . .

The Data Quality Placebo

Data Quality is People!

DQ-View: The Five Stages of Data Quality

DQ-BE: Data Quality Airlines

Wednesday Word: Quality-ish

The Five Worst Elevator Pitches for Data Quality

Shining a Social Light on Data Quality

The Poor Data Quality Jar

Data Quality and #FollowFriday the 13th

Dilbert, Data Quality, Rabbits, and #FollowFriday

Data Love Song Mashup

Tuesday
Jan082013

Data Quality and Anton’s Syndrome

In his book Incognito: The Secret Lives of the Brain, David Eagleman discussed aspects of a bizarre, and rare, brain disorder called Anton’s Syndrome in which a stroke renders a person blind — but the person denies their blindness.

“Those with Anton’s Syndrome truly believe they are not blind,” Eagleman explained.  “It is only after bumping into enough furniture and walls that they begin to feel that something is amiss.  They are experiencing what they take to be vision, but it is all internally generated.  The external data is not getting to the right places because of the stroke, and so their reality is simply that which is generated by the brain, with little attachment to the real world.  In this sense, what they experience is no different from dreaming, drug trips, or hallucinations.”

Data quality practitioners often complain that business leaders are blind to the importance of data quality to business success, or that they deny data quality issues exist in their organization.  As much as we wish it wasn’t so, often it isn’t until business leaders bump into enough of the negative effects of poor data quality that they begin to feel that something is amiss.  However, as much as we would like to, we can’t really attribute their denial to drug-induced hallucinations.

Sometimes an illusion-of-quality effect is caused when data is excessively filtered and cleansed before it reaches business leaders, perhaps as the result of a perception filter for data quality issues created as a natural self-defense mechanism by the people responsible for the business processes and technology surrounding data, since no one wants to be blamed for causing, or failing to fix, data quality issues.  Unfortunately, this might really leave the organization’s data with little attachment to the real world.

In fairness, sometimes it’s also the blind leading the blind because data quality practitioners often suffer from business blindness by presenting data quality issues without providing business context, without relating data quality metrics in a tangible manner to how the business uses data to support a business process, accomplish a business objective, or make a business decision.

A lot of the disconnect between business leaders, who believe they are not blind to data quality, and data quality practitioners, who believe they are not blind to business context, comes from a crisis of perception.  Each side in this debate believes they have a complete vision, but it’s only after bumping into each other enough times that they begin to envision the organizational blindness caused when data quality is not properly measured within a business context and continually monitored.

 

Related Posts

Data Quality and Chicken Little Syndrome

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Availability Bias and Data Quality Improvement

Finding Data Quality

“Some is not a number and soon is not a time”

The Data Quality of Dorian Gray

The Data Quality Wager

DQ-View: The Five Stages of Data Quality

Data Quality and the Bystander Effect

Data Quality and the Q Test

Why isn’t our data quality worse?

The Illusion-of-Quality Effect

Perception Filters and Data Quality

WYSIWYG and WYSIATI

Predictably Poor Data Quality

Data Psychedelicatessen

Data Geeks and Business Blindness

The Real Data Value is Business Insight

Is your data accurate, but useless to your business?

Data Quality Measurement Matters

Data Myopia and Business Relativity

Data and its Relationships with Quality

Plato’s Data

Thursday
Jan032013

Best OCDQ Blog Posts of 2012

Welcome to my roundup of the best blog posts published on the Obsessive-Compulsive Data Quality (OCDQ) blog during 2012.

My selections were based on a pseudo-scientific, quasi-statistical combination of page views, comments, and re-tweets, as well as choosing a few of my personal favorites, and which I have organized into four sections of ten best posts by topic or type.

 

Ten Best Posts on Big Data

  • Dot Collectors and Dot Connectors — The multifaceted challenges of big data require the dot collectors of data management and the dot connectors of business intelligence to overcome their attention blindness and work together more collaboratively.
  • HoardaBytes and the Big Data Lebowski — Don’t hoard Data, dude.  The Data must abide.  The Data must abide both the Business, by proving useful to our business activities, and the Individual, by protecting the privacy of our personal activities.
  • Our Increasingly Data-Constructed World — What we now call Big Data is in fact a long-running macro trend underlying the many recent trends and innovations making our world, not just more data-driven, but increasingly data-constructed.
  • Will Big Data be Blinded by Data Science? — With apologies to Thomas Dolby, will the business leaders being told to hire data scientists to derive business value from big data analytics be blind to what data science tries to show them?
  • The Graystone Effects of Big Data — Using a metaphor based on the science fiction television show Caprica, I refer to the positive aspects of Big Data as the Zoe Graystone Effect, and the negative aspects of Big Data as the Daniel Graystone Effect.
  • Exercise Better Data Management — Big Data may be followed by MOData (i.e., MOre Data or Morbidly Obese Data), but that doesn’t necessarily mean we require more data management, instead we just need to exercise better data management.
  • A Tale of Two Datas — Inspired by Malcolm Chisholm and Charles Dickens, there are two types of data (i.e., representation and observation, not big and not-so-big) with different data uses that will require different data management approaches.
  • Data Silence — Not only do we need to adopt a mindset that embraces the principles of data science, but we also have to acknowledge that the biases and preconceptions in our minds could silence the signal and amplify the noise in big data.
  • The Wisdom of Crowds, Friends, and Experts — The future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three sources often contributing to data-driven decision making.

 

Ten Best Posts on Data Governance and Data Quality

  • Data Quality: Quo Vadimus? — With lots of help from Henrik Liliendahl Sørensen, Garry Ure, Bryan Larkin, and many others via the comments, I ponder where data quality is going, and whether data quality is a journey or a destination.
  • Data Quality and Miracle Exceptions — Battling the dark forces of poor data quality doesn’t require any superpowers, and data quality doesn’t have any miracle exceptions, so for the love of high-quality data everywhere, stop trying to sell us one.
  • Data Myopia and Business Relativity — Examines the two most prevalent definitions for data quality, real-world alignment and fitness for the purpose of use, otherwise known as the danger of data myopia and the challenge of business relativity.
  • How Data Cleansing Saves Lives — Although proactive defect prevention is far superior to reactive data cleansing, the history of the Hubble Space Telescope proves that data cleansing can be not just a necessary evil, but also a necessary good.
  • Data Quality and the Bystander Effect — The most common reason data quality issues are neither reported nor corrected is the Bystander Effect making people less likely to interpret bad data as a problem or, at the very least, not their responsibility.
  • Data Quality and Chicken Little Syndrome — A chicken-metaphor-based post about the far-too-common and fowl folly of, instead of trying to sell the business benefits of data quality, emphasizing the negative aspects of not investing in data quality.
  • Data and its Relationships with Quality — The metadata linking the data management industry to what it manages suffers from the one-to-many relationships created by never agreeing on how data, information, and quality should be defined.
  • Cooks, Chefs, and Data Governance — Implementing policies requires cooks who are adept at carrying out a recipe, as well as chefs who are trusted to figure out how to best combine policies with the organizational ingredients available to them.
  • Availability Bias and Data Quality Improvement — The availability heuristic explains why a reactive data cleansing project is easily approved, and availability bias explains why initiating a proactive data quality program is usually resisted.

 

Ten Best Podcasts

  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Saving Private Data — Recorded in December 2011, guest Daragh O Brien discusses the data privacy and data protection implications of social media, cloud computing, and big data.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Defining Big Data — This episode of the Open MIKE Podcast, with assistance from Robert Hillard, discusses how big data refers to big complexity, not big volume, even though complex datasets tend to grow rapidly, thus making them voluminous.
  • Getting to Know NoSQL — This episode of the Open MIKE Podcast discusses how NoSQL does not mean AntiSQL (i.e., NoSQL is not a Relational replacement), and that business-driven big data needs will often require “Not Only SQL.”

 

Ten Best of the Rest

  • DQ-View: Data Is as Data Does — In this short video, I explain that data’s value comes from data’s usefulness, exemplifying the potential value of unstructured data based on whether or not you put what you read in data management books to use.
  • DQ-View: The Five Stages of Data Quality — In this short video, using my superb acting skills, I demonstrate how coming to terms with the daunting challenge of data quality is somewhat similar to experiencing the Five Stages of Grief.
  • DQ-View: MetaData makes BettahMusic — In this short video, I demonstrate how better metadata makes data better using the metadata automatically and manually created after importing my CD collection into my iTunes library.
  • Metadata, Data Quality, and the Stroop Test — In this colorful (and perhaps too colorful) post, I use the Stroop Test, where colors do not match their names, to discuss the relationship between metadata and data quality.
  • Quality is the Higgs Field of Data — Using one of the biggest science stories of 2012, the potential discovery of the elusive Higgs Boson (which I also attempt to explain), I attempt an analogy for data quality based on the Higgs Field.
  • The Family Circus and Data Quality — Thanks to The Family Circus comic strip created by cartoonist Bil Keane, I explain how Ida Know owns the data, Not Me is accountable for data governance, and Nobody takes responsibility for data quality.
  • Data Love Song Mashup — Since your data needs love too, on Valentine’s Day I wrote this post providing a mashup of love songs for your data (and Rob DuMoulin added a few more in the comments) — Happy Data Quality to you and your data!
  • The Algebra of Collaboration — The trick of algebra equates collaboration with data quality and data governance success when collaboration is viewed not just as a guiding principle, but also as a call to action in your daily practices.
  • The Return of the Dumb Terminal — With help from author Kevin Kelly and my old green machine, I ponder how the mobile-app-portal-to-the-cloud computing model means mobile devices are bringing about the return of the dumb terminal.
  • An Enterprise Carol — Jacob Marley raises the ghosts of a few ideas to consider about how to keep the Enterprise well in the new year via the Ghosts of Enterprise Past (Legacy Applications), Present (IT Consumerization), and Future (Big Data).

 

Thank You for Reading OCDQ Blog in 2012

In 2012, the Obsessive-Compulsive Data Quality (OCDQ) blog published 92 posts, which received 160,000 total page views, while averaging over 400 page views and 200 unique visitors a day.

Thank you for reading OCDQ Blog in 2012.  Your readership was deeply appreciated.

 

Related Posts

Best OCDQ Blog Posts of 2011

So Long 2011, and Thanks for All the . . . – The OCDQ Radio 2011 Year in Review

2012 Quarterly Review of the Data Roundtable (Part 4)

2012 Quarterly Review of the Data Roundtable (Part 3)

2012 Quarterly Review of the Data Roundtable (Part 2)

2012 Quarterly Review of the Data Roundtable (Part 1)

2011 Quarterly Review of the Data Roundtable (Part 4)

2011 Quarterly Review of the Data Roundtable (Part 3)

2011 Quarterly Review of the Data Roundtable (Part 2)

2011 Quarterly Review of the Data Roundtable (Part 1)

Tuesday
Dec042012

The Wisdom of Crowds, Friends, and Experts

I recently finished reading the TED Book by Jim Hornthal, A Haystack Full of Needles, which included an overview of the different predictive approaches taken by one of the most common forms of data-driven decision making in the era of big data, namely, the recommendation engines increasingly provided by websites, social networks, and mobile apps.

These recommendation engines primarily employ one of three techniques, choosing to base their data-driven recommendations on the “wisdom” provided by either crowds, friends, or experts.

 

The Wisdom of Crowds

In his book The Wisdom of Crowds, James Surowiecki explained that the four conditions characterizing wise crowds are diversity of opinion, independent thinking, decentralization, and aggregation.  Amazon is a great example of a recommendation engine using this approach by assuming that a sufficiently large population of buyers is a good proxy for your purchasing decisions.

For example, Amazon tells you that people who bought James Surowiecki’s bestselling book also bought Thinking, Fast and Slow by Daniel Kahneman, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business by Jeff Howe, and Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott.  However, Amazon neither provides nor possesses knowledge of why people bought all four of these books or qualification of the subject matter expertise of these readers.

However, these concerns, which we could think of as potential data quality issues, and which would be exacerbated within a small amount of transaction data where the eclectic tastes and idiosyncrasies of individual readers would not help us decide what books to buy, within a large amount of transaction data, we achieve the Wisdom of Crowds effect when, taken in aggregate, we receive a general sense of what books we might like to read based on what a diverse group of readers collectively makes popular.

As I blogged about in my post Sometimes it’s Okay to be Shallow, sometimes the aggregated, general sentiment of a large group of unknown, unqualified strangers will be sufficient to effectively make certain decisions.

 

The Wisdom of Friends

Although the influence of our friends and family is the oldest form of data-driven decision making, historically this influence was delivered by word of mouth, which required you to either be there to hear those influential words when they were spoken, or have a large enough network of people you knew that would be able to eventually pass along those words to you.

But the rise of social networking services, such as Twitter and Facebook, has transformed word of mouth into word of data by transcribing our words into short bursts of social data, such as status updates, online reviews, and blog posts.

Facebook “Likes” are a great example of a recommendation engine that uses the Wisdom of Friends, where our decision to buy a book, see a movie, or listen to a song might be based on whether or not our friends like it.  Of course, “friends” is used in a very loose sense in a social network, and not just on Facebook, since it combines strong connections such as actual friends and family, with weak connections such as acquaintances, friends of friends, and total strangers from the periphery of our social network.

Social influence has never ended with the people we know well, as Nicholas Christakis and James Fowler explained in their book Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives.  But the hyper-connected world enabled by the Internet, and further facilitated by mobile devices, has strengthened the social influence of weak connections, and these friends form a smaller crowd whose wisdom is involved in more of our decisions than we may even be aware of.

 

The Wisdom of Experts

Since it’s more common to associate wisdom with expertise, Pandora is a great example of a recommendation engine that uses the Wisdom of Experts.  Pandora used a team of musicologists (professional musicians and scholars with advanced degrees in music theory) to deconstruct more than 800,000 songs into 450 musical elements that make up each performance, including qualities of melody, harmony, rhythm, form, composition, and lyrics, as part of what Pandora calls the Music Genome Project.

As Pandora explains, their methodology uses precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control to ensure that data integrity remains reliably high, believing that delivering a great radio experience to each and every listener requires an incredibly broad and deep understanding of music.

Essentially, experts form the smallest crowd of wisdom.  Of course, experts are not always right.  At the very least, experts are not right about every one of their predictions.  Nor do experts always agree with other, which is why I imagine that one of the most challenging aspects of the Music Genome Project is getting music experts to consistently apply precisely the same methodology.

Pandora also acknowledges that each individual has a unique relationship with music (i.e., no one else has tastes exactly like yours), and allows you to “Thumbs Up” or “Thumbs Down” songs without affecting other users, producing more personalized results than either the popularity predicted by the Wisdom of Crowds or the similarity predicted by the Wisdom of Friends.

 

The Future of Wisdom

It’s interesting to note that the Wisdom of Experts is the only one of these approaches that relies on what data management and business intelligence professionals would consider a rigorous approach to data quality and decision quality best practices.  But this is also why the Wisdom of Experts is the most time-consuming and expensive approach to data-driven decision making.

In the past, the Wisdom of Crowds and Friends was ignored in data-driven decision making for the simple reason that this potential wisdom wasn’t digitized.  But now, in the era of big data, not only are crowds and friends digitized, but technological advancements combined with cost-effective options via open source (data and software) and cloud computing make these approaches quicker and cheaper than the Wisdom of Experts.  And despite the potential data quality and decision quality issues, the Wisdom of Crowds and/or Friends is proving itself a viable option for more categories of data-driven decision making.

I predict that the future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three potential sources of wisdom often acknowledged as contributors to data-driven decision making.

 

Related Posts

Sometimes it’s Okay to be Shallow

Word of Mouth has become Word of Data

The Wisdom of the Social Media Crowd

Data Management: The Next Generation

Exercise Better Data Management

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

The Big Data Theory

Finding a Needle in a Needle Stack

Big Data, Predictive Analytics, and the Ideal Chronicler

The Limitations of Historical Analysis

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

HoardaBytes and the Big Data Lebowski

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

A Tale of Two Datas