Data Quality in Six Verbs

Once upon a time, when asked on Twitter to identify a list of critical topics for data quality practitioners, my pithy response (with only 140 characters in a tweet, pithy is as good as it gets) was to propose six critical verbs, especially since I prefer emphasizing the need to take action: Investigate, Communicate, Collaborate, Remediate, Inebriate, and Reiterate.

Lest my pith be misunderstood aplenty, this blog post provides more detail, plus links to related posts, about what I meant.

1 — Investigate

Data quality is not exactly a riddle wrapped in a mystery inside an enigma.  However, understanding your data is essential to using it effectively and improving its quality.  Therefore, the first thing you must do is investigate.

So, grab your favorite (preferably highly caffeinated) beverage, get settled into your comfy chair, roll up your sleeves, and start analyzing that data.  Data profiling tools can be very helpful with raw data analysis.

However, data profiling is elementary, my dear reader.  In order for you to make sense of those data elements, you require business context.  This means you must also go talk with data’s best friends—its stewards, analysts, and subject matter experts.
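For the hands-on part of that investigation, here is a minimal profiling sketch in Python, assuming the source data has already been loaded into a pandas DataFrame (the file and column names are purely hypothetical):

```python
import pandas as pd

# Load the data to be profiled (file name and columns are hypothetical)
df = pd.read_csv("customers.csv")

# Basic raw data analysis: completeness, cardinality, and a sample value per column
profile = pd.DataFrame({
    "non_null_pct": (df.notna().mean() * 100).round(1),  # completeness per column
    "distinct_values": df.nunique(),                      # cardinality per column
    "sample_value": df.apply(
        lambda col: col.dropna().iloc[0] if col.notna().any() else None
    ),
})
print(profile)

# A frequency distribution often reveals default or placeholder values hiding in a column
print(df["country_code"].value_counts(dropna=False).head(10))
```

Even a crude profile like this surfaces the questions you will need to take to data’s best friends, because the numbers tell you what the data looks like, not what it is supposed to mean.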

Six blog posts related to Investigate:

2 — Communicate

After you have completed your preliminary investigation, the next thing you must do is communicate your findings, which helps improve everyone’s understanding of how data is being used, verify data’s business relevancy, and prioritize critical issues.

Keep in mind that communication is mostly about listening.  Also, be prepared to face “data denial” whenever data quality is discussed.  This is a natural self-defense mechanism for the people responsible for business processes, technology, and data, which is understandable because nobody likes to be blamed (or feel blamed) for causing or failing to fix data quality problems.

No matter how uncomfortable these discussions may be at times, they are essential to evaluating the potential ROI of data quality improvements, defining data quality standards, and most importantly, providing a working definition of success.

Six blog posts related to Communicate:

3 — Collaborate

After you have investigated and communicated, now you must rally the team that will work together to improve the quality of your data.  A cross-disciplinary team will be needed because data quality is neither a business nor a technical issue—it is both.

Therefore, you will need the collaborative effort of business and technical folks.  The business folks usually own the data, or at least the business processes that create it, so they understand its meaning and daily use.  The technical folks usually own the hardware and software comprising your data architecture.  Both sets of folks must realize they are all “one company folk” that must collaborate in order to be successful.

No, you don’t need a folk singer, but you may need an executive sponsor.  The need for collaboration might sound rather simple, but as one of my favorite folk singers taught me, sometimes the hardest thing to learn is the least complicated.

Six blog posts related to Collaborate:

4 — Remediate

Resolving data quality issues requires a combination of data cleansing and defect prevention.  Data cleansing is reactive, and its common (and deserved) criticism is that it essentially treats the symptoms without curing the disease.

Defect prevention is proactive: through root cause analysis and process improvements, it is essentially the cure for the quality ills that ail your data.  However, a data governance framework is often necessary for defect prevention to be successful, as are patience and understanding, since it requires a strategic organizational transformation that doesn’t happen overnight.

The unavoidable reality is that data cleansing is used to correct today’s problems while defect prevention is busy building a better tomorrow for your organization.  Fundamentally, data quality requires a hybrid discipline that combines data cleansing and defect prevention into an enterprise-wide best practice.
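As a toy illustration of that hybrid discipline, here is a minimal sketch (in Python, with hypothetical phone number data) showing the same rule applied reactively as data cleansing and proactively as defect prevention:

```python
import re
from typing import Optional

PHONE_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{4}$")

def cleanse_phone(raw: str) -> Optional[str]:
    """Reactive data cleansing: repair values already sitting in your systems."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        return f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"
    return None  # flag for manual review instead of guessing

def validate_phone(raw: str) -> bool:
    """Proactive defect prevention: reject bad values at the point of entry."""
    return bool(PHONE_PATTERN.match(raw))

# Cleansing corrects today's problems; validation builds a better tomorrow.
print(cleanse_phone("(555) 123 4567"))   # 555-123-4567
print(validate_phone("555-123-4567"))    # True
print(validate_phone("not a number"))    # False
```

The point is not the regular expression; the point is that the same business rule gets applied in two places: after the fact (cleansing) and before the fact (prevention).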

Six blog posts related to Remediate:

5 — Inebriate

I am not necessarily advocating that kind of inebriation.  Instead, think Emily Dickinson (i.e., “Inebriate of air am I” – it’s a line from a poem about happiness that, yes, also happens to make a good drinking song). 

My point is that you must not only celebrate your successes, but celebrate them quite publicly.  Channel yet another poet (Walt Whitman) and sound your barbaric yawp over the cubicles of your company: “We just improved the quality of our data!”

Of course, you will need to be more specific.  Declare success using words illustrating the business impact of your achievements, such as mitigated risks, reduced costs, or increased revenues — those three are always guaranteed executive crowd pleasers.

Six blog posts related to Inebriate:

6 — Reiterate

Like the legend of the phoenix, the end is also a new beginning.  Therefore, don’t get too inebriated, since you are not celebrating the end of your efforts.  Your data quality journey has only just begun.  Your continuous monitoring must continue and your ongoing improvements must remain ongoing.  Which is why, despite the tension this reality (and this bad grammatical pun) might cause you, always remember that the tense of all six of these verbs is future continuous.
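To make the “continuous” part of continuous monitoring tangible, here is a minimal sketch of a recurring data quality check, assuming a pandas DataFrame refreshed on each run, with a hypothetical critical column and threshold:

```python
import pandas as pd

COMPLETENESS_THRESHOLD = 0.98  # hypothetical target agreed upon with the business

def completeness_check(df: pd.DataFrame, column: str) -> bool:
    """Recurring check: completeness of a critical column against its target."""
    completeness = df[column].notna().mean()
    print(f"{column}: {completeness:.1%} complete (target {COMPLETENESS_THRESHOLD:.0%})")
    return completeness >= COMPLETENESS_THRESHOLD

# Run this on a schedule; a failure is the trigger to start the six verbs over again.
df = pd.read_csv("customers.csv")  # hypothetical refreshed extract
if not completeness_check(df, "email"):
    print("Data quality issue detected: time to investigate again.")
```

Whatever you choose to measure, the specific check matters less than the fact that it keeps running.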

Six blog posts related to Reiterate:

What Say You?

Please let me know what you think, pithy or otherwise, by posting a comment below.  And feel free to use more than six verbs.

Council Data Governance

Inspired by the great Eagles song Hotel California, this DQ-Song “sings” about the common mistake of convening a council too early when starting a new data governance program.  Now, of course, data governance is a very important and serious subject, which is why some people might question whether music is the best way to discuss it.

Although I understand that skepticism, I can’t help but recall the words of Frank Zappa:

“Information is not knowledge;

Knowledge is not wisdom;

Wisdom is not truth;

Truth is not beauty;

Beauty is not love;

Love is not music;

Music is the best.”

Council Data Governance

Down a dark deserted hallway, I walked with despair
As the warm smell of bagels rose up through the air
Up ahead in the distance, I saw a shimmering light
My head grew heavy and my sight grew dim
I had to attend another data governance council meeting
As I stood in the doorway
I heard the clang of the meeting bell

And I was thinking to myself
This couldn’t be heaven, but this could be hell
As stakeholders argued about the data governance way
There were voices down the corridor
I thought I heard them say . . .

Welcome to the Council Data Governance
Such a dreadful place (such a dreadful place)
Time crawls along at such a dreadful pace
Plenty of arguing at the Council Data Governance
Any time of year (any time of year)
You can hear stakeholders arguing there

Their agendas are totally twisted, with means to their own end
They use lots of pretty, pretty words, which I don’t comprehend
How they dance around the complex issues with sweet sounding threats
Some speak softly with remorse, some speak loudly without regrets

So I cried out to the stakeholders
Can we please reach consensus on the need for collaboration?
They said, we haven’t had that spirit here since nineteen ninety nine
And still those voices they’re calling from far away
Wake you up in the middle of this endless meeting
Just to hear them say . . .

Welcome to the Council Data Governance
Such a dreadful place (such a dreadful place)
Time crawls along at such a dreadful pace
They argue about everything at the Council Data Governance
And it’s no surprise (it’s no surprise)
To hear defending the status quo alibis

Bars on all of the windows
Rambling arguments, anything but concise
We are all just prisoners here
Of our own device
In the data governance council chambers
The bickering will never cease
They stab it with their steely knives
But they just can’t kill the beast

Last thing I remember, I was
Running for the door
I had to find the passage back
To the place I was before
Relax, said the stakeholders
We have been programmed by bureaucracy to believe
You can leave the council meeting any time you like
But success with data governance, you will never achieve!

 

More Data Quality Songs

Data Love Song Mashup

I’m Gonna Data Profile (500 Records)

A Record Named Duplicate

New Time Human Business

You Can’t Always Get the Data You Want

I’m Bringing DQ Sexy Back

Imagining the Future of Data Quality

The Very Model of a Modern DQ General

More Data Governance Posts

Beware the Data Governance Ides of March

Data Governance Star Wars: Bureaucracy versus Agility

Aristotle, Data Governance, and Lead Rulers

Data Governance needs Searchers, not Planners

Data Governance Frameworks are like Jigsaw Puzzles

Is DG a D-O-G?

The Hawthorne Effect, Helter Skelter, and Data Governance

Data Governance and the Buttered Cat Paradox

Total Information Risk Management

OCDQ Radio is an audio podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I am joined by special guest Dr. Alexander Borek, the inventor of Total Information Risk Management (TIRM) and the leading expert on how to apply risk management principles to data management.  Dr. Borek is a frequent speaker at international information management conferences and author of many research articles covering a range of topics, including EIM, data quality, crowd sourcing, and IT business value.  In his current role at IBM, Dr. Borek applies data analytics to drive IBM’s worldwide corporate strategy.  Previously, he led a team at the University of Cambridge to develop the TIRM process and test it in a number of different industries.  He holds a PhD in engineering from the University of Cambridge.

This podcast discusses his book Total Information Risk Management: Maximizing the Value of Data and Information Assets, which is now available worldwide and is a must-read for all data and information managers who want to understand and measure the implications of low-quality data and information assets.  The book provides step-by-step instructions, along with illustrative examples from studies in many different industries, on how to implement total information risk management, which will help your organization:

  • Learn how to manage data and information for business value.
  • Create powerful and convincing business cases for all your data and information management, data governance, big data, data warehousing, business intelligence, and business analytics initiatives, projects, and programs.
  • Protect your organization from risks that arise through poor data and information assets.
  • Quantify the impact of having poor data and information.

 


 

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Data Profiling Early and Often — Guest James Standen discusses data profiling concepts and practices, and how bad data is often misunderstood and can be coaxed away from the dark side if you know how to approach it.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Is DG a D-O-G?


Convincing your organization to invest in a sustained data quality program implemented within a data governance framework can be a very difficult task requiring an advocate with a championship pedigree.  But sometimes it seems like no matter how persuasive your sales pitch is, even when your presentation is judged best in show, it appears to fall on deaf ears.

Perhaps, data governance (DG) is a D-O-G.  In other words, maybe the DG message is similar to a sound only dogs can hear.

Galton’s Whistle

In the late 19th century, Francis Galton developed a whistle (now more commonly called a dog whistle), which he used to test the range of frequencies that could be heard by various animals.  Galton was conducting experiments on human faculties, including the range of human hearing.  Although not its intended purpose, today Galton’s whistle is used by dog trainers.  By varying the frequency of the whistle, it emits a sound (inaudible to humans) used either to simply get a dog’s attention, or alternatively to inflict pain for the purpose of correcting undesirable behavior.

Bad Data, Bad, Bad Data!

Many organizations do not become aware of the importance of data governance until poor data quality repeatedly “bites” critical business decisions.  Typically following a very nasty bite, executives scream “bad data, bad, bad data!” without stopping to realize the enterprise’s poor data management practices unleashed the perpetually bad data now running amuck within their systems.

For these organizations, advocacy of proactive defect prevention was an inaudible sound, and now the executives blow harshly into their data whistle and demand a one-time data cleansing project to correct the current data quality problems.

However, even after the project is over, it’s often still a doggone crazy data world.

The Data Whisperer

Executing disconnected one-off projects to deal with data issues when they become too big to ignore doesn’t work because it doesn’t identify and correct the root causes of data’s bad behavior.  By advocating root cause analysis and business process improvement, data governance can essentially be understood as The Data Whisperer.

Data governance defines policies and procedures for aligning data usage with business metrics, establishes data stewardship, prioritizes data quality issues, and facilitates collaboration among all of the business and technical stakeholders.

Data governance enables enterprise-wide data quality by combining data cleansing (which will still occasionally be necessary) and defect prevention into a hybrid discipline, which will result in you hearing everyday tales about data so well behaved that even your executives’ tails will be wagging.

Data’s Best Friend

Without question, data governance is very disruptive to an organization’s status quo.  It requires patience, understanding, and dedication because it will require a strategic enterprise-wide transformation that doesn’t happen overnight.

However, data governance is also data’s best friend. 

And in order for your organization to be successful, you have to realize that data is also your best friend.  Data governance will help you take good care of your data, which in turn will take good care of your business.

Basically, the success of your organization comes down to a very simple question — Are you a DG person?

Related Posts

The Three Most Important Letters in Data Governance

Data Governance Frameworks are like Jigsaw Puzzles

Data Governance needs Searchers, not Planners

The Hawthorne Effect, Helter Skelter, and Data Governance

Cooks, Chefs, and Data Governance

Data Governance and the Buttered Cat Paradox

Data Governance Star Wars: Bureaucracy versus Agility

Beware the Data Governance Ides of March

Aristotle, Data Governance, and Lead Rulers

Data Governance and the Adjacent Possible

OCDQ Radio - Doing Data Governance

OCDQ Radio - The Data Governance Imperative

The Second Law of Data Quality

Data Governance Trek

What’s the Over-Under on Communication?

Data Governance needs a Gravity Assist

The Pull and Push of Data Governance

There’s only One Right Way to Do Data Governance

An Unsettling Truth about Data Governance

The Sixth Law of Data Quality

The Role of Data Quality Monitoring in Data Governance

When All is Null on the Metrics Front

The Collaborative Culture of Data Governance

The Seventh Law of Data Quality


The Stone Wars of Root Cause Analysis

“As a single stone causes concentric ripples in a pond,” Martin Doyle commented on my blog post There is No Such Thing as a Root Cause, “there will always be one root cause event creating the data quality wave.  There may be interference after the root cause event which may look like a root cause, creating eddies of side effects and confusion, but I believe there will always be one root cause.  Work backwards from the data quality side effects to the root cause and the data quality ripples will be eliminated.”

Martin Doyle and I continued our congenial blog comment banter on my podcast episode The Johari Window of Data Quality, but in this blog post I wanted to focus on the stone-throwing metaphor for root cause analysis.

Let’s begin with the concept of a single stone causing the concentric ripples in a pond.  Is the stone really the root cause?  Who threw the stone?  Why did that particular person choose to throw that specific stone?  How did the stone come to be alongside the pond?  Which path did the stone-thrower take to get to the pond?  What happened to the stone-thrower earlier in the day that made them want to go to the pond, and once there, pick up a stone and throw it in the pond?

My point is that while root cause analysis is important to data quality improvement, too often we can get carried away riding the ripples of what we believe to be the root cause of poor data quality.  Adding to the complexity is the fact there’s hardly ever just one stone.  Many stones get thrown into our data ponds, and trying to un-ripple their poor quality effects can lead us to false conclusions because causation is non-linear in nature.  Causation is a complex network of many interrelated causes and effects, so some of what appear to be the effects of the root cause you have isolated may, in fact, be the effects of other causes.

As Laura Sebastian-Coleman explains, data quality assessments are often “a quest to find a single criminal—The Root Cause—rather than to understand the process that creates the data and the factors that contribute to data issues and discrepancies.”  Those approaching data quality this way, “start hunting for the one thing that will explain all the problems.  Their goal is to slay the root cause and live happily ever after.  Their intentions are good.  And slaying root causes—such as poor process design—can bring about improvement.  But many data problems are symptoms of a lack of knowledge about the data and the processes that create it.  You cannot slay a lack of knowledge.  The only way to solve a knowledge problem is to build knowledge of the data.”

Believing that you have found and eliminated the root cause of all your data quality problems is like believing that after you have removed the stones from your pond (i.e., data cleansing), you can stop the stone-throwers by building a high stone-deflecting wall around your pond (i.e., defect prevention).  However, there will always be stones (i.e., data quality issues) and there will always be stone-throwers (i.e., people and processes) that will find a way to throw a stone in your pond.

In our recent podcast Measuring Data Quality for Ongoing Improvement, Laura Sebastian-Coleman and I discussed how, although root cause is used as a singular noun, just as data is used as a singular noun, we should talk about root causes, since, just as data analysis is not the analysis of a single datum, root cause analysis should not be viewed as the analysis of a single root cause.

The bottom line, or, if you prefer, the ripple at the bottom of the pond, is the Stone Wars of Root Cause Analysis will never end because data quality is a journey, not a destination.  After all, that’s why it’s called ongoing data quality improvement.

 

Related Posts

There is No Such Thing as a Root Cause

DQ-View: Occam’s Razor Burn

Data Quality: Quo Vadimus?

Finding Data Quality

Why isn’t our data quality worse?

Data and Process Transparency

Days Without A Data Quality Issue

The Dichotomy Paradox, Data Quality and Zero Defects

Data and its Relationships with Quality

The Role of Data Quality Monitoring in Data Governance

Data Quality and Miracle Exceptions

Data Myopia and Business Relativity

Expectation and Data Quality

Is your data accurate, but useless to your business?

DQ-Tip: “Data quality is primarily about context...”

DQ-Tip: “There is no such thing as data accuracy...”

Data Quality and the Cupertino Effect

You Need to Know Your Data Quality Reference Points

Why You Need Data Quality Standards

Adventures in Data Profiling

Measuring Data Quality for Ongoing Improvement

OCDQ Radio is an audio podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

In this episode, Laura Sebastian-Coleman, author of the book Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework, and I discuss bringing together a better understanding of what is represented in data, and how it is represented, with the expectations for use in order to improve the overall quality of data.  Our discussion also includes avoiding two common mistakes made when starting a data quality project, and defining five dimensions of data quality.

Laura Sebastian-Coleman has worked on data quality in large health care data warehouses since 2003.  She has implemented data quality metrics and reporting, launched and facilitated a data quality community, contributed to data consumer training programs, and has led efforts to establish data standards and to manage metadata.  In 2009, she led a group of analysts in developing the original Data Quality Assessment Framework (DQAF), which is the basis for her book.

Laura Sebastian-Coleman has delivered papers at MIT’s Information Quality Conferences and at conferences sponsored by the International Association for Information and Data Quality (IAIDQ) and the Data Governance Organization (DGO).  She holds IQCP (Information Quality Certified Professional) designation from IAIDQ, a Certificate in Information Quality from MIT, a B.A. in English and History from Franklin & Marshall College, and a Ph.D. in English Literature from the University of Rochester.

 


Sometimes Worse Data Quality is Better

Continuing a theme from three previous posts, which discussed when it’s okay to call data quality as good as it needs to get, the occasional times when perfect data quality is necessary, and the costs and profits of poor data quality, in this blog post I want to provide three examples of when the world of consumer electronics proved that sometimes worse data quality is better.

 

When the Betamax Bet on Video Busted

While it seems like a long time ago in a galaxy far, far away, during the 1970s and 1980s a videotape format war raged between Betamax and VHS.  Betamax was widely believed to provide superior video data quality.

But a blank Betamax tape allowed users to record up to two hours of high-quality video, whereas a VHS tape allowed users to record up to four hours of slightly lower quality video.  Consumers consistently chose quantity over quality, especially since lower quality also meant a lower price.  Betamax tapes and machines remained more expensive, betting that consumers would be willing to pay a premium for higher-quality video.

The VHS victory demonstrated how people often choose quantity over quality, so it doesn’t always pay to have better data quality.

 

When Lossless Lost to Lossy Audio

Much to the dismay of those working in the data quality profession, most people do not care about the quality of their data unless it becomes bad enough for them to pay attention to — and complain about.

An excellent example is bitrate, which refers to the number of bits — or the amount of data — processed over a given amount of time.  In his article Does Bitrate Really Make a Difference In My Music?, Whitson Gordon examined the common debate about lossless versus lossy audio formats.

Using the example of ripping a track from a CD to a hard drive, a lossless format means the track is not compressed to the point where any of its data is lost, retaining, for all intents and purposes, the same audio data quality as the original CD track.

By contrast, a lossy format compresses the track so that it takes up less space by intentionally deleting some of its data, reducing audio data quality.  Audiophiles often claim that anything other than vinyl records sounds lousy because it is so lossy.
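To put rough numbers on that trade-off, here is a back-of-the-envelope sketch; the bitrates are standard figures for uncompressed CD audio and common MP3 encodings, and the four-minute track length is an assumption:

```python
# Approximate file size of a four-minute track at different bitrates
TRACK_SECONDS = 4 * 60

bitrates_kbps = {
    "CD audio (uncompressed)": 1411,   # 44.1 kHz * 16 bits * 2 channels
    "MP3 at 320 kbps (lossy)": 320,
    "MP3 at 128 kbps (lossy)": 128,
}

for fmt, kbps in bitrates_kbps.items():
    size_mb = kbps * 1000 * TRACK_SECONDS / 8 / 1_000_000  # kilobits per second to megabytes
    print(f"{fmt}: ~{size_mb:.1f} MB")
```

The arithmetic is simple, but it shows just how much of the original data a lossy format throws away.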

However, like truth, beauty, and art, data quality can be said to be in the eyes — or the ears — of the beholder.  So, if your favorite music sounds fine to you in MP3 format, then not only do you not need vinyl records, audio tapes, or CDs anymore, you also will not pay more attention to (or pay more money for) audio data quality.

 

When Digital Killed the Photograph Star

The Eastman Kodak Company, commonly known as Kodak, which was founded by George Eastman in 1888 and dominated the photography industry for most of the 20th century, filed for bankruptcy in January 2012.  The primary reason was that Kodak, which had previously pioneered innovations like celluloid film and color photography, failed to embrace the industry’s transition to digital photography, despite the fact that Kodak invented some of the core technology used in current digital cameras.

Why?  Because Kodak believed that the data quality of digital photographs would be generally unacceptable to consumers as a replacement for film photographs.  In much the same way that Betamax assumed consumers wanted higher-quality video, Kodak assumed consumers would always want to use higher-quality photographs to capture their “Kodak moments.”

In fairness to Kodak, mobile devices are causing a massive — and rapid — disruption to many well-established business models, creating a brave new digital world, and obviously not just for photography.  However, when digital killed the photograph star, it proved, once again, that sometimes worse data quality is better.

  

Related Posts

Data Quality and the OK Plateau

When Poor Data Quality Kills

The Costs and Profits of Poor Data Quality

Promoting Poor Data Quality

Data Quality and the Cupertino Effect

The Data Quality Wager

How Data Cleansing Saves Lives

The Dichotomy Paradox, Data Quality and Zero Defects

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

The Seventh Law of Data Quality

A Tale of Two Q’s

Paleolithic Rhythm and Data Quality

Groundhog Data Quality Day

Data Quality and The Middle Way

Stop Poor Data Quality STOP

When Poor Data Quality Calls

Freudian Data Quality

Predictably Poor Data Quality

Satisficing Data Quality

i blog of Data glad and big

I recently blogged about the need to balance the hype of big data with some anti-hype.  My hope was, like a collision of matter and anti-matter, the hype and anti-hype would cancel each other out, transitioning our energy into a more productive discussion about big data.  But, of course, few things in human discourse ever reach such an equilibrium, or can maintain it for very long.

For example, Quentin Hardy recently blogged about six big data myths based on a conference presentation by Kate Crawford, who herself also recently blogged about the hidden biases in big data.  “I call B.S. on all of it,” Derrick Harris blogged in his response to the backlash against big data.  “It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair.  That’s because how companies and people benefit from big data, data science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in.  Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple — because no one should think it’s magic to begin with.”

In their new book Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schonberger and Kenneth Cukier explained that “like so many new technologies, big data will surely become a victim of Silicon Valley’s notorious hype cycle: after being feted on the cover of magazines and at industry conferences, the trend will be dismissed and many of the data-smitten startups will flounder.  But both the infatuation and the damnation profoundly misunderstand the importance of what is taking place.  Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.  The real revolution is not in the machines that calculate data, but in data itself and how we use it.”

Although there have been numerous critical technology factors making the era of big data possible, such as increases in the amount of computing power, decreases in the cost of data storage, increased network bandwidth, parallel processing frameworks (e.g., Hadoop), scalable and distributed models (e.g., cloud computing), and other techniques (e.g., in-memory computing), Mayer-Schonberger and Cukier argued that “something more important changed too, something subtle.  There was a shift in mindset about how data could be used.  Data was no longer regarded as static and stale, whose usefulness was finished once the purpose for which it was collected was achieved.  Rather, data became a raw material of business, a vital economic input, used to create a new form of economic value.”

“In fact, with the right mindset, data can be cleverly used to become a fountain of innovation and new services.  The data can reveal secrets to those with the humility, the willingness, and the tools to listen.”

Pondering this big data war of words reminded me of the E. E. Cummings poem i sing of Olaf glad and big, which sings of Olaf, a conscientious objector forced into military service, who passively endures brutal torture inflicted upon him by training officers, while calmly responding (pardon the profanity): “I will not kiss your fucking flag” and “there is some shit I will not eat.”

Without question, big data has both positive and negative aspects, but the seeming unwillingness of either side in the big data war of words to “kiss each other’s flag,” so to speak, is not as concerning to me as the conscientious objection to big data and data science expanding into realms where people and businesses were not used to enduring their influence.  For example, some will feel that data-driven audits of their decision-making are like brutal torture inflicted upon their less-than-data-driven intuition.

E.E. Cummings sang the praises of Olaf “because unless statistics lie, he was more brave than me.”  i blog of Data glad and big, but I fear that, regardless of how big it is, “there is some data I will not believe” will be a common refrain by people who will lack the humility and willingness to listen to data, and who will not be brave enough to admit that statistics don’t always lie.

 

Related Posts

The Need for Data Philosophers

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

The Laugh-In Effect of Big Data

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Will Big Data be Blinded by Data Science?

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

Our Increasingly Data-Constructed World

The Wisdom of Crowds, Friends, and Experts

Data Separates Science from Superstition

Headaches, Data Analysis, and Negativity Bias

Why Data Science Storytelling Needs a Good Editor

Predictive Analytics, the Data Effect, and Jed Clampett

Rage against the Machines Learning

The Flying Monkeys of Big Data

Cargo Cult Data Science

Speed Up Your Data to Slow Down Your Decisions

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science