Will Big Data be Blinded by Data Science?

All of the hype about Big Data is also causing quite the hullabaloo about hiring Data Scientists in order to help your organization derive business value from big data analytics.  But even though we are still in the hype and hullabaloo stages, these unrelenting trends are starting to rightfully draw the attention of businesses of all sizes.  After all, the key word in big data isn’t big, because, in our increasingly data-constructed world, big data is no longer just for big companies and high-tech firms.

And since the key word in data scientist isn’t data, in this post I want to focus on the second word in today’s hottest job title.

When I think of a scientist of any kind, I immediately think of the scientific method, which has been the standard operating procedure of scientific discovery since the 17th century.  First, you define a question, gather some initial data, and form a hypothesis, which is some idea about how to answer your question.  Next, you perform an experiment to test the hypothesis, during which more data is collected.  Then, you analyze the experimental data and evaluate your results.  Whether the experiment confirmed or contradicted your hypothesis, you do the same thing: repeat the experiment.  A hypothesis can only be promoted to a theory after repeated experimentation (including by others) consistently produces the same result.

During experimentation, failure happens just as often as, if not more often than, success.  However, both failure and success have long played an important role in scientific discovery because progress in either direction is still progress.

Therefore, experimentation is an essential component of scientific discovery — and data science is certainly no exception.

“Designed experiments,” Melinda Thielbar recently blogged, “is where we’ll make our next big leap for data science.”  I agree, but with the notable exception of A/B testing in marketing, most business activities generally don’t embrace data experimentation.
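For readers who have not run one, here is a minimal sketch of what evaluating a simple A/B test can look like.  It uses a pooled two-proportion z-test, which is one common choice rather than the only one, and the function name and the visitor and conversion counts are hypothetical, made up purely for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value) for a two-sided pooled two-proportion z-test.
    Assumes both samples are large and the pooled rate is not 0 or 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the hypothesis "A and B perform the same".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 1,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be chance alone, but in keeping with the scientific method, a single run is a reason to repeat the experiment, not to stop.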

“The purpose of science,” Tom Redman recently explained, “is to discover fundamental truths about the universe.  But we don’t run our businesses to discover fundamental truths.  We run our businesses to serve a customer, gain marketplace advantage, or make money.”  In other words, the commercial application of science has more to do with commerce than it does with science.

One example of the challenges inherent in the commercial application of science is the misconception that predictive analytics can predict what is going to happen with certainty.  What it actually does is predict some of the possible things that could happen, each with some probability.  Although predictive analytics can be a valuable tool for many business activities, especially decision making, as Steve Miller recently blogged, most of us are not good at using probabilities to make decisions.
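To make the difference between a certainty and a probability concrete, here is a small sketch of one way to use a predicted probability in a decision: compare the expected value of each option.  Everything in it is an assumption for illustration, including the churn scenario, the 30% prediction, the dollar amounts, and the helper name expected_value.

```python
def expected_value(outcomes):
    """Expected payoff of a list of (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Hypothetical scenario: a model says a customer has a 30% chance of leaving.
churn_probability = 0.30
customer_value = 200.0        # assumed revenue lost if the customer leaves
offer_cost = 20.0             # assumed cost of a retention offer
offer_success_rate = 0.50     # assumed share of would-be leavers the offer keeps

# Option 1: do nothing and accept the risk.
do_nothing = expected_value([
    (churn_probability, -customer_value),
    (1 - churn_probability, 0.0),
])

# Option 2: pay for the offer, which halves the chance of losing the customer.
still_leaves = churn_probability * (1 - offer_success_rate)
make_offer = expected_value([
    (still_leaves, -customer_value - offer_cost),
    (1 - still_leaves, -offer_cost),
])

print(f"do nothing: {do_nothing:.2f}")
print(f"make offer: {make_offer:.2f}")
print("better option:", "make offer" if make_offer > do_nothing else "do nothing")
```

The model never says whether this particular customer will leave.  It only shifts the odds, and the decision still depends on costs and payoffs that the business has to supply.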

So, with apologies to Thomas Dolby, I can’t help but wonder, will big data be blinded by data science?  Will the business leaders being told to hire data scientists to derive business value from big data analytics be blind to what data science tries to show them?


This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.