i blog of Data glad and big

I recently blogged about the need to balance the hype of big data with some anti-hype.  My hope was that, like a collision of matter and anti-matter, the hype and anti-hype would cancel each other out, transitioning our energy into a more productive discussion about big data.  But, of course, few things in human discourse ever reach such an equilibrium, or can maintain it for very long.

For example, Quentin Hardy recently blogged about six big data myths based on a conference presentation by Kate Crawford, who herself also recently blogged about the hidden biases in big data.  “I call B.S. on all of it,” Derrick Harris blogged in his response to the backlash against big data.  “It might be provocative to call into question one of the hottest tech movements in generations, but it’s not really fair.  That’s because how companies and people benefit from big data, data science or whatever else they choose to call the movement toward a data-centric world is directly related to what they expect going in.  Arguing that big data isn’t all it’s cracked up to be is a strawman, pure and simple — because no one should think it’s magic to begin with.”

In their new book Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier explained that “like so many new technologies, big data will surely become a victim of Silicon Valley’s notorious hype cycle: after being feted on the cover of magazines and at industry conferences, the trend will be dismissed and many of the data-smitten startups will flounder.  But both the infatuation and the damnation profoundly misunderstand the importance of what is taking place.  Just as the telescope enabled us to comprehend the universe and the microscope allowed us to understand germs, the new techniques for collecting and analyzing huge bodies of data will help us make sense of our world in ways we are just starting to appreciate.  The real revolution is not in the machines that calculate data, but in data itself and how we use it.”

Although numerous critical technology factors have made the era of big data possible, such as increases in computing power, decreases in the cost of data storage, increased network bandwidth, parallel processing frameworks (e.g., Hadoop), scalable and distributed models (e.g., cloud computing), and other techniques (e.g., in-memory computing), Mayer-Schönberger and Cukier argued that “something more important changed too, something subtle.  There was a shift in mindset about how data could be used.  Data was no longer regarded as static and stale, whose usefulness was finished once the purpose for which it was collected was achieved.  Rather, data became a raw material of business, a vital economic input, used to create a new form of economic value.”

“In fact, with the right mindset, data can be cleverly used to become a fountain of innovation and new services.  The data can reveal secrets to those with the humility, the willingness, and the tools to listen.”

Pondering this big data war of words reminded me of the E. E. Cummings poem i sing of Olaf glad and big, which sings of Olaf, a conscientious objector forced into military service, who passively endures brutal torture inflicted upon him by training officers, while calmly responding (pardon the profanity): “I will not kiss your fucking flag” and “there is some shit I will not eat.”

Without question, big data has both positive and negative aspects, but the seeming unwillingness of either side in the big data war of words to “kiss each other’s flag,” so to speak, is not as concerning to me as is the conscientious objection to big data and data science expanding into realms where people and businesses were not used to enduring its influence.  For example, some will feel that data-driven audits of their decision-making are like brutal torture inflicted upon their less-than-data-driven intuition.

E. E. Cummings sang the praises of Olaf “because unless statistics lie, he was more brave than me.”  i blog of Data glad and big, but I fear that, regardless of how big it is, “there is some data I will not believe” will be a common refrain among people who lack the humility and willingness to listen to data, and who will not be brave enough to admit that statistics don’t always lie.

 


The Need for Data Philosophers

In my post On Philosophy, Science, and Data, I explained that although some argue philosophy only reigns in the absence of data while science reigns in the analysis of data, a conceptual bridge still remains between analysis and insight, the crossing of which is itself a philosophical exercise.  Therefore, I argued that an endless oscillation persists between science and philosophy, which is why, despite the fact that all we hear about is the need for data scientists, there’s also a need for data philosophers.

Of course, the debate between science and philosophy is a very old one, as is the argument that we need both.  In my previous post, I slightly paraphrased Immanuel Kant (“perception without conception is blind and conception without perception is empty”) by saying that science without philosophy is blind and philosophy without science is empty.

In his book Cosmic Apprentice: Dispatches from the Edges of Science, Dorion Sagan explained that science and philosophy hang “in a kind of odd balance, watching each other, holding hands.  Science’s eye for detail, buttressed by philosophy’s broad view, makes for a kind of alembic, an antidote to both.  Although philosophy isn’t fiction, it can be more personal, creative and open, a kind of counterbalance for science even as it argues that science, with its emphasis on a kind of impersonal materialism, provides a crucial reality check for philosophy and a tendency to over-theorize that’s inimical to the scientific spirit.  Ideally, in the search for truth, science and philosophy, the impersonal and autobiographical, can keep each other honest in a kind of open circuit.”

“Science’s spirit is philosophical,” Sagan concluded.  “It is the spirit of questioning, of curiosity, of critical inquiry combined with fact-checking.  It is the spirit of being able to admit you’re wrong, of appealing to data, not authority.”

“Science,” as his father Carl Sagan said, “is a way of thinking much more than it is a body of knowledge.”  By extension, we could say that data science is about a way of thinking much more than it is about big data or about being data-driven.

I have previously blogged that science has always been about bigger questions, not bigger data.  As Claude Lévi-Strauss said, “the scientist is not a person who gives the right answers, but one who asks the right questions.”  As far as data science goes, what are the right questions?  Data scientist Melinda Thielbar proposes three key questions (Actionable? Verifiable? Repeatable?).

Here again we see the interdependence of science and philosophy.  “Philosophy,” Marilyn McCord Adams said, “is thinking really hard about the most important questions and trying to bring analytic clarity both to the questions and the answers.”

“Philosophy is critical thinking,” Don Cupitt said. “Trying to become aware of how one’s own thinking works, of all the things one takes for granted, of the way in which one’s own thinking shapes the things one’s thinking about.”  Yes, even a data scientist’s own thinking could shape the things they are thinking scientifically about.  Big data evangelist James Kobielus recently blogged about five biases that may crop up in a data scientist’s work (Cognitive, Selection, Sampling, Modeling, Funding).

“Data science has a bright future ahead,” explained Hilary Mason in a recent interview.  “There will only be more data, and more of a need for people who can find meaning and value in that data.  We’re also starting to see a greater need for data engineers, people to build infrastructure around data and algorithms, and data artists, people who can visualize the data.”

I agree with Mason, and I would add that we are also starting to see a greater need for data philosophers, people who can, borrowing the words that Anthony Kenny used to define philosophy, “think as clearly as possible about the most fundamental concepts that reach through all the disciplines.”

 


On Philosophy, Science, and Data

Ever since Melinda Thielbar helped me demystify data science on OCDQ Radio, I have been pondering my paraphrasing of an old idea: Science without philosophy is blind; Philosophy without science is empty; Data needs both science and philosophy.

“A philosopher’s job is to find out things about the world by thinking rather than observing,” the philosopher Bertrand Russell once said.  One could say a scientist’s job is to find out things about the world by observing and experimenting.  In fact, Russell observed that “the most essential characteristic of scientific technique is that it proceeds from experiment, not from tradition.”

Russell also said that “science is what we know, and philosophy is what we don’t know.”  However, Stuart Firestein, in his book Ignorance: How It Drives Science, explained that “there is no surer way to screw up an experiment than to be certain of its outcome.”

Although it seems it would make more sense for science to be driven by what we know, by facts, “working scientists,” according to Firestein, “don’t get bogged down in the factual swamp because they don’t care that much for facts.  It’s not that they discount or ignore them, but rather that they don’t see them as an end in themselves.  They don’t stop at the facts; they begin there, right beyond the facts, where the facts run out.  Facts are selected for the questions they create, for the ignorance they point to.”

In this sense, philosophy and science work together to help us think about and experiment with what we do and don’t know.

Some might argue that while anyone can be a philosopher, being a scientist requires more rigorous training.  A commonly stated requirement in the era of big data is to hire data scientists, but this raises the question: Is data science only for data scientists?

“Clearly what we need,” Firestein explained, “is a crash course in citizen science—a way to humanize science so that it can be both appreciated and judged by an informed citizenry.  Aggregating facts is useless if you don’t have a context to interpret them.”

I would argue that clearly what organizations need is a crash course in data science—a way to humanize data science so that it can be both appreciated and judged by an informed business community.  Big data is useless if you don’t have a business context to interpret it.  Firestein also made great points about science not being exclusionary (i.e., not just for scientists).  Just as you can enjoy watching sports without being a professional athlete and you can appreciate music without being a professional musician, you can—and should—learn the basics of data science (especially statistics) without being a professional data scientist.

To truly deliver business value to organizations, data science cannot be exclusionary.  This doesn’t mean you shouldn’t hire data scientists.  In many cases, you will need the expertise of professional data scientists.  However, you will not be able to direct them or interpret their findings without understanding the basics, what could be called the philosophy of data science.

Some might argue that philosophy only reigns in the absence of data, while science reigns in the analysis of data.  Although in the era of big data there seem to be fewer areas truly devoid of data, a conceptual bridge still remains between analysis and insight, the crossing of which is itself a philosophical exercise.  So, an endless oscillation persists between science and philosophy, which is why science without philosophy is blind, and philosophy without science is empty.  Data needs both science and philosophy.

 

Related OCDQ Radio Episodes


  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, experimentation, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.

 


Demystifying Data Science

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, special guest Dr. Melinda Thielbar, a Ph.D. statistician and actual data scientist, and I attempt to demystify data science by explaining what a data scientist does, including the requisite skills involved, bridging the communication gap between data scientists and business leaders, delivering data products business users can use on their own, and providing a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, experimentation, and correlation.

Melinda Thielbar is the Senior Mathematician for IAVO Research and Scientific.  Her work there focuses on power system optimization using real-time prediction models.  She has worked as a software developer, an analytic lead for big data implementations, and a statistics and programming teacher.

She is also a co-founder of Research Triangle Analysts, a professional group for analysts and data scientists located in the Research Triangle of North Carolina.

While Thielbar doesn’t specialize in a single field, she is particularly interested in power systems because, as she puts it, “A power systems optimizer has to work every time.”

 


Related OCDQ Radio Episodes


  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.

 
