Jim Harris

My name is Jim Harris. I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.

Tuesday, February 19, 2013

Demystifying Data Science 

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, special guest Dr. Melinda Thielbar, a Ph.D. statistician and actual data scientist, and I attempt to demystify data science. We explain what a data scientist does, including the requisite skills involved; discuss bridging the communication gap between data scientists and business leaders and delivering data products business users can use on their own; and provide a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, experimentation, and correlation.

Melinda Thielbar is the Senior Mathematician for IAVO Research and Scientific.  Her work there focuses on power system optimization using real-time prediction models.  She has worked as a software developer, an analytic lead for big data implementations, and a statistics and programming teacher.

Melinda Thielbar is a co-founder of Research Triangle Analysts, a professional group for analysts and data scientists located in the Research Triangle of North Carolina.

While Melinda Thielbar doesn’t specialize in a single field, she is particularly interested in power systems because, as she puts it, “A power systems optimizer has to work every time.”

 


Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Data Quality and Big Data: Guest Tom Redman (aka the “Data Doc”) discusses data quality and big data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.

 

Related Posts

There is No Such Thing as a Root Cause

Big Data and the Infinite Inbox

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Will Big Data be Blinded by Data Science?

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

How Predictable Are You?

A Statistically Significant Resolution for 2013

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory


Reader Comments (2)

I found your podcast on Stitcher, and it is the only one I have listened to so far, so apologies if this comment is taking a single remark out of context. I found the podcast very interesting and have subscribed.

However, I wanted to respond to the criticism/comment about Netflix, because the very nature of their competition, known as The Netflix Prize*, was specifically about improving prediction in a context where a prediction is required, rather than about trying to reduce the "viewers" down to a series of featureless numbers.

But your other comment along the lines of "I liked Star Wars" so "I should also like Star Trek" also annoyed me slightly, because I followed some of the early bloggers who were using SVD techniques, and whilst you get a lot of what you would expect, the more detailed categories that emerged from the data were really quite interesting.

Here were some examples of contrasts (Source: sifter.org/~simon/journal/20061027.2.html):

I found category 9 funny; apparently, people who like Star Trek really don't like The Office:

Category 9:

Star Trek VI: The Undiscovered Country (1991)
Star Trek: The Next Generation: Season 3 (1989)
Star Trek: Generations (1994)
Star Trek: First Contact (1996)
Star Trek: Insurrection (1998)
Star Trek: The Next Generation: Season 1 (1987)
Star Trek III: The Search for Spock (1984)
Labyrinth (1986)
Star Trek V: The Final Frontier (1989)
Star Trek: The Next Generation: Season 7 (1993)
Star Trek: The Next Generation: Season 5 (1991)
What Dreams May Come (1998)
Star Trek IV: The Voyage Home (1986)
Star Trek: The Next Generation: Season 2 (1988)
Star Trek: The Next Generation: Season 4 (1990)

Vs.

The Passion of the Christ (2004)
The Office: Series 1 (2001)
The Office Special (2001)
The Office: Series 2 (2002)
Diary of a Mad Black Woman (2005)
Curb Your Enthusiasm: Season 1 (2000)
Arrested Development: Season 1 (2003)
Because of Winn-Dixie (2005)
City of God (2002)
Curb Your Enthusiasm: Season 2 (2001)
Madea's Class Reunion (2003)
Barbershop 2: Back in Business (2004)
The Fast and the Furious (2001)
Shark Tale (2004)
The Wire: Season 1 (2003)
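
A contrast like the one in category 9 can be illustrated with a toy example. The sketch below is not the method from the linked post. It is a minimal illustration that assumes a tiny, fully filled-in ratings matrix with made-up ratings and hypothetical titles (`scifi_a`, `comedy_b`, and so on), and it uses power iteration to approximate the leading singular vector instead of a full SVD. Real Netflix Prize data consisted mostly of missing ratings, which this sketch does not handle.

```python
# Hypothetical titles and made-up ratings (rows = viewers, columns = titles).
titles = ["scifi_a", "scifi_b", "comedy_a", "comedy_b"]
ratings = [
    [5.0, 5.0, 1.0, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, 4.0],
    [1.0, 1.0, 4.0, 5.0],
]


def center_rows(matrix):
    """Subtract each viewer's mean rating so factors reflect relative taste."""
    return [[x - sum(row) / len(row) for x in row] for row in matrix]


def leading_factor(matrix, iterations=200):
    """Approximate the top right singular vector by power iteration on M^T M."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary non-zero start
    for _ in range(iterations):
        # u = M v, then w = M^T u
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


loadings = leading_factor(center_rows(ratings))

# Titles at opposite ends of the factor form the "this vs. that" lists.
# The overall sign of a singular vector is arbitrary, so which group is
# "positive" carries no meaning; only the split does.
ranked = sorted(zip(loadings, titles))
one_end = [title for value, title in ranked if value < 0]
other_end = [title for value, title in ranked if value > 0]

print(one_end, "vs.", other_end)
```

In this toy data the two hypothetical sci-fi titles land at one end of the factor and the two hypothetical comedies at the other, which is the same shape as the "Vs." lists above, only far smaller.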

* "The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences."

** "It is our great honor to announce the $1M Grand Prize winner of the Netflix Prize contest as team BellKor’s Pragmatic Chaos for their verified submission on July 26, 2009 at 18:18:28 UTC, achieving the winning RMSE of 0.8567 on the test subset. This represents a 10.06% improvement over Cinematch’s score on the test subset at the start of the contest."
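
For readers unfamiliar with RMSE (root mean squared error), the figures in the quote can be unpacked with a short sketch. The function names here are illustrative, and the baseline is not stated in the quote; it is only what the two quoted numbers (0.8567 and 10.06%) imply when the improvement is read as a relative reduction in RMSE.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def implied_baseline(winning_rmse, relative_improvement):
    """Baseline RMSE implied by a winning RMSE and a relative improvement.

    Assumes improvement = (baseline - winning) / baseline.
    """
    return winning_rmse / (1.0 - relative_improvement)


# Figures taken from the quoted announcement.
baseline = implied_baseline(0.8567, 0.1006)
print(f"Implied baseline RMSE: {baseline:.4f}")

# Made-up example: predictions off by one star on every rating.
print(rmse([3.0, 4.0, 2.0], [4.0, 5.0, 1.0]))
```

On a one-to-five star scale, the implied baseline is a little over 0.95 stars of typical error, so the winning entry reduced that error by roughly a tenth of a star.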

February 26, 2013 | Unregistered Commenter Tom Hodder

Thanks for your excellent comment, Tom.

I really appreciate you calling me out on what was an unfair criticism of the Netflix algorithm, which actually does a good job helping me discover television shows and movies that I would not have previously considered.

During that segment of the podcast, Melinda and I were discussing the limitations of using data science to predict human behavior, and I simply chose Netflix as a quick example of something that is inherently more predictable than other human behavior, and where a wrong prediction has little negative effect.

By the latter point, I mean that poorly predicting my tastes in movies is not as concerning as poorly predicting the likelihood I would default on my mortgage, or poorly predicting my plans to save and invest for retirement.

Additionally, an important aspect of data science that was not covered during this podcast is the data privacy implications of predictive models using big data. Here again, entertainment-enhancing algorithms such as Netflix do not raise the same concerns as Orwellian Big Brother data-surveillance algorithms.

Best Regards,

Jim

February 27, 2013 | Registered Commenter Jim Harris
