Playlist 01

Other Ways to Listen: bit.ly/listen-dbp

This playlist of five episodes provides a good introduction to the podcast because it exemplifies one of the podcast’s goals, especially in the early episodes: to lay the foundation for discussing topics related to data analytics, machine learning, and data science by providing some working definitions, some historical context, and some practical advice.

That is Not Machine Learning

Machine learning (ML) can provide unique analytical insights, as well as help automate some operational and decision-making processes more efficiently and effectively than non-ML alternatives. However, ML is also among the buzziest of buzzwords right now, and many people are overselling and oversimplifying its use.

Do not let anyone frame a data analysis, business problem, or process improvement as an ML use case. Instead, say: That is Not Machine Learning. That is a data analysis, business problem, or process improvement where ML might be able to help, but not before we evaluate other options, and with the understanding that ML is rarely going to be either the first or the only aspect of the solution.

This episode was sponsored by Vertica, the unified analytics platform based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning.

Learn More at https://www.vertica.com/

Machine Learning is Label Making

Label Making. That is my simple two-word definition of Machine Learning. Machine Learning is Label Making. ML is LM. 

This is especially true of supervised machine learning, which creates either numerical labels (using regression algorithms) to predict a continuous data value (such as sale or stock prices), or categorical labels (using classification algorithms) to assign data to predefined groups, also called classes (such as Fraud or Not Fraud for financial transactions).

This episode was sponsored by Vertica, the unified analytics platform based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning.

Learn More at https://www.vertica.com/

Cloudy with a Chance of Data Analytics

Based on one of my presentations, this episode provides a five-part vendor-neutral framework for evaluating the critical capabilities of a cloud data analytics solution: Deploy, Store, Optimize, Analyze, Govern.

PDF containing the complete presentation: The Critical Capabilities of Your Next Cloud Analytics Solution.pdf.
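One way to put the five capabilities to work when comparing candidate solutions is a simple weighted scorecard. This Python sketch is an illustrative assumption, not the method from the presentation: the 1-to-5 scale, the weights, and the vendor scores are all hypothetical.

```python
# The five capabilities named in the framework.
CAPABILITIES = ("Deploy", "Store", "Optimize", "Analyze", "Govern")

def weighted_score(scores, weights):
    """Combine per-capability scores (hypothetical 1-5 scale) into one weighted average.

    Both arguments map each capability name to a number; weights need not sum to 1.
    """
    missing = [c for c in CAPABILITIES if c not in scores or c not in weights]
    if missing:
        raise ValueError(f"missing capabilities: {missing}")
    total_weight = sum(weights[c] for c in CAPABILITIES)
    return sum(scores[c] * weights[c] for c in CAPABILITIES) / total_weight

# Hypothetical weights and scores for one candidate solution.
weights = {"Deploy": 1, "Store": 1, "Optimize": 1, "Analyze": 2, "Govern": 2}
vendor_a = {"Deploy": 4, "Store": 3, "Optimize": 3, "Analyze": 5, "Govern": 4}
print(weighted_score(vendor_a, weights))  # 4.0
```

Weighting Analyze and Govern more heavily is only an example; the weights should reflect your own requirements, and refusing to score a solution with a missing capability keeps all five parts of the framework in view.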

This episode was sponsored by Vertica, the unified analytics platform based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning.

Learn More at https://www.vertica.com/

Big Data Quality, Then and Now

A decade ago, the big data hype cycle came just before the data science hype cycle. At that time, I had the privilege of sitting down with Ph.D. statistician Dr. Thomas C. Redman (aka the “Data Doc”).

We discussed whether data quality matters less in larger data sets, whether statistical outliers represent business insights or data quality issues, statistical sampling errors versus measurement calibration errors, mistaking signal for noise (i.e., good data for bad data), and whether the principles and practices of true “data scientists” will actually be embraced by an organization’s business leaders.

This episode is an edited and slightly shortened version of that discussion, which, even though it is from ten years ago, I think still provides good insight into big data quality, then and now.

Extended Show Note: One example of a measurement calibration error that was mentioned during this discussion is the faster-than-light neutrino anomaly, which you can read more about on Wikipedia.
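The distinction between statistical sampling errors and measurement calibration errors, and the question of whether data quality matters less in larger data sets, can be illustrated with a small simulation. In this Python sketch (standard library only), the true value, noise level, and bias are hypothetical numbers: averaging more measurements shrinks the random sampling error, but no amount of additional data removes a systematic calibration error.

```python
import random
from statistics import fmean

TRUE_VALUE = 100.0      # hypothetical true quantity being measured
NOISE_SD = 5.0          # random measurement-to-measurement noise
CALIBRATION_BIAS = 2.0  # hypothetical systematic offset from a miscalibrated instrument

def estimate_error(n, bias, seed=42):
    """Mean of n simulated measurements minus the true value.

    Each measurement is truth + systematic bias + Gaussian noise. Reusing the
    same seed gives both calls identical noise, so only the bias differs.
    """
    rng = random.Random(seed)
    readings = [TRUE_VALUE + bias + rng.gauss(0.0, NOISE_SD) for _ in range(n)]
    return fmean(readings) - TRUE_VALUE

for n in (10, 1_000, 100_000):
    unbiased = estimate_error(n, 0.0)
    biased = estimate_error(n, CALIBRATION_BIAS)
    print(f"n={n:>7}: sampling-only error {unbiased:+.3f}, with calibration bias {biased:+.3f}")
```

As n grows, the sampling-only error heads toward zero while the biased estimate settles near the bias itself, which is why a bigger data set cannot fix a miscalibrated measurement.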

Three Questions for Data Analytics

Before you get started on any data analytics effort, you need to have at least preliminary answers to three questions:

  1. What problem are we trying to solve?

  2. What data can we apply to that problem?

  3. What analytical techniques can we apply to that data?
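A minimal Python sketch of how the three questions could act as a gate in a project intake checklist follows. The class name, field names, and example answers are hypothetical; the point is only that work does not start until each question has at least a preliminary answer.

```python
from dataclasses import dataclass

@dataclass
class AnalyticsPlan:
    """Preliminary answers to the three questions (names are hypothetical)."""
    problem: str = ""     # 1. What problem are we trying to solve?
    data: str = ""        # 2. What data can we apply to that problem?
    techniques: str = ""  # 3. What analytical techniques can we apply to that data?

    def unanswered(self):
        """Return the names of questions that still lack even a preliminary answer."""
        return [name for name in ("problem", "data", "techniques")
                if not getattr(self, name).strip()]

    def ready_to_start(self):
        """True only when all three questions have a non-blank answer."""
        return not self.unanswered()

plan = AnalyticsPlan(problem="Reduce customer churn",
                     data="Twelve months of subscription and support history")
print(plan.unanswered())      # ['techniques']
print(plan.ready_to_start())  # False
```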

This episode was sponsored by Vertica, the unified analytics platform based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning.

Learn More at https://www.vertica.com/

Machine Learning on Opening Day

In time for opening day of the 2022 Major League Baseball (MLB) season, I discuss the initial results of my Baseball Data Analysis Challenge that used input data representing 6 years (2016-2021) of Boston Red Sox regular season game results.

My initial results can be found in this Microsoft Excel file: Baseball Data Analysis Challenge 2022-04-05.xlsx.

My baseball data analysis was performed using the in-database machine learning capabilities of my employer, Vertica, and you can find my input data and SQL scripts on GitHub.

Home Schooling your Machine Learning Model

Why don’t more machine learning models graduate to production? Paige Roberts stops by to help explore this topic and drop some knowledge about how to get more machine learning models deployed in production.

This episode was sponsored by Vertica, the unified analytics platform based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning.

Learn More at https://www.vertica.com/

Data Science, Then and Now

Back in 2012, Harvard Business Review declared data scientist The Sexiest Job of the 21st Century. Less than a year later, I recorded a podcast discussion with a practicing data scientist, Ph.D. statistician Dr. Melinda Thielbar. She discussed what a data scientist actually does and provided straightforward explanations of key concepts such as signal-to-noise ratio, uncertainty, predictability, experimentation, and correlation, as well as how statistical results should be presented and explained to various audiences.

This episode is an edited and slightly shortened version of that discussion, which, even though it is from nine years ago, I think still provides good insight into data science, then and now.

Defining Data Analytics, Machine Learning, and Data Science

Data analytics, machine learning, and data science: those are the three things this podcast focuses its discussions on. This episode provides my definitions in descending order of complexity, in terms of the depth of knowledge, competencies, and practical, demonstrable skills required in computer science and programming, mathematics and statistics, and critical thinking and the overall approach to solving problems with data.

My definitions also reflect a descending order of analytical advancement, because I see data science as advanced machine learning, and machine learning as advanced data analytics.

Here’s a curated list of recommended resources offering other perspectives:

  • What on earth is data science? — Blog post by Cassie Kozyrkov, which includes links to related posts on Statistics, Machine Learning, Data-mining / Analytics.

  • Making Friends with Machine Learning — YouTube playlist by Cassie Kozyrkov of a formerly internal-only Google course (now available to everyone), created to inspire beginners and amuse experts.

This episode was sponsored by Vertica, the unified analytics platform based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning.

Learn More at https://www.vertica.com/ 

Hello, World!

Hello, World! Welcome to Episode Zero! Okay, technically it’s the first episode, but I’m a geek who thinks all indexes should start at 0, not 1. Anyway, this is more of a meta-episode introducing the host, explaining what the podcast is about, and letting you know what to expect from future episodes.

The focus of this podcast is to discuss topics related to data analytics, machine learning, and data science. The goal is to provide a mix of information, education, thought leadership, and hopefully a little entertainment—so info-educa-thought-tainment. That’s a word. I just made it up. Which is okay since all words are made up.

This episode was sponsored by Vertica, the unified analytics platform based on a massively scalable architecture with the broadest set of analytical functions spanning event and time series, pattern matching, geospatial, and end-to-end in-database machine learning.

Learn More at https://www.vertica.com/