Sentiment and Sensibility

“Engagement — like influence — is at heart an abstract,” Julie Hunt recently blogged.  “The overall picture comes from quantitative analytics results that are tempered by the understanding that customer engagement has a distinct and elusive qualitative side.  The qualitative side of digital engagement takes us into the messier world of sentiment analysis.”

As usual, I agree with Hunt, especially about the messy world of sentiment analysis, which analyzes large amounts of largely unstructured data in an attempt to understand how customers think and feel about products and services.  Quantifying is easy, especially for computers.  Qualifying is elusive, even for humans.  So how much accuracy can we reasonably expect from quantitative approaches like sentiment analysis?  Can such an approach actually reveal high-quality insights?

In his insightful book The Secret Life of Pronouns: What Our Words Say About Us, social psychologist and language expert James Pennebaker shared results from his groundbreaking research in computational linguistics—in essence, counting the frequency of words we use—to show that our language carries secrets about, among other things, our thoughts and feelings.

“Sociolinguists,” Pennebaker explained, “focus on broad social dimensions such as gender, race, social class, and power.  Their approach is qualitative, involving recording and analyzing conversations on a case-by-case basis.  It is slow, painstaking work.  Over the course of a year, a good sociolinguist may analyze only a few interactions.  Whereas the qualitative approach is powerful at getting an in-depth understanding of a small group of interactions, the methods are not designed to get an accurate picture of an entire society or culture.  This is where computer-based text analysis methods can help.  By analyzing the blogs of hundreds of thousands of people, for example, the computer-based methods can quickly determine the nature of gender differences as a function of age, class, native language, region, and other domains.”

“In other words,” Pennebaker concluded, “a relatively slow but careful qualitative approach can give us an in-depth view of a small group of people; a computer-based quantitative approach provides a broader social and cultural perspective.”
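
To make the counting idea concrete, here is a minimal Python sketch of word-frequency analysis.  It is not Pennebaker's actual method or software: the short pronoun list, the function name pronoun_rates, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, abbreviated pronoun list chosen only for illustration.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
            "he", "she", "they", "them"}

def pronoun_rates(text):
    """Return each pronoun's share of all the words in the text."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    # Count only the tokens that appear in the pronoun list.
    counts = Counter(w for w in words if w in PRONOUNS)
    return {w: n / len(words) for w, n in counts.items()}

# Made-up sample sentence, not taken from any real data set.
sample = "I think we should go, but I am not sure they want us there."
rates = pronoun_rates(sample)

for word, rate in sorted(rates.items()):
    print(f"{word}: {rate:.1%}")
```

Run over hundreds of thousands of blogs instead of one sentence, rates like these are the kind of raw material a quantitative approach can compare across age, region, and other groupings.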

One application of sentiment analysis is predicting box office sales before the opening weekend of a new movie, a topic Graeme Noseworthy recently blogged about.  Although he reported high prediction accuracy compared to industry benchmarks, he also “learned that intent to watch extracted from social buzz does not necessarily equate to positive sentiment.”

“For example, a big budget action movie that has a high degree of normalized net sentiment was relatively easy to accurately predict.  But a live action kid’s movie is much more difficult.  Why?  Simple: kids don’t tweet.  As such, a movie like that might have a large amount of negative sentiment but still perform very well at the box office.  How so?  Consider all the parents that tweet out something along the lines of: OMG, I have to go see ______ with my kids AGAIN? This is torture! #terriblemovies  They might dread the experience but chances are the kids loved the movie and will want to see it again and again.”
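
The parent-tweet problem is easy to reproduce with a toy scorer.  The Python sketch below is not Noseworthy's model.  It assumes that net sentiment means positive word hits minus negative word hits, divided by total hits, and it uses tiny hypothetical word lists and a paraphrased tweet.

```python
import re

# Hypothetical toy lexicons; real systems use far larger, curated ones.
POSITIVE = {"love", "loved", "great", "awesome", "fun"}
NEGATIVE = {"torture", "terrible", "dread", "awful", "boring"}

def net_sentiment(posts):
    """Assumed definition: (positive - negative) / (positive + negative).

    Returns a value from -1.0 to 1.0, or 0.0 when no lexicon word appears.
    """
    pos = neg = 0
    for post in posts:
        for word in re.findall(r"[a-z]+", post.lower()):
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    hits = pos + neg
    return (pos - neg) / hits if hits else 0.0

# Paraphrase of the parent tweet quoted above, with the title filled in generically.
parent_tweet = ("OMG, I have to go see this movie with my kids AGAIN? "
                "This is torture! #terriblemovies")
score = net_sentiment([parent_tweet])
print(score)
```

The tweet scores as purely negative, yet it describes a family buying tickets again, which is exactly the gap between sentiment and intent to watch.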

With its themes of gender and social class, Jane Austen’s 19th-century novel Sense and Sensibility was great fodder for 20th-century sociolinguists.  If 21st-century social media had been around in 1995, would sentiment analysis have been able to predict opening weekend box office sales for Emma Thompson’s movie adaptation of the novel?  Perhaps, just as Austen’s novel left it to the reader to decide whether sense and sensibility merged, sentiment analysis leaves it to the user to decide whether sentiment and sensibility merge the quantitative and the qualitative well enough to make sense of the messy world of the words customers choose.  After all, customer word of mouth has become the data driving the way businesses of all sizes market their products and services today.