Jim Harris

My name is Jim Harris. I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.

Tuesday, December 4, 2012

The Wisdom of Crowds, Friends, and Experts

I recently finished reading the TED Book by Jim Hornthal, A Haystack Full of Needles, which included an overview of the different predictive approaches taken by one of the most common forms of data-driven decision making in the era of big data, namely, the recommendation engines increasingly provided by websites, social networks, and mobile apps.

These recommendation engines primarily employ one of three techniques, choosing to base their data-driven recommendations on the “wisdom” provided by either crowds, friends, or experts.

 

The Wisdom of Crowds

In his book The Wisdom of Crowds, James Surowiecki explained that the four conditions characterizing wise crowds are diversity of opinion, independent thinking, decentralization, and aggregation.  Amazon is a great example of a recommendation engine using this approach by assuming that a sufficiently large population of buyers is a good proxy for your purchasing decisions.

For example, Amazon tells you that people who bought James Surowiecki’s bestselling book also bought Thinking, Fast and Slow by Daniel Kahneman, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business by Jeff Howe, and Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott.  However, Amazon neither provides nor possesses knowledge of why people bought all four of these books or qualification of the subject matter expertise of these readers.

These concerns, which we could think of as potential data quality issues, would be exacerbated within a small amount of transaction data, where the eclectic tastes and idiosyncrasies of individual readers would not help us decide what books to buy.  Within a large amount of transaction data, however, we achieve the Wisdom of Crowds effect: taken in aggregate, the data gives us a general sense of what books we might like to read based on what a diverse group of readers collectively makes popular.
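To make the aggregation idea concrete, here is a minimal sketch of a "people who bought this also bought" count. It is not Amazon's actual method, which is not public in detail; the function name also_bought and the sample baskets are hypothetical and exist only for illustration. Notice that the code knows nothing about why anyone bought anything. It simply counts.

```python
from collections import Counter

# Hypothetical purchase histories, one list of titles per buyer.
BASKETS = [
    ["The Wisdom of Crowds", "Thinking, Fast and Slow", "Wikinomics"],
    ["The Wisdom of Crowds", "Thinking, Fast and Slow"],
    ["The Wisdom of Crowds", "Crowdsourcing", "Thinking, Fast and Slow"],
    ["The Wisdom of Crowds", "Wikinomics"],
    ["Crowdsourcing", "A Cookbook"],
]


def also_bought(baskets, item, top_n=3):
    """Return the titles most often bought by the same people who bought `item`."""
    counts = Counter()
    for basket in baskets:
        titles = set(basket)  # ignore repeat purchases within one basket
        if item in titles:
            counts.update(titles - {item})
    # Sort by count (descending), then by title, so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [title for title, _ in ranked[:top_n]]


print(also_bought(BASKETS, "The Wisdom of Crowds"))
```

With only five baskets, one buyer with eclectic tastes can change the ranking. With millions of baskets, those idiosyncrasies wash out, which is the whole point of relying on the crowd.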

As I blogged about in my post Sometimes it’s Okay to be Shallow, sometimes the aggregated, general sentiment of a large group of unknown, unqualified strangers will be sufficient to effectively make certain decisions.

 

The Wisdom of Friends

Although the influence of our friends and family is the oldest form of data-driven decision making, historically this influence was delivered by word of mouth, which required you to either be there to hear those influential words when they were spoken, or have a large enough network of people you knew that would be able to eventually pass along those words to you.

But the rise of social networking services, such as Twitter and Facebook, has transformed word of mouth into word of data by transcribing our words into short bursts of social data, such as status updates, online reviews, and blog posts.

Facebook “Likes” are a great example of a recommendation engine that uses the Wisdom of Friends, where our decision to buy a book, see a movie, or listen to a song might be based on whether or not our friends like it.  Of course, “friends” is used in a very loose sense in a social network, and not just on Facebook, since it combines strong connections such as actual friends and family, with weak connections such as acquaintances, friends of friends, and total strangers from the periphery of our social network.

Social influence has never ended with the people we know well, as Nicholas Christakis and James Fowler explained in their book Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives.  But the hyper-connected world enabled by the Internet, and further facilitated by mobile devices, has strengthened the social influence of weak connections, and these friends form a smaller crowd whose wisdom is involved in more of our decisions than we may even be aware of.

 

The Wisdom of Experts

Since it’s more common to associate wisdom with expertise, Pandora is a great example of a recommendation engine that uses the Wisdom of Experts.  Pandora used a team of musicologists (professional musicians and scholars with advanced degrees in music theory) to deconstruct more than 800,000 songs into 450 musical elements that make up each performance, including qualities of melody, harmony, rhythm, form, composition, and lyrics, as part of what Pandora calls the Music Genome Project.

As Pandora explains, their methodology uses precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control to ensure that data integrity remains reliably high, believing that delivering a great radio experience to each and every listener requires an incredibly broad and deep understanding of music.

Essentially, experts form the smallest crowd of wisdom.  Of course, experts are not always right.  At the very least, experts are not right about every one of their predictions.  Nor do experts always agree with each other, which is why I imagine that one of the most challenging aspects of the Music Genome Project is getting music experts to consistently apply precisely the same methodology.

Pandora also acknowledges that each individual has a unique relationship with music (i.e., no one else has tastes exactly like yours), and allows you to “Thumbs Up” or “Thumbs Down” songs without affecting other users, producing more personalized results than either the popularity predicted by the Wisdom of Crowds or the similarity predicted by the Wisdom of Friends.
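As a rough illustration of how expert-scored attributes and personal feedback could work together, here is a minimal sketch. It is not Pandora's algorithm, which I have no inside knowledge of; the song titles, the three attribute scores per song (standing in for the 450 musical elements), and the recommend function are all hypothetical. The sketch builds a taste profile for one listener from their thumbs, then ranks unrated songs by cosine similarity to that profile.

```python
import math

# Hypothetical expert scores (0.0 to 1.0) on three made-up musical attributes.
SONGS = {
    "Song A": [0.9, 0.1, 0.8],
    "Song B": [0.8, 0.2, 0.7],
    "Song C": [0.1, 0.9, 0.2],
    "Song D": [0.2, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(songs, thumbs_up, thumbs_down):
    """Rank unrated songs for one listener; no other listener's data is touched."""
    dims = len(next(iter(songs.values())))
    profile = [0.0] * dims
    for title in thumbs_up:  # liked songs pull the profile toward their attributes
        profile = [p + s for p, s in zip(profile, songs[title])]
    for title in thumbs_down:  # disliked songs push the profile away
        profile = [p - s for p, s in zip(profile, songs[title])]
    rated = set(thumbs_up) | set(thumbs_down)
    candidates = [title for title in songs if title not in rated]
    return sorted(candidates, key=lambda t: cosine(profile, songs[t]), reverse=True)


print(recommend(SONGS, thumbs_up=["Song A"], thumbs_down=["Song C"]))
```

The expert scores are shared by everyone, but the profile belongs to one listener, which is why two people with opposite thumbs get opposite rankings from the same underlying data.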

 

The Future of Wisdom

It’s interesting to note that the Wisdom of Experts is the only one of these approaches that relies on what data management and business intelligence professionals would consider a rigorous approach to data quality and decision quality best practices.  But this is also why the Wisdom of Experts is the most time-consuming and expensive approach to data-driven decision making.

In the past, the Wisdom of Crowds and Friends was ignored in data-driven decision making for the simple reason that this potential wisdom wasn’t digitized.  But now, in the era of big data, not only are crowds and friends digitized, but technological advancements combined with cost-effective options via open source (data and software) and cloud computing make these approaches quicker and cheaper than the Wisdom of Experts.  And despite the potential data quality and decision quality issues, the Wisdom of Crowds and/or Friends is proving itself a viable option for more categories of data-driven decision making.

I predict that the future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three potential sources of wisdom often acknowledged as contributors to data-driven decision making.

 

Related Posts

Sometimes it’s Okay to be Shallow

Word of Mouth has become Word of Data

The Wisdom of the Social Media Crowd

Data Management: The Next Generation

Exercise Better Data Management

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

The Big Data Theory

Finding a Needle in a Needle Stack

Big Data, Predictive Analytics, and the Ideal Chronicler

The Limitations of Historical Analysis

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

HoardaBytes and the Big Data Lebowski

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

A Tale of Two Datas


Reader Comments (2)

From the LinkedIn Group for the IAIDQ Professional Open Community, Richard Jarvis commented:

“Would you agree that what we're really doing is applying data quality (DQ) concepts to metadata, rather than the operational data? It's an interesting approach, as it means for crowd sourced data we're assessing quality based on the first-order value rather than the immediate downstream usability. As you say, we're not questioning the accuracy of Amazon's assertion that customers who purchased x also purchased y; rather, we're interested in the relevance of that information. In terms of knowledge management, I would describe this as broadening DQ to embrace information and knowledge quality (which I fully agree with).

The implication, however, is that we're also introducing new factors which could negatively impact knowledge quality, despite the operational data quality being valid. The SEO industry arguably falls into this space i.e. you believe you're working from impartially selected information based on popularity, but the results have been influenced by the 1st order party. To a certain extent though this has always been the case; well before the internet, many auctioneers were caught taking false bids in an effort to increase apparent demand. Like in so many ways, the internet is simply exacerbating an existing situation.”

And I responded:

In the case of Amazon, it is a mixture of metadata and information. Amazon is not providing direct access to their operational data, which is, of course, understandable, but instead providing some aggregated information (e.g., sales rank), some detailed information (e.g., reviews), and numerous metadata attributes (e.g., product category).

We have no way of knowing if the underlying operational data is accurate (as well as other aspects of data quality), nor do we have any way of verifying any aspect of the information quality. Some of the metadata could be verified by cross-referencing other sources (e.g., for books, verify book metadata with Barnes & Noble and the publisher).

Making use of Amazon's information has to be done on the assumption of quality — something that data and information quality professionals would never endorse in other contexts (e.g., within Amazon's internal systems).

As you said, this situation has always existed, but the Internet is exacerbating it. In fact, many of the sources involved in big data analytics face this same challenge. My blog post focused on recommendation engines, but the same challenges exist with other big data applications, such as sentiment analysis.

Furthermore, not to go too far off on a tangent, but I would argue that many, for lack of a better term, traditional data and information management applications, which I would equate with the Wisdom of Experts in my blog post, have functioned off of the same assumption of quality even though data and information quality best practices are implemented.

None of this is meant to imply that quality is not important.

However, I believe the assumption that the Wisdom of Experts is always superior, in principle, to the Wisdom of Crowds/Friends is itself simply another version of the assumption of quality (i.e., it's an assumption that quality decisions can only be made based on quality data, and if that were true, then every business would be bankrupt).

December 7, 2012 | Registered Commenter Jim Harris

Via Information Management, Mike Wheeler commented:

“Excellent piece Jim. I think that, as you describe, the wisdom of crowds, friends and experts will converge (eventually). I also think that in order for this to happen and be most effective, there must be a recognition that correlations need to be combined with causality. Right now, I don't feel that machine learning has progressed to the point where causality can be determined in a consistent way which means that we'll always have to rely on those pesky humans to make sense of it all.”

And I responded:

Thanks for your comment, Mike.

The intersection of crowds, friends, and experts will certainly generate a lot of correlations, many of which will provide little to no predictive value. Causality often eludes both computers and humans. Data-driven decision making will require combining all potential signals while needing all the help it can get to filter out the noise. Sometimes that help will come more from the computers, and other times that help will come more from the humans, but as you noted, both machine and man have to be involved in the learning process.

Best Regards,

Jim

January 1, 2013 | Registered Commenter Jim Harris
