The Wisdom of Crowds, Friends, and Experts
Jim Harris in Books, Data Quality, Social Media; tagged Best of 2012, Big Data, Business Intelligence, Philosophy, Predictive Analytics
Tuesday, December 4, 2012 at 5:00PM
I recently finished reading the TED Book by Jim Hornthal, A Haystack Full of Needles, which included an overview of the different predictive approaches taken by one of the most common forms of data-driven decision making in the era of big data, namely, the recommendation engines increasingly provided by websites, social networks, and mobile apps.
These recommendation engines primarily employ one of three techniques, choosing to base their data-driven recommendations on the “wisdom” provided by either crowds, friends, or experts.
The Wisdom of Crowds
In his book The Wisdom of Crowds, James Surowiecki explained that the four conditions characterizing wise crowds are diversity of opinion, independent thinking, decentralization, and aggregation. Amazon is a great example of a recommendation engine using this approach by assuming that a sufficiently large population of buyers is a good proxy for your purchasing decisions.
For example, Amazon tells you that people who bought James Surowiecki’s bestselling book also bought Thinking, Fast and Slow by Daniel Kahneman, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business by Jeff Howe, and Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott. However, Amazon neither provides nor possesses any knowledge of why people bought all four of these books, nor any qualification of these readers’ subject matter expertise.
These concerns, which we could think of as potential data quality issues, would be exacerbated within a small amount of transaction data, where the eclectic tastes and idiosyncrasies of individual readers would not help us decide what books to buy. However, within a large amount of transaction data, we achieve the Wisdom of Crowds effect: taken in aggregate, the data gives us a general sense of what books we might like to read based on what a diverse group of readers collectively makes popular.
As I blogged about in my post Sometimes it’s Okay to be Shallow, sometimes the aggregated, general sentiment of a large group of unknown, unqualified strangers will be sufficient to effectively make certain decisions.
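To make the "people who bought this also bought" idea concrete, here is a minimal sketch in Python. It simply counts how often two titles appear in the same purchase history. The purchase data and the function names (build_co_purchase_counts, also_bought) are made up for illustration; this is not a description of how Amazon's engine actually works.

```python
from collections import Counter
from itertools import combinations


def build_co_purchase_counts(baskets):
    """Count how often each pair of items appears in the same basket."""
    counts = {}
    for basket in baskets:
        # set() ignores duplicate purchases within one basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def also_bought(counts, item, top_n=3):
    """Return the items most often bought alongside `item`, most frequent first."""
    neighbors = counts.get(item, Counter())
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [title for title, _ in ranked[:top_n]]


# Hypothetical purchase histories (made-up data, one list per buyer).
baskets = [
    ["The Wisdom of Crowds", "Thinking, Fast and Slow"],
    ["The Wisdom of Crowds", "Thinking, Fast and Slow", "Wikinomics"],
    ["The Wisdom of Crowds", "Crowdsourcing"],
    ["Thinking, Fast and Slow", "A Cookbook"],
]

counts = build_co_purchase_counts(baskets)
recommendations = also_bought(counts, "The Wisdom of Crowds")
print(recommendations)
```

Notice that nothing in the counts records why anyone bought a book or how qualified they are. With only four buyers, a single eclectic purchase can distort the ranking, whereas a large volume of transactions would wash out such idiosyncrasies.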
The Wisdom of Friends
Although the influence of our friends and family is the oldest form of data-driven decision making, historically this influence was delivered by word of mouth, which required you to either be there to hear those influential words when they were spoken, or have a large enough network of people you knew that would be able to eventually pass along those words to you.
But the rise of social networking services, such as Twitter and Facebook, has transformed word of mouth into word of data by transcribing our words into short bursts of social data, such as status updates, online reviews, and blog posts.
Facebook “Likes” are a great example of a recommendation engine that uses the Wisdom of Friends, where our decision to buy a book, see a movie, or listen to a song might be based on whether or not our friends like it. Of course, “friends” is used in a very loose sense in a social network, and not just on Facebook, since it combines strong connections such as actual friends and family, with weak connections such as acquaintances, friends of friends, and total strangers from the periphery of our social network.
Social influence has never ended with the people we know well, as Nicholas Christakis and James Fowler explained in their book Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives. But the hyper-connected world enabled by the Internet, and further facilitated by mobile devices, has strengthened the social influence of weak connections, and these friends form a smaller crowd whose wisdom is involved in more of our decisions than we may even be aware of.
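One way to picture how strong and weak connections could both feed a recommendation is to weight each “Like” by the strength of the connection it came from. The sketch below is only an illustration under that assumption. The tie-strength numbers, the names, and the friend_score function are hypothetical, and no social network is known to score Likes this way.

```python
def friend_score(likes, tie_strength, item):
    """Sum the tie strengths of every connection who liked `item`.

    Connections missing from `tie_strength` (total strangers) contribute 0.
    """
    return sum(
        tie_strength.get(person, 0.0)
        for person, liked_items in likes.items()
        if item in liked_items
    )


# Hypothetical weights: 1.0 for a strong tie, smaller values for weaker ties.
tie_strength = {"sister": 1.0, "coworker": 0.5, "friend_of_friend": 0.25}

# Hypothetical Likes, keyed by the person who liked the items.
likes = {
    "sister": {"Book A"},
    "coworker": {"Book B"},
    "friend_of_friend": {"Book A", "Book B"},
    "stranger": {"Book B"},
}

scores = {item: friend_score(likes, tie_strength, item) for item in ("Book A", "Book B")}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy data, Book B has more Likes, but Book A scores higher because one of its Likes comes from a strong connection. Raising the weights given to weak ties would shift the result toward Book B.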
The Wisdom of Experts
Since it’s more common to associate wisdom with expertise, Pandora is a great example of a recommendation engine that uses the Wisdom of Experts. Pandora used a team of musicologists (professional musicians and scholars with advanced degrees in music theory) to deconstruct more than 800,000 songs into 450 musical elements that make up each performance, including qualities of melody, harmony, rhythm, form, composition, and lyrics, as part of what Pandora calls the Music Genome Project.
As Pandora explains, their methodology uses precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control to ensure that data integrity remains reliably high, believing that delivering a great radio experience to each and every listener requires an incredibly broad and deep understanding of music.
Essentially, experts form the smallest crowd of wisdom. Of course, experts are not always right. At the very least, experts are not right about every one of their predictions. Nor do experts always agree with each other, which is why I imagine that one of the most challenging aspects of the Music Genome Project is getting music experts to consistently apply precisely the same methodology.
Pandora also acknowledges that each individual has a unique relationship with music (i.e., no one else has tastes exactly like yours), and allows you to “Thumbs Up” or “Thumbs Down” songs without affecting other users, producing more personalized results than either the popularity predicted by the Wisdom of Crowds or the similarity predicted by the Wisdom of Friends.
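As a rough illustration of the expert-driven approach, the sketch below represents each song as a short list of expert-assigned scores and ranks other songs by cosine similarity to a seed song, with a per-listener “Thumbs Down” set that filters results for that listener only. Everything here is an assumption for illustration: the three attributes, the scores, and the function names are invented, and the source does not describe how Pandora actually computes its recommendations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(seed, catalog, thumbs_down=()):
    """Rank songs by similarity to `seed`, skipping this listener's thumbs-down songs."""
    candidates = [
        (cosine(catalog[seed], vector), title)
        for title, vector in catalog.items()
        if title != seed and title not in thumbs_down
    ]
    ranked = sorted(candidates, key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in ranked]


# Hypothetical expert scores per song, e.g. [melody, rhythm, harmony] on a 0-1 scale.
catalog = {
    "Song A": [0.9, 0.1, 0.8],
    "Song B": [0.8, 0.2, 0.7],
    "Song C": [0.1, 0.9, 0.2],
}

# Each listener keeps a separate thumbs-down set, so one does not affect the other.
listener_one = recommend("Song A", catalog)
listener_two = recommend("Song A", catalog, thumbs_down={"Song B"})
print(listener_one, listener_two)
```

The catalog of expert scores is shared, while the feedback is stored per listener, which is how the second listener's thumbs-down leaves the first listener's results unchanged.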
The Future of Wisdom
It’s interesting to note that the Wisdom of Experts is the only one of these approaches that relies on what data management and business intelligence professionals would consider a rigorous approach to data quality and decision quality best practices. But this is also why the Wisdom of Experts is the most time-consuming and expensive approach to data-driven decision making.
In the past, the Wisdom of Crowds and Friends was ignored in data-driven decision making for the simple reason that this potential wisdom wasn’t digitized. But now, in the era of big data, not only are crowds and friends digitized, but technological advancements combined with cost-effective options via open source (data and software) and cloud computing make these approaches quicker and cheaper than the Wisdom of Experts. And despite the potential data quality and decision quality issues, the Wisdom of Crowds and/or Friends is proving itself a viable option for more categories of data-driven decision making.
I predict that the future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three potential sources of wisdom often acknowledged as contributors to data-driven decision making.
Related Posts
Sometimes it’s Okay to be Shallow
Word of Mouth has become Word of Data
The Wisdom of the Social Media Crowd
Data Management: The Next Generation
Exercise Better Data Management
Darth Vader, Big Data, and Predictive Analytics
Finding a Needle in a Needle Stack
Big Data, Predictive Analytics, and the Ideal Chronicler
The Limitations of Historical Analysis
Magic Elephants, Data Psychics, and Invisible Gorillas
OCDQ Radio - Data Quality and Big Data
Big Data: Structure and Quality
HoardaBytes and the Big Data Lebowski



Reader Comments (2)
From the LinkedIn Group for the IAIDQ Professional Open Community, Richard Jarvis commented:
“Would you agree that what we're really doing is applying data quality (DQ) concepts to metadata, rather than the operational data? It's an interesting approach, as it means for crowd sourced data we're assessing quality based on the first-order value rather than the immediate downstream usability. As you say, we're not questioning the accuracy of Amazon's assertion that customers who purchased x also purchased y; rather, we're interested in the relevance of that information. In terms of knowledge management, I would describe this as broadening DQ to embrace information and knowledge quality (which I fully agree with).
The implication, however, is that we're also introducing new factors which could negatively impact knowledge quality, despite the operational data quality being valid. The SEO industry arguably falls into this space i.e. you believe you're working from impartially selected information based on popularity, but the results have been influenced by the 1st order party. To a certain extent though this has always been the case; well before the internet, many auctioneers were caught taking false bids in an effort to increase apparent demand. Like in so many ways, the internet is simply exacerbating an existing situation.”
And I responded:
In the case of Amazon, it is a mixture of metadata and information. Amazon is not providing direct access to their operational data, which is, of course, understandable, but instead providing some aggregated information (e.g., sales rank), some detailed information (e.g., reviews), and numerous metadata attributes (e.g., product category).
We have no way of knowing if the underlying operational data is accurate (as well as other aspects of data quality), nor do we have any way of verifying any aspect of the information quality. Some of the metadata could be verified by cross-referencing other sources (e.g., for books, verify book metadata with Barnes & Noble and the publisher).
Making use of Amazon's information has to be done on the assumption of quality — something that data and information quality professionals would never endorse in other contexts (e.g., within Amazon's internal systems).
As you said, this situation has always existed, but the Internet is exacerbating it. In fact, many of the sources involved in big data analytics face this same challenge. My blog post focused on recommendation engines, but the same challenges exist with other big data applications, such as sentiment analysis.
Furthermore, not to go too far off on a tangent, but I would argue that many, for lack of a better term, traditional data and information management applications, which I would equate with the Wisdom of Experts in my blog post, have functioned off of the same assumption of quality even though data and information quality best practices are implemented.
None of this is meant to imply that quality is not important.
However, I believe the assumption that the Wisdom of Experts is always superior, in principle, to the Wisdom of Crowds/Friends is itself simply another version of the assumption of quality (i.e., it's an assumption that quality decisions can only be made based on quality data; if that were true, every business would be bankrupt).
Via Information Management, Mike Wheeler commented:
“Excellent piece Jim. I think that, as you describe, the wisdom of crowds, friends and experts will converge (eventually). I also think that in order for this to happen and be most effective, there must be a recognition that correlations need to be combined with causality. Right now, I don't feel that machine learning has progressed to the point where causality can be determined in a consistent way which means that we'll always have to rely on those pesky humans to make sense of it all.”
And I responded:
Thanks for your comment, Mike.
The intersection of crowds, friends, and experts will certainly generate a lot of correlations, many of which will provide little to no predictive value. Causality often eludes both computers and humans. Data-driven decision making will require combining all potential signals while needing all the help it can get to filter out the noise. Sometimes that help will come more from the computers, and other times that help will come more from the humans, but as you noted, both machine and man have to be involved in the learning process.
Best Regards,
Jim