Saving Private Data

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This episode is an edited rebroadcast of a segment from the OCDQ Radio 2011 Year in Review, during which Daragh O Brien and I discuss the data privacy and data protection implications of social media, cloud computing, and big data.

Daragh O Brien is one of Ireland’s leading Information Quality and Governance practitioners.  After being born at a young age, Daragh has amassed a wealth of experience in quality information driven business change, from CRM Single View of Customer to Regulatory Compliance, to Governance and the taming of information assets to benefit the bottom line, manage risk, and ensure customer satisfaction.  Daragh O Brien is the Managing Director of Castlebridge Associates, one of Ireland’s leading consulting and training companies in the information quality and information governance space.

Daragh O Brien is a founding member and former Director of Publicity for the IAIDQ, which he is still actively involved with.  He was a member of the team that helped develop the Information Quality Certified Professional (IQCP) certification and he recently became the first person in Ireland to achieve this prestigious certification.

In 2008, Daragh O Brien was awarded a Fellowship of the Irish Computer Society for his work in developing and promoting standards of professionalism in Information Management and Governance.

Daragh O Brien is a regular conference presenter, trainer, blogger, and author with two industry reports published by Ark Group, the most recent of which is The Data Strategy and Governance Toolkit.

You can also follow Daragh O Brien on Twitter and connect with Daragh O Brien on LinkedIn.



Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • Social Media Strategy — Guest Crysta Anderson of IBM Initiate explains social media strategy and content marketing, including three recommended practices: (1) Listen intently, (2) Communicate succinctly, and (3) Have fun.
  • The Fall Back Recap Show — A look back at the Best of OCDQ Radio, including discussions about Data, Information, Business-IT Collaboration, Change Management, Big Analytics, Data Governance, and the Data Revolution.

The Graystone Effects of Big Data

As a big data geek and a big fan of science fiction, I was intrigued by Zoe Graystone, the central character of the science fiction television show Caprica, which was a spin-off prequel of the re-imagined Battlestar Galactica television show.

Zoe Graystone was a teenage computer programming genius who created a virtual reality avatar of herself based on all of the available data about her own life, leveraging roughly 100 terabytes of personal data from numerous databases.  This allowed her avatar to access data from her medical files, DNA profiles, genetic typing, CAT scans, synaptic records, psychological evaluations, school records, emails, text messages, phone calls, audio and video recordings, security camera footage, talent shows, sports, restaurant bills, shopping receipts, online search history, music lists, movie tickets, and television shows.  The avatar transformed that big data into personality and memory, and believably mimicked the real Zoe Graystone within a virtual reality environment.

The best science fiction reveals just how thin the line is that separates imagination from reality.  Over thirty years ago, around the time of the original Battlestar Galactica television show, virtual reality avatars based on massive amounts of personal data would likely have been dismissed as pure fantasy.  But nowadays, during the era of big data and data science, the idea of Zoe Graystone creating a virtual reality avatar of herself doesn’t sound so far-fetched, nor is it pure data science fiction.

“On Facebook,” Ellis Hamburger recently blogged, “you’re the sum of all your interactions and photos with others.  Foursquare began its life as a way to see what your friends are up to, but it has quickly evolved into a life-logging tool / artificial intelligence that knows you like an old friend does.”

Facebook and Foursquare are just two social media examples of our increasingly data-constructed world, which is creating a virtual reality environment where our data has become our avatar and our digital mouths are speaking volumes about us.

Big data and real data science are enabling people and businesses of all sizes to put this virtual reality environment to good use, such as customers empowering themselves with data and companies using predictive analytics to discover business insights.

I refer to the positive aspects of Big Data as the Zoe Graystone Effect.

But there are also negative aspects to the virtual reality created by our big data avatars.  For example, in his recent blog post Rethinking Privacy in an Era of Big Data, Quentin Hardy explained “by triangulating different sets of data (you are suddenly asking lots of people on LinkedIn for endorsements on you as a worker, and on Foursquare you seem to be checking in at midday near a competitor’s location), people can now conclude things about you (you’re probably interviewing for a job there).”

On the Caprica television show, Zoe’s father, Daniel Graystone, used her avatar as the basis for an operating system for a race of sentient machines known as Cylons, which ultimately led to the Cylon Wars and the destruction of most of humanity.  A far less dramatic example from the real world, which I explained in my blog post The Data Cold War, is how companies like Google use the virtual reality created by our big data avatars against us by selling our personal data (albeit indirectly) to advertisers.

I refer to the negative aspects of Big Data as the Daniel Graystone Effect.

How have your personal life and your business activities been affected by the Graystone Effects of Big Data?


This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.


Data Driven

This is Part 1 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss concepts from one of my favorite data quality books, which is his most recent book: Data Driven: Profiting from Your Most Important Business Asset.

Our discussion includes viewing data as an asset, an organization’s hierarchy of data needs, a simple model for culture change, and attempting to achieve the “single version of the truth” being marketed as a goal of master data management (MDM).

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information in the late 1980s.  Since then he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995. Tom holds a Ph.D. in statistics from Florida State University.  He holds two patents.



Win a copy of the Book

Tom Redman wants to give one OCDQ Radio listener a free copy of Data Driven: Profiting from Your Most Important Business Asset.

Here is how the book contest will work:

(1) Book Contest Question — Name at least one of the five aspects of the hierarchy of data and information needs that was described by Tom Redman during this OCDQ Radio episode.


(2) Book Contest Deadline — By or before March 31, 2012, Email Jim Harris with your answer to the book contest question.


(3) Book Contest Winner — In April 2012, one winner will be randomly selected from the emails containing the correct answer to the contest question, and Tom Redman (or his publisher) will email the winner requesting a shipping address for the book.



Related Posts

A Farscape Analogy for Data Quality

The Data Quality Wager

DQ-View: Data Is as Data Does

Common Change

DQ-View: Talking about Data

Hailing Frequencies Open

Beyond a “Single Version of the Truth”

DQ-Tip: “Don't pass bad data on to the next person...”

Hyperactive Data Quality (Second Edition)

Data Quality: Quo Vadimus?

Data Quality and Miracle Exceptions


Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Magic Elephants, Data Psychics, and Invisible Gorillas

This blog post is sponsored by the Enterprise CIO Forum and HP.

A recent Forbes article predicts Big Data will be a $50 billion market by 2017, and Michael Friedenberg recently blogged how the rise of big data is generating buzz about Hadoop (which I call the Magic Elephant): “It certainly looks like the Holy Grail for organizing unstructured data, so it’s no wonder everyone is jumping on this bandwagon.  So get ready for Hadoopalooza 2012.”

John Burke recently blogged about the role of big data helping CIOs “figure out how to handle the new, the unusual, and the unexpected as an opportunity to focus more clearly on how to bring new levels of order to their traditional structured data.”

As I have previously blogged, many big data proponents (especially the Big Data Lebowski vendors selling Hadoop solutions) extol its virtues as if big data provides clairvoyant business insight, as if big data were the Data Psychic of the Information Age.

But a recent New York Times article opened with the story of a statistician working for a large retail chain being asked by his marketing colleagues: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?” As Eric Siegel of Predictive Analytics World is quoted in the article, “we’re living through a golden age of behavioral research.  It’s amazing how much we can figure out about how people think now.”

So, perhaps calling big data psychic is not so far-fetched after all.  However, the potential of predictive analytics exemplifies why data privacy is one of the biggest concerns raised by big data.

Although it’s amazing (and scary) how much the Data Psychic can figure out about how we think (and work, shop, vote, love), it’s equally amazing (and scary) how much Psychology is figuring out about how we think, how we behave, and how we decide.

As I recently blogged about WYSIATI (“what you see is all there is” from Daniel Kahneman’s book Thinking, Fast and Slow), when you are using big data to make business decisions, what you are looking for can greatly influence what you are looking at (and vice versa).  But this natural human tendency could cause you to miss the Invisible Gorilla walking across your screen.

If you are unfamiliar with that psychology experiment, which was created by Christopher Chabris and Daniel Simons, authors of the book The Invisible Gorilla: How Our Intuitions Deceive Us, then I recommend watching the video of it.  (By the way, the first time I watched the video, before I was familiar with its premise, I did not see the guy in the gorilla suit.  Now when I watch the video, seeing the “invisible gorilla” distracts me, causing me to not count the number of passes correctly.)

In his book Incognito: The Secret Lives of the Brain, David Eagleman explained how our brain samples just a small bit of the physical world, making time-saving assumptions and seeing only as well as it needs to.  As our eyes interrogate the world, they optimize their strategy for the incoming data, arbitrating a battle between the conflicting information.  What we see is not what is really out there, but instead only a moment-by-moment version of which perception is winning over the others.  Our perception works not by building up bits of captured data, but instead by matching our expectations to the incoming sensory data.

I don’t doubt the Magic Elephants and Data Psychics provide the potential to envision and analyze almost anything happening within the complex and constantly changing business world — as well as the professional and personal lives of the people in it.

But I am concerned that information optimization driven by the biases of our human intuition and perception will only match our expectations to those fast-moving large volumes of various data, thereby causing us to not see many of the Invisible Gorillas.

Although this has always been a business intelligence concern, as technological advancements improve our data analytical tools, we must not lose sight of the fact that tools and data remain only as effective (and as beneficent) as the humans who wield them.


Related Posts

Big Data el Memorioso

Neither the I Nor the T is Magic

Information Overload Revisited

HoardaBytes and the Big Data Lebowski


The Speed of Decision

The Data-Decision Symphony

A Decision Needle in a Data Haystack

The Big Data Collider

Dot Collectors and Dot Connectors

DQ-View: Data Is as Data Does

Data, Information, and Knowledge Management

So Long 2011, and Thanks for All the . . .

Don’t Panic!  Welcome to the mostly harmless OCDQ Radio 2011 Year in Review episode.  During this approximately 42-minute episode, I recap the data-related highlights of 2011 in a series of sometimes serious, sometimes funny segments, and make wacky and wildly inaccurate data-related predictions about 2012.

Special thanks to my guests Jarrett Goldfedder, who discusses Big Data, Nicola Askham, who discusses Data Governance, and Daragh O Brien, who discusses Data Privacy.  Additional thanks to Rich Murnane and Dylan Jones.  And Deep Thanks to that frood Douglas Adams, who always knew where his towel was, and who wrote The Hitchhiker’s Guide to the Galaxy.



Previous OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post: