Back in the 1980s, long before the World Wide Web and big data, Stewart Brand uttered the iconic words:
“On the one hand, information wants to be expensive, because it’s so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time. So you have these two fighting against each other.”
Even though Brand stated that this tension between expensive and free will never go away, most people still quote only the “information wants to be free” part of his statement. The ambiguity of the adjective free led to the necessary distinction between gratis (for zero price) and libre (with little or no restriction). The digital economy of the 21st century is often built on both: companies such as Google and Facebook provide free (gratis) Internet and mobile services to consumers in exchange for free (libre) access to their (often personally identifying) information, thereby tendering privacy as a new currency.
I would argue that, first and foremost, information wants to be secure. Once that’s been established, it doesn’t matter in which sense of the word information is free. In the era of big data and predictive analytics, where much of data’s value lies in secondary uses, there’s an increasing need for a form of differential privacy that protects personal privacy while still enabling business insights.
“As analytics get more and more powerful,” IBM’s Jeff Jonas said in an interview about data privacy and predictive analytics, “I think it’s responsible to build more and more privacy mechanisms into the technology, and, where you can, by default. I think that’s going to benefit the organizations that use it. Companies are strained by all the things that they want to do with analytics, and they’re asking themselves the question: How do we make sure it gets used properly?”
Jonas described the need to “anonymize the data at the edge, where it lives in the host system, before you bring it together to share it and combine it with other data.” This means selectively anonymizing data such as Social Security numbers and dates of birth. The goal is not to hide the identity of the person, but to protect the values that you wouldn’t want revealed.
“For example,” Jonas explained, “you can see the names and addresses. You know who it is. You’re not trying to make it an unidentifiable person, but the personally identifiable information is modified in a way that is non-human readable and non-reversible. What you’re doing is you’re preventing the unintended disclosure of personal identifiers. You’re trying to bring diverse data together, so you can have a more complete picture, so you can be more competitive, and give me the best ad, and recognize if someone is stealing my credit card. You’re trying to bring the data together to make those higher quality predictions. It still lives in the source systems. The question is, when you make a copy of data and put it into a big data system, it’s just another copy. Your risk of disclosure doubles. With these Privacy by Design principles, before you make the copy, you render it useful for some analytics but less useful for people stealing my date of birth and Social Security number.”
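The interview doesn’t say how Jonas’s technology transforms these values, so the following is only a hypothetical sketch of one common way to do it: a keyed one-way hash (HMAC-SHA256) applied at the source system before a copy is made. Names and addresses stay readable, while the Social Security number and date of birth become digests that still match across source systems but can’t be read or reversed without the key. The key is keyed on purpose: a plain unkeyed hash of a Social Security number could be reversed by brute-forcing the relatively small space of possible numbers. The key, the field names, and the normalization rule are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key, held only by the source ("edge") system and never
# copied into the big data environment.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Fields treated as sensitive identifiers in this sketch (an assumption).
SENSITIVE_FIELDS = ("ssn", "date_of_birth")


def anonymize_value(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way digest of a value (HMAC-SHA256, hex-encoded)."""
    # Light normalization so "123-45-6789" and "123456789" map to the same digest.
    normalized = "".join(ch for ch in value if ch.isalnum()).lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict) -> dict:
    """Copy a record, replacing only the sensitive fields with digests."""
    out = dict(record)
    for field in SENSITIVE_FIELDS:
        if out.get(field):
            out[field] = anonymize_value(out[field])
    return out


if __name__ == "__main__":
    # Two hypothetical source systems that format the same SSN differently.
    crm = {"name": "Jane Doe", "address": "1 Main St",
           "ssn": "123-45-6789", "date_of_birth": "1970-01-01"}
    billing = {"name": "J. Doe", "ssn": "123456789", "date_of_birth": "1970-01-01"}

    a, b = anonymize_record(crm), anonymize_record(billing)
    print(a["name"], a["address"])   # identity stays visible
    print(a["ssn"] == b["ssn"])      # prints True: records can still be linked
```

In this sketch only the digests leave the host system, so the big data copy can still join records on them, but someone who steals that copy gets neither the date of birth nor the Social Security number. One consequence of the keyed design is that anyone holding the key can recompute digests for guessed values, so the key has to stay at the edge.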
“For an organization to be competitive today,” Jonas concluded, “they’d better figure out how to make sense of their data, or they’re not going to be in business. And then the next thing after that is, how can they do it in a way that’s more responsible and reduce the risk of misuse that might damage their brand.”
This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet. I’ve been compensated to contribute to this program, but the opinions expressed in this post are my own and don’t necessarily represent IBM’s positions, strategies, or opinions.