Our Increasingly Data-Constructed World

Last week, I joined fellow Information Management bloggers Art Petty, Mark Smith, Bruce Guptill, and co-hosts Eric Kavanagh and Jim Ericson for a DM Radio discussion about the latest trends and innovations in the information management industry.

For my contribution to the discussion, I talked about the long-running macro trend underlying many of these trends and innovations: our world is becoming not just more data-driven but increasingly data-constructed.

Physicist John Archibald Wheeler contemplated how the bit is a fundamental particle, which, although insubstantial, could be considered more fundamental than matter itself.  He summarized this viewpoint in his pithy phrase “It from Bit” explaining how: “every it — every particle, every field of force, even the space-time continuum itself — derives its function, its meaning, its very existence entirely — even if in some contexts indirectly — from the answers to yes-or-no questions, binary choices, bits.”

In other words, we could say that the physical world is conceived of in, and derived from, the non-physical world of data.

Although bringing data into the real world has historically also required constructing other physical things to deliver data to us, more of the things in the physical world are becoming directly digitized.  As just a few examples, consider how we’re progressing:

  • From audio delivered via vinyl records, audio tapes, CDs, and MP3 files (and other file formats) to Web-streaming audio
  • From video delivered via movie reels, video tapes, DVDs, and MP4 files (and other file formats) to Web-streaming video
  • From text delivered via printed newspapers, magazines, and books to websites, blogs, e-books, and other electronic texts

Furthermore, we continue to see more physical tools (e.g., calculators, alarm clocks, calendars, dictionaries) transforming into apps and data on our smartphones, tablets, and other mobile devices.  Essentially, in a world increasingly constructed of an invisible and intangible substance called data (perhaps the datum should be added to the periodic table of elements?), among the few things that we can still see and touch are the screens of our mobile devices, which make the invisible visible and the intangible tangible.


Bitrate, Lossy Audio, and Quantity over Quality

If our world is becoming increasingly data-constructed, does that mean people are becoming more concerned about data quality?

In a bit, 0.  In a word, no.  And that’s because, much to the dismay of those working in the data quality profession, most people do not care about the quality of their data unless it becomes bad enough for them to pay attention to — and complain about.

An excellent example is bitrate, which refers to the number of bits (that is, the amount of data) processed per unit of time, usually expressed in bits per second.  In his article Does Bitrate Really Make a Difference In My Music?, Whitson Gordon examined the common debate about lossless and lossy audio formats.
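To make the idea of bitrate concrete, here is a rough back-of-the-envelope sketch in Python.  It assumes the standard CD audio parameters (44,100 samples per second, 16 bits per sample, two channels), a hypothetical four-minute track, and a 128 kbps MP3 as the lossy comparison; the function names are my own, chosen only for illustration.

```python
# Bitrate is bits per second, so file size is simply bitrate times duration.
CD_SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2              # stereo


def cd_bitrate_bps() -> int:
    """Bitrate of uncompressed CD audio, in bits per second."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def size_megabytes(bitrate_bps: int, seconds: float) -> float:
    """Size of a stream at the given bitrate, in megabytes (10**6 bytes)."""
    return bitrate_bps * seconds / 8 / 1_000_000


track_seconds = 4 * 60          # hypothetical four-minute track
mp3_bitrate_bps = 128_000       # assumed lossy bitrate for comparison

cd_mb = size_megabytes(cd_bitrate_bps(), track_seconds)
mp3_mb = size_megabytes(mp3_bitrate_bps, track_seconds)

print(f"CD audio bitrate: {cd_bitrate_bps()} bits per second")
print(f"Uncompressed track: {cd_mb:.2f} MB")
print(f"128 kbps MP3 track: {mp3_mb:.2f} MB")
```

Under these assumptions, the lossy file is roughly one-eleventh the size of the uncompressed track, which is the trade that most listeners happily accept.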

Take the example of ripping a track from a CD to a hard drive.  A lossless format compresses the track without discarding any of its data, retaining, for all intents and purposes, the same audio data quality as the original CD track.

By contrast, a lossy format compresses the track so that it takes up less space by intentionally removing some of its data, thereby reducing audio data quality.  Audiophiles often claim that anything other than a vinyl record sounds lousy because it is so lossy.
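The difference between lossless and lossy can be sketched in a few lines of Python.  This is only a toy illustration under loud assumptions: the "track" is one second of a synthetic 440 Hz tone, zlib stands in for a lossless audio codec, and the "lossy" step simply throws away the low 8 bits of each 16-bit sample.  Real lossy formats such as MP3 decide what to discard in far more sophisticated ways.

```python
import math
import zlib
from array import array

# Hypothetical stand-in for a CD track: one second of a 440 Hz tone
# stored as signed 16-bit samples.
SAMPLE_RATE = 44_100
samples = array(
    "h",
    (int(20_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
     for n in range(SAMPLE_RATE)),
)

# Lossless: compress, decompress, and every bit comes back unchanged.
raw = samples.tobytes()
restored = zlib.decompress(zlib.compress(raw))
lossless_ok = restored == raw


def quantize(sample: int) -> int:
    """Toy lossy step: keep only the top 8 bits of a 16-bit sample."""
    return (sample >> 8) << 8


# Lossy: the discarded bits cannot be recovered afterwards.
lossy = array("h", (quantize(s) for s in samples))
max_error = max(abs(a - b) for a, b in zip(samples, lossy))

print(f"Lossless round trip identical: {lossless_ok}")
print(f"Largest per-sample error after lossy step: {max_error}")
```

Whether an error of that size is audible is exactly the question that separates the audiophile from the listener who considers the smaller file good enough.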

However, like truth, beauty, and art, data quality can be said to be in the eyes — or the ears — of the beholder.  So, if your favorite music sounds good enough to you in MP3 file format, then not only do you not need those physical vinyl records, audio tapes, and CDs anymore, but since you consider MP3 files good enough, you will not pay any further attention to audio data quality.

Another, and less recent, example is the videotape format war waged during the 1970s and 1980s between Betamax and VHS, when Betamax was widely believed to provide superior video data quality.

But a blank Betamax tape allowed users to record up to two hours of high-quality video, whereas a VHS tape allowed users to record up to four hours of slightly lower quality video.  Consumers consistently chose quantity over quality, especially since lower quality also meant a lower price.  Betamax tapes and machines remained more expensive, based on the assumption that consumers would pay a premium for higher quality video.

The VHS victory demonstrated how people often choose quantity over quality, so it doesn’t always pay to have better data quality.


Redefining Structure in a Data-Constructed World

Another side effect of our increasingly data-constructed world is that it is challenging the traditional data management notion that data has to be structured before it can be used — especially within many traditional notions of business intelligence.

Physicist Niels Bohr suggested that understanding the structure of the atom requires changing our definition of understanding.

Since a lot of the recent Big Data craze consists of unstructured or semi-structured data, perhaps understanding how much structure data truly requires for business applications (e.g., sentiment analysis of social networking data) requires changing our definition of structuring.  At the very least, we have to accept the fact that the relational data model is no longer our only option.

Although I often blog about how data and the real world are not the same thing, as more physical things, as well as more aspects of our everyday lives, become directly digitized, it is becoming more difficult to differentiate physical reality from digital reality.


Related Posts

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Big Data el Memorioso

The Big Data Collider

Information Overload Revisited

Dot Collectors and Dot Connectors


Plato’s Data

The Data Cold War

A Farscape Analogy for Data Quality


Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • A Brave New Data World: A discussion about how data, data quality, data-driven decision making, and metadata quality no longer reside exclusively within the esoteric realm of data management.  Basically, everyone is a data geek now.
  • Data Quality and Big Data: Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including whether data quality matters less in larger data sets, and whether statistical outliers represent business insights or data quality issues.