Serving IT with a Side of Hash Browns

This blog post is sponsored by the Enterprise CIO Forum and HP.

Since it’s where I started my career, I often ponder what it would be like to work in the IT department today.  This morning, instead of sitting in a cubicle with no window view other than the one Bill Gates gave us, I’m sitting in a booth by a real window, albeit one with a partially obstructed view of the parking lot, at a diner eating a two-egg omelette with a side of hash browns.

But nowadays, it’s possible that I’m still sitting amongst my fellow IT workers.  Perhaps the older gentleman to my left is verifying last night’s database load using his laptop.  Maybe the younger woman to my right is talking into her Bluetooth earpiece with a business analyst working on an ad hoc report.  And the couple in the corner could be struggling to understand the technology requirements of the C-level executive they’re meeting with, who’s now vocalizing his displeasure about sitting in the high chair.

It’s possible that everyone thinks I am updating the status of an IT support ticket on my tablet based on the mobile text alert I just received.  Of course, it’s also possible that all of us are just eating breakfast while I’m also writing this blog post about IT.

However, as Joel Dobbs recently blogged, the IT times are a-changin’ — and faster than ever before.  Thanks to the two-egg IT omelette of mobile technologies and cloud providers, IT no longer happens only in the IT department.  IT is everywhere now.

“There is a tendency to compartmentalize various types of IT,” Bruce Guptill recently blogged, “in order to make them more understandable and conform to budgeting practices.  But the core concept/theme/result of mobility really is ubiquity of IT — the same technology, services, and capabilities regardless of user and asset location.”

Regardless of how much you have embraced the consumerization of IT, some of your IT happens outside of your IT department, and some IT tasks are performed by people who not only don’t work in IT, but possibly don’t even work for your organization.

“While systems integration was once the big concern,” Judy Redman recently blogged, “today’s CIOs need to look to services integration.  Companies today need to obtain services from multiple vendors so that they can get best-of-breed solutions, cost efficiencies, and the flexibility needed to meet ever-changing and ever-more-demanding business needs.”

With its increasingly service-oriented and ubiquitous nature, it’s not too far-fetched to imagine that in the near future of IT, the patrons of a Wi-Fi-enabled diner could be your organization’s new IT department, serving your IT with a side of hash browns.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

The IT Consumerization Conundrum

Shadow IT and the New Prometheus

A Swift Kick in the AAS

The UX Factor

Are Cloud Providers the Bounty Hunters of IT?

The Cloud Security Paradox

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Our Increasingly Data-Constructed World

Last week, I joined fellow Information Management bloggers Art Petty, Mark Smith, Bruce Guptill, and co-hosts Eric Kavanagh and Jim Ericson for a DM Radio discussion about the latest trends and innovations in the information management industry.

For my contribution to the discussion, I talked about the long-running macro trend underlying many trends and innovations, namely that our world is becoming, not just more data-driven, but increasingly data-constructed.

Physicist John Archibald Wheeler contemplated how the bit is a fundamental particle, which, although insubstantial, could be considered more fundamental than matter itself.  He summarized this viewpoint in his pithy phrase “It from Bit,” explaining how “every it — every particle, every field of force, even the space-time continuum itself — derives its function, its meaning, its very existence entirely — even if in some contexts indirectly — from the answers to yes-or-no questions, binary choices, bits.”

In other words, we could say that the physical world is conceived of in, and derived from, the non-physical world of data.

Although bringing data into the real world has historically also required constructing other physical things to deliver data to us, more of the things in the physical world are becoming directly digitized.  As just a few examples, consider how we’re progressing:

  • From audio delivered via vinyl records, audio tapes, CDs, and MP3 files (and other file formats) to Web-streaming audio
  • From video delivered via movie reels, video tapes, DVDs, and MP4 files (and other file formats) to Web-streaming video
  • From text delivered via printed newspapers, magazines, and books to websites, blogs, e-books, and other electronic texts

Furthermore, we continue to see more physical tools (e.g., calculators, alarm clocks, calendars, dictionaries) transforming into apps and data on our smartphones, tablets, and other mobile devices.  Essentially, in a world increasingly constructed of an invisible and intangible substance called data (perhaps the datum should be added to the periodic table of elements?), among the few things that we still see and touch are the screens of our mobile devices, which make the invisible visible and the intangible tangible.

 

Bitrate, Lossy Audio, and Quantity over Quality

If our world is becoming increasingly data-constructed, does that mean people are becoming more concerned about data quality?

In a bit, 0.  In a word, no.  And that’s because, much to the dismay of those working in the data quality profession, most people do not care about the quality of their data unless it becomes bad enough for them to pay attention to — and complain about.

An excellent example is bitrate, which refers to the number of bits — or the amount of data — that are processed over a certain amount of time.  In his article Does Bitrate Really Make a Difference In My Music?, Whitson Gordon examined the common debate about lossless and lossy audio formats.

Using the example of ripping a track from a CD to a hard drive, a lossless format compresses the track without discarding any of its data, retaining, for all intents and purposes, the same audio data quality as the original CD track.

By contrast, a lossy format compresses the track so that it takes up less space by intentionally removing some of its data, thereby reducing audio data quality.  Audiophiles often claim that anything other than vinyl records sounds lousy because it is so lossy.
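To make the bitrate arithmetic concrete, here is a minimal Python sketch of my own (not from Gordon’s article) that estimates the file size of a track at a few bitrates; the four-minute track length and the bitrate values (1,411 kbps for CD-quality audio, 320 and 128 kbps for lossy MP3) are assumed illustrative figures.

```python
# Rough file-size arithmetic for one track at different bitrates.
# Assumptions (illustrative, not from the article): a 4-minute track,
# 1,411 kbps for CD-quality audio, 320 and 128 kbps for lossy MP3.

TRACK_SECONDS = 4 * 60

BITRATES_KBPS = {
    "CD-quality audio": 1411,
    "High-quality MP3 (lossy)": 320,
    "Typical MP3 (lossy)": 128,
}

for label, kbps in BITRATES_KBPS.items():
    # bits per second * seconds, converted to megabytes (8 bits per byte)
    size_mb = (kbps * 1000 * TRACK_SECONDS) / 8 / 1_000_000
    print(f"{label}: ~{size_mb:.1f} MB")
```

At roughly 42 MB for the CD-quality track versus under 10 MB for a high-quality MP3, it is easy to see why many listeners decide the lossy version is good enough.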

However, like truth, beauty, and art, data quality can be said to be in the eyes — or the ears — of the beholder.  So, if your favorite music sounds good enough to you in MP3 file format, then not only do you not need those physical vinyl records, audio tapes, and CDs anymore, but since you consider MP3 files good enough, you will not pay any further attention to audio data quality.

Another, and less recent, example is the videotape format war waged during the 1970s and 1980s between Betamax and VHS, when Betamax was widely believed to provide superior video data quality.

But a blank Betamax tape allowed users to record up to two hours of high-quality video, whereas a VHS tape allowed users to record up to four hours of slightly lower quality video.  Consumers consistently chose quantity over quality — especially since lower quality also meant a lower price.  Betamax tapes and machines remained more expensive based on the assumption that consumers would pay a premium for higher quality video.

The VHS victory demonstrated how people often choose quantity over quality, so it doesn’t always pay to have better data quality.

 

Redefining Structure in a Data-Constructed World

Another side effect of our increasingly data-constructed world is that it is challenging the traditional data management notion that data has to be structured before it can be used — especially within many traditional notions of business intelligence.

Physicist Niels Bohr suggested that understanding the structure of the atom requires changing our definition of understanding.

Since a lot of the recent Big Data craze consists of unstructured or semi-structured data, perhaps understanding how much structure data truly requires for business applications (e.g., sentiment analysis of social networking data) requires changing our definition of structuring.  At the very least, we have to accept the fact that the relational data model is no longer our only option.
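As a small illustration of that point (a sketch of my own, not from any of the posts cited here), a semi-structured social networking message can be consumed directly, for example for a naive sentiment score, without first forcing it into a relational schema; the field names and word lists below are hypothetical.

```python
import json

# A hypothetical semi-structured social networking message: its fields can
# vary from message to message, and no relational schema was defined up front.
raw_message = '{"user": "data_geek", "text": "Loving the new release!", "tags": ["release"]}'

record = json.loads(raw_message)

# A naive keyword-based sentiment score -- a stand-in for real sentiment analysis.
POSITIVE = {"loving", "great", "amazing"}
NEGATIVE = {"hate", "broken", "awful"}

words = {word.strip("!.,").lower() for word in record.get("text", "").split()}
score = len(words & POSITIVE) - len(words & NEGATIVE)

print(f"user={record['user']} sentiment_score={score} tags={record.get('tags', [])}")
```

The point is not the scoring logic, which is deliberately trivial, but that the data was usable before anyone decided how, or whether, to structure it.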

Although I often blog about how data and the real world are not the same thing, as more physical things, as well as more aspects of our everyday lives, become directly digitized, it is becoming more difficult to differentiate physical reality from digital reality.

 

Related Posts

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Big Data el Memorioso

The Big Data Collider

Information Overload Revisited

Dot Collectors and Dot Connectors

WYSIWYG and WYSIATI

Plato’s Data

The Data Cold War

A Farscape Analogy for Data Quality

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • A Brave New Data World — A discussion about how data, data quality, data-driven decision making, and metadata quality no longer reside exclusively within the esoteric realm of data management — basically, everyone is a data geek now.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

Data Myopia and Business Relativity

Since how data quality is defined has a significant impact on how data quality is perceived, measured, and managed, in this post I examine the two most prevalent perspectives on defining data quality: real-world alignment and fitness for the purpose of use.  These perspectives respectively represent what I refer to as the danger of data myopia and the challenge of business relativity.

Real-World Alignment: The Danger of Data Myopia

Whether it’s an abstract description of real-world entities (i.e., master data) or an abstract description of real-world interactions (i.e., transaction data) among entities, data is an abstract description of reality.  The creation and maintenance of these abstract descriptions shapes the organization’s perception of the real world, which I philosophically pondered in my post Plato’s Data.

The inconvenient truth is that the real world is not the same thing as the digital worlds captured within our databases.

And, of course, creating and maintaining these digital worlds is no easy task, which is exactly the danger inherent in the real-world alignment definition of data quality — when the organization’s data quality efforts are focused on minimizing the digital distance between data and the constantly changing real world that data attempts to describe, it can lead to a hyper-focus on the data in isolation, otherwise known as data myopia.

Even if we create and maintain perfect real-world alignment, what value does high-quality data possess independent of its use?

Real-world alignment reflects the perspective of the data provider, and its advocates argue that providing a trusted source of data to the organization will satisfy any and all business requirements, i.e., high-quality data should be fit to serve as the basis for every possible use.  Therefore, in theory, real-world alignment provides an objective data foundation independent of the subjective uses defined by the organization’s many data consumers.

However, providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the rallying cry and siren song of enterprise data warehousing (EDW), and more recently, of master data management (MDM).  Although these initiatives can provide significant business value, it is usually poor data quality that undermines the long-term success and sustainability of EDW and MDM implementations.

Perhaps the enterprise needs a Ulysses pact to protect it from believing in EDW or MDM as a miracle exception for data quality?

A significant challenge for the data provider perspective on data quality is that it is difficult to make a compelling business case on the basis of trusted data without direct connections to the specific business needs of data consumers, whose business, data, and technical requirements are often in conflict with one another.

In other words, real-world alignment does not necessarily guarantee business-world alignment.

So, if using real-world alignment as the definition of data quality has inherent dangers, we might be tempted to conclude that the fitness for the purpose of use definition of data quality is the better choice.  Unfortunately, that is not necessarily the case.

Fitness for the Purpose of Use: The Challenge of Business Relativity

[Image: M. C. Escher’s 1953 lithograph Relativity]

In M. C. Escher’s famous 1953 lithograph Relativity, we observe multiple, conflicting perspectives of reality; yet from the individual perspective of each person, everything must appear normal, since they are all casually going about their daily activities.

I have always thought this is an apt analogy for the multiple business perspectives on data quality that exist within every organization.

Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder, or when data quality is defined as fitness for the purpose of use — the eyes of the user.

Most data has both multiple uses and users.  Data of sufficient quality for one use or user may not be of sufficient quality for other uses and users.  These multiple, and often conflicting, perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data to support their own business activities.

Therefore, the user (i.e., data consumer) perspective establishes a relative business context for data quality.

Whereas the real-world alignment definition of data quality can cause a data-myopic focus, the business-world alignment goal of the fitness for the purpose of use definition must contend with the daunting challenge of business relativity.  Most data has multiple data consumers, each with their own relative business context for data quality, making it difficult to balance the diverse data needs and divergent data quality perspectives within the conflicting, and rather Escher-like, reality of the organization.

The data consumer perspective on data quality is often the root cause of the data silo problem, the bane of successful enterprise data management prevalent in most organizations, where each data consumer maintains their own data silo, customized to be fit for the purpose of their own use.  Organizational culture and politics also play significant roles since data consumers legitimately fear that losing their data silos would revert the organization to a one-size-fits-all data provider perspective on data quality.

So, clearly the fitness for the purpose of use definition of data quality is not without its own considerable challenges to overcome.

How does your organization define data quality?

As I stated at the beginning of this post, how data quality is defined has a significant impact on how data quality is perceived, measured, and managed.  I have witnessed the data quality efforts of an organization struggle with, and at times fail because of, either the danger of data myopia or the challenge of business relativity — or, more often than not, some combination of both.

Although some would define real-world alignment as data quality and fitness for the purpose of use as information quality, I have found adding the nuance of data versus information only further complicates an organization’s data quality discussions.

But for now, I will just conclude a rather long (sorry about that) post by asking for reader feedback on this perennial debate.

How does your organization define data quality?  Please share your thoughts and experiences by posting a comment below.

The UX Factor

This blog post is sponsored by the Enterprise CIO Forum and HP.

In his book The Most Human Human, Brian Christian explained that “UX — short for User Experience — refers to the experience a given user has using a piece of software or technology, rather than the purely technical capacities of that device.”

But since its inception, the computer industry has been primarily concerned with technical capacities.  Computer advancements have followed the oft-cited Moore’s Law, a trend accurately described by Intel co-founder Gordon Moore in 1965, which states that the number of transistors that can be placed inexpensively on an integrated circuit (thereby increasing processing speed and memory capacity) doubles approximately every two years.
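Expressed as arithmetic, the trend is simply a count that doubles every two years; here is a minimal Python sketch of my own (the starting transistor count is an arbitrary illustrative value, not a figure from the text):

```python
# Projecting a transistor count under Moore's Law: doubling roughly every two years.
def moores_law(initial_count: float, years: float, doubling_period_years: float = 2.0) -> float:
    """Return the projected count after `years`, doubling every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)

# Starting from a hypothetical 1 million transistors, project 10 and 20 years out.
for years in (10, 20):
    print(f"After {years} years: ~{moores_law(1_000_000, years):,.0f} transistors")
```

Ten years of doubling yields roughly a 32-fold increase, and twenty years roughly a 1,000-fold increase, which is what gives the “exponentially faster” framing below its force.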

However, as Christian explained, for a while in the computer industry, “an arms race between hardware and software created the odd situation that computers were getting exponentially faster but not faster at all to use, as software made ever-larger demands on systems resources, at a rate that matched and sometimes outpaced hardware improvements.”  This was sometimes called “Andy and Bill’s Law,” referring to Andy Grove of Intel and Bill Gates of Microsoft.  “What Andy giveth, Bill taketh away.”

But these advancements in computational power, along with increased network bandwidth, parallel processing frameworks (e.g., Hadoop), scalable and distributed models (e.g., cloud computing), and other innovations (e.g., in-memory technology), are making powerful technical capacities so much more commonplace, and so much less expensive, that the computer industry is responding to consumers demanding that the primary concern be user experience — hence the so-called Consumerization of IT.

“As computing technology moves increasingly toward mobile devices,” Christian noted, “product development becomes less about the raw computing horsepower and more about the overall design of the product and its fluidity, reactivity, and ease of use.”

David Snow and Alex Bakker have recently blogged about the challenges and opportunities facing enterprises and vendors with respect to the Bring Your Own Device (BYOD) movement, where more employees, and employers, are embracing mobile devices.

Although the old mantra of function over form is not getting replaced by form over function, form factor, interface design, and the many other aspects of User Experience are becoming the unrelenting UX Factor of the continuing consumerization trend.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

The Diderot Effect of New Technology

A Swift Kick in the AAS

Shadow IT and the New Prometheus

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

Are Cloud Providers the Bounty Hunters of IT?

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Commendable Comments (Part 12)

Since I officially launched this blog on March 13, 2009, that makes today the Third Blogiversary of OCDQ Blog!

So, absolutely without question, there is no better way to commemorate this milestone than to also make this the 12th entry in my ongoing series for expressing my gratitude to my readers for their truly commendable comments on my blog posts.

 

Commendable Comments

On Big Data el Memorioso, Mark Troester commented:

“I think this helps illustrate that one size does not fit all.

You can’t take a singular approach to how you design for big data.  It’s all about identifying relevance and understanding that relevance can change over time.

There are certain situations where it makes sense to leverage all of the data, and now with high performance computing capabilities that include in-memory, in-DB and grid, it's possible to build and deploy rich models using all data in a short amount of time. Not only can you leverage rich models, but you can deploy a large number of models that leverage many variables so that you get optimal results.

On the other hand, there are situations where you need to filter out the extraneous information and the more intelligent you can be about identifying the relevant information the better.

The traditional approach is to grab the data, cleanse it, and land it somewhere before processing or analyzing the data.  We suggest that you leverage analytics up front to determine what data is relevant as it streams in, with relevance based on your organizational knowledge or context.  That helps you determine what data should be acted upon immediately, where it should be stored, etc.

And, of course, there are considerations about using visual analytic techniques to help you determine relevance and guide your analysis, but that’s an entire subject just on its own!”

On Data Governance Frameworks are like Jigsaw Puzzles, Gabriel Marcan commented:

“I agree (and like) the jigsaw puzzles metaphor.  I would like to make an observation though:

Can you really construct Data Governance one piece at a time?

I would argue you need to put together sets of pieces simultaneously, and to ensure early value, you might want to piece together the interesting / easy pieces first.

Hold on, that sounds like the typical jigsaw strategy anyway . . . :-)”

On Data Governance Frameworks are like Jigsaw Puzzles, Doug Newdick commented:

“I think that there are a number of more general lessons here.

In particular, the description of the issues with data governance sounds very like the issues with enterprise architecture.  In general, there are very few eureka moments in solving the business and IT issues plaguing enterprises.  These solutions are usually 10% inspiration, 90% perspiration in my experience.  What looks like genius or a sudden breakthrough is usually the result of a lot of hard work.

I also think that there is a wider Myth of the Framework at play too.

The myth is that if we just select the right framework then everything else will fall into place.  In reality, the selection of the framework is just the start of the real work that produces the results.  Frameworks don’t solve your problems, people solve your problems by the application of brain-power and sweat.

All frameworks do is take care of some of the heavy-lifting, i.e., the mundane foundational research and thinking activity that is not specific to your situation.

Unfortunately the myth of the framework is why many organizations think that choosing TOGAF will immediately solve their IT issues and are then disappointed when this doesn’t happen, when a more sensible approach might have garnered better long-term success.”

On Data Quality: Quo Vadimus?, Richard Jarvis commented:

“I agree with everything you’ve said, but there’s a much uglier truth about data quality that should also be discussed — the business benefit of NOT having a data quality program.

The unfortunate reality is that in a tight market, the last thing many decision makers want to be made public (internally or externally) is the truth.

In a company with data quality principles ingrained in day-to-day processes, and reporting handled independently, it becomes much harder to hide or reinterpret your falling market share.  Without these principles though, you’ll probably be able to pick your version of the truth from a stack of half a dozen, then spend your strategy meeting discussing which one is right instead of what you’re going to do about it.

What we’re talking about here is the difference between a Politician — who will smile at the camera and proudly announce 0.1% growth was a fantastic result given X, Y, and Z factors — and a Statistician who will endeavor to describe reality with minimal personal bias.

And the larger the organization, the more internal politics plays a part.  I believe a lot of the reluctance in investing in data quality initiatives could be traced back to this fear of being held truly accountable, regardless of it being in the best interests of the organization.  To build a data quality-centric culture, the change must be driven from the CEO down if it’s to succeed.”

On Data Quality: Quo Vadimus?, Peter Perera commented:

“The question: ‘Is Data Quality a Journey or a Destination?’ suggests that it is one or the other.

I agree with another comment that data quality is neither . . . or, I suppose, it could be both (the journey is the destination and the destination is the journey. They are one and the same.)

The quality of data (or anything for that matter) is something we experience.

Quality only radiates when someone is in the act of experiencing the data, and usually only when it is someone that matters.  This radiation decays over time, ranging from seconds or less to years or more.

The only problem with viewing data quality as radiation is that radiation can be measured by an instrument, but there is no such instrument to measure data quality.

We tend to confuse data qualities (which can be measured) and data quality (which cannot).

In the words of someone whose name I cannot recall: ‘Quality is not job one.  Being totally %@^#&$*% amazing is job one.’  The only thing I disagree with here is that being amazing is characterized as a job.

Data quality is not something we do to data.  It’s not a business initiative or project or job.  It’s not a discipline.  We need to distinguish between the pursuit (journey) of being amazing and actually being amazing (destination — but certainly not a final one).  To be amazing requires someone to be amazed.  We want data to be continuously amazing . . . to someone that matters, i.e., someone who uses and values the data a whole lot for an end that makes a material difference.

Come to think of it, the only prerequisite for data quality is being alive because that is the only way to experience it.  If you come across some data and have an amazed reaction to it and can make a difference using it, you cannot help but experience great data quality.  So if you are amazing people all the time with your data, then you are doing your data quality job very well.”

On Data Quality and Miracle Exceptions, Gordon Hamilton commented:

“Nicely delineated argument, Jim.  Successfully starting a data quality program seems to be a balance between getting started somewhere and determining where best to start.  The data quality problem is like a two-edged sword without a handle that is inflicting the death of a thousand cuts.

Data quality is indeed difficult to get a handle on.”

And since they generated so much great banter, please check out all of the commendable comments received by the blog posts There is No Such Thing as a Root Cause and You only get a Return from something you actually Invest in.

 

Thank You for Three Awesome Years

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last three years.  Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between December 2011 and March 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog for the last three years. Your readership is deeply appreciated.

 

Related Posts

Commendable Comments (Part 11)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

Data Quality and Big Data

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This is Part 2 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss data quality and big data, including if data quality matters less in larger data sets, if statistical outliers represent business insights or data quality issues, statistical sampling errors versus measurement calibration errors, mistaking signal for noise (i.e., good data for bad data), and whether or not the principles and practices of true “data scientists” will truly be embraced by an organization’s business leaders.

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information in the late 80s.  Since then he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.  The most recent, Data Driven: Profiting from Your Most Important Business Asset (Harvard Business Press, 2008) was a Library Journal best buy of 2008.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995.  Tom holds a Ph.D. in statistics from Florida State University. He holds two patents.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Driven

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This is Part 1 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss concepts from one of my favorite data quality books, which is his most recent book: Data Driven: Profiting from Your Most Important Business Asset.

Our discussion includes viewing data as an asset, an organization’s hierarchy of data needs, a simple model for culture change, and attempting to achieve the “single version of the truth” being marketed as a goal of master data management (MDM).

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information in the late 80s.  Since then he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995. Tom holds a Ph.D. in statistics from Florida State University.  He holds two patents.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Quality and Miracle Exceptions

“Reading superhero comic books with the benefit of a Ph.D. in physics,” James Kakalios explained in The Physics of Superheroes, “I have found many examples of the correct description and application of physics concepts.  Of course, the use of superpowers themselves involves direct violations of the known laws of physics, requiring a deliberate and willful suspension of disbelief.”

“However, many comics need only a single miracle exception — one extraordinary thing you have to buy into — and the rest that follows as the hero and the villain square off would be consistent with the principles of science.”

“Data Quality is all about . . .”

It is essential to foster a marketplace of ideas about data quality in which a diversity of viewpoints is freely shared without bias, where everyone is invited to get involved in discussions and debates and have an opportunity to hear what others have to offer.

However, one of my biggest pet peeves about the data quality industry is that when I listen to analysts, vendors, consultants, and other practitioners discuss data quality challenges, I am often required to make a miracle exception for data quality.  In other words, I am given one extraordinary thing I have to buy into in order to be willing to buy their solution to all of my data quality problems.

These superhero comic book style stories usually open with a miracle exception telling me that “data quality is all about . . .”

Sometimes, the miracle exception is purchasing technology from the right magic quadrant.  Other times, the miracle exception is either following a comprehensive framework, or following the right methodology from the right expert within the right discipline (e.g., data modeling, business process management, information quality management, agile development, data governance, etc.).

But I am especially irritated by individuals who bash vendors for selling allegedly only reactive data cleansing tools, while selling their allegedly only proactive defect prevention methodology, as if we could avoid cleaning up the existing data quality issues, or we could shut down and restart our organizations, so that before another single datum is created or business activity is executed, everyone could learn how to “do things the right way” so that “the data will always be entered right, the first time, every time.”

Although these and other miracle exceptions do correctly describe the application of data quality concepts in isolation, by doing so, they also oversimplify the multifaceted complexity of data quality, requiring a deliberate and willful suspension of disbelief.

Miracle exceptions certainly make for more entertaining stories and more effective sales pitches, but oversimplifying complexity for the purposes of explaining your approach, or, even worse and sadly more common, preaching at people that your approach definitively solves their data quality problems, is nothing less than applying the principle of deus ex machina to data quality.

Data Quality and deus ex machina

Deus ex machina is a plot device whereby a seemingly unsolvable problem is suddenly and abruptly solved with the contrived and unexpected intervention of some new event, character, ability, or object.

This technique is often used in the marketing of data quality software and services, where the problem of poor data quality can seemingly be solved by a new event (e.g., creating a data governance council), a new character (e.g., hiring an expert consultant), a new ability (e.g., aligning data quality metrics with business insight), or a new object (e.g., purchasing a new data quality tool).

Now, don’t get me wrong.  I do believe various technologies and methodologies from numerous disciplines, as well as several core principles (e.g., communication, collaboration, and change management) are all important variables in the data quality equation, but I don’t believe that any particular variable can be taken in isolation and deified as the God Particle of data quality physics.

Data Quality is Not about One Extraordinary Thing

Data quality isn’t all about technology, nor is it all about methodology.  And data quality isn’t all about data cleansing, nor is it all about defect prevention.  Data quality is not about only one thing — no matter how extraordinary any one of its things may seem.

Battling the dark forces of poor data quality doesn’t require any superpowers, but it does require doing the hard daily work of continuously improving your data quality.  Data quality does not have a miracle exception, so please stop believing in one.

And for the love of high-quality data everywhere, please stop trying to sell us one.

Data Quality: Quo Vadimus?

Over the past week, an excellent meme has been making its way around the data quality blogosphere.  It all started, as many of the best data quality blogging memes do, with a post written by Henrik Liliendahl Sørensen.

In Turning a Blind Eye to Data Quality, Henrik blogged about how, as data quality practitioners, we are often amazed by the inconvenient truth that our organizations are capable of growing into successful businesses despite the fact that they often turn a blind eye to data quality by ignoring data quality issues and not following the data quality best practices that we advocate.

“The evidence about how poor data quality is costing enterprises huge sums of money has been out there for a long time,” Henrik explained.  “But business successes are made over and over again despite bad data.  There may be casualties, but the business goals are met anyway.  So, poor data quality is just something that makes the fight harder, not impossible.”

As data quality practitioners, we often don’t effectively sell the business benefits of data quality, but instead we often only talk about the negative aspects of not investing in data quality, which, as Henrik explained, is usually why business leaders turn a blind eye to data quality challenges.  Henrik concluded with the recommendation that when we are talking with business leaders, we need to focus on “smaller, but tangible, wins where data quality improvement and business efficiency goes hand in hand.”

 

Is Data Quality a Journey or a Destination?

Henrik’s blog post received excellent comments, which included a debate about whether data quality is a journey or a destination.

Garry Ure responded with his blog post Destination Unknown, in which he explained how “historically the quest for data quality was likened to a journey to convey the concept that you need to continue to work in order to maintain quality.”  But Garry also noted that sometimes when an organization does successfully ingrain data quality practices into day-to-day business operations, it can make it seem like data quality is a destination that the organization has finally reached.

Garry concluded data quality is “just one destination of many on a long and somewhat recursive journey.  I think the point is that there is no final destination, instead the journey becomes smoother, quicker, and more pleasant for those traveling.”

Bryan Larkin responded to Garry with the blog post Data Quality: Destinations Known, in which Bryan explained, “data quality should be a series of destinations where short journeys occur on the way to those destinations.  The reason is simple.  If we make it about one big destination or one big journey, we are not aligning our efforts with business goals.”

In order to do this, Bryan recommends that “we must identify specific projects that have tangible business benefits (directly to the bottom line — at least to begin with) that are quickly realized.  This means we are looking at less of a smooth journey and more of a sprint to a destination — to tackle a specific problem and show results in a short amount of time.  Most likely we’ll have a series of these sprints to destinations with little time to enjoy the journey.”

“While comprehensive data quality initiatives,” Bryan concluded, “are things we as practitioners want to see — in fact we build our world view around such — most enterprises (not all, mind you) are less interested in big initiatives and more interested in finite, specific, short projects that show results.  If we can get a series of these lined up, we can think of them more in terms of an overall comprehensive plan if we like — even a journey.  But most functional business staff will think of them in terms of the specific projects that affect them.”

The Latin phrase Quo Vadimus? translates into English as “Where are we going?”  When I ponder where data quality is going, and whether data quality is a journey or a destination, I am reminded of the words of T.S. Eliot:

“We must not cease from exploration and the end of all our exploring will be to arrive where we began and to know the place for the first time.”

We must not cease from exploring new ways to continuously improve our data quality and continuously put into practice our data governance principles, policies, and procedures, and the end of all our exploring will be to arrive where we began and to know, perhaps for the first time, the value of high-quality data to our enterprise’s continuing journey toward business success.

Magic Elephants, Data Psychics, and Invisible Gorillas

This blog post is sponsored by the Enterprise CIO Forum and HP.

A recent Forbes article predicts Big Data will be a $50 billion market by 2017, and Michael Friedenberg recently blogged how the rise of big data is generating buzz about Hadoop (which I call the Magic Elephant): “It certainly looks like the Holy Grail for organizing unstructured data, so it’s no wonder everyone is jumping on this bandwagon.  So get ready for Hadoopalooza 2012.”

John Burke recently blogged about the role of big data helping CIOs “figure out how to handle the new, the unusual, and the unexpected as an opportunity to focus more clearly on how to bring new levels of order to their traditional structured data.”

As I have previously blogged, many big data proponents (especially the Big Data Lebowski vendors selling Hadoop solutions) extol its virtues as if big data provides clairvoyant business insight, as if big data was the Data Psychic of the Information Age.

But a recent New York Times article opened with the story of a statistician working for a large retail chain being asked by his marketing colleagues: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?” As Eric Siegel of Predictive Analytics World is quoted in the article, “we’re living through a golden age of behavioral research.  It’s amazing how much we can figure out about how people think now.”

So, perhaps calling big data psychic is not so far-fetched after all.  However, the potential of predictive analytics exemplifies why one of the biggest implications about big data is the data privacy concerns it raises.

Although it’s amazing (and scary) how much the Data Psychic can figure out about how we think (and work, shop, vote, love), it’s equally amazing (and scary) how much Psychology is figuring out about how we think, how we behave, and how we decide.

As I recently blogged about WYSIATI (“what you see is all there is” from Daniel Kahneman’s book Thinking, Fast and Slow), when you are using big data to make business decisions, what you are looking for can greatly influence what you are looking at (and vice versa).  But this natural human tendency could cause you to miss the Invisible Gorilla walking across your screen.

If you are unfamiliar with that psychology experiment, which was created by Christopher Chabris and Daniel Simons, authors of the book The Invisible Gorilla: How Our Intuitions Deceive Us, then I recommend going to theinvisiblegorilla.com/videos.html. (By the way, before I was familiar with its premise, the first time I watched the video, I did not see the guy in the gorilla suit, and now when I watch the video, seeing the “invisible gorilla” distracts me, causing me to not count the number of passes correctly.)

In his book Incognito: The Secret Lives of the Brain, David Eagleman explained how our brain samples just a small bit of the physical world, making time-saving assumptions and seeing only as well as it needs to.  As our eyes interrogate the world, they optimize their strategy for the incoming data, arbitrating a battle between the conflicting information.  What we see is not what is really out there, but instead only a moment-by-moment version of which perception is winning over the others.  Our perception works not by building up bits of captured data, but instead by matching our expectations to the incoming sensory data.

I don’t doubt the Magic Elephants and Data Psychics provide the potential to envision and analyze almost anything happening within the complex and constantly changing business world — as well as the professional and personal lives of the people in it.

But I am concerned that information optimization driven by the biases of our human intuition and perception will only match our expectations to those fast-moving large volumes of various data, thereby causing us to not see many of the Invisible Gorillas.

Although this has always been a business intelligence concern, as technological advancements improve our data analytical tools, we must not lose sight of the fact that tools and data remain only as effective (and as beneficent) as the humans who wield them.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Big Data el Memorioso

Neither the I Nor the T is Magic

Information Overload Revisited

HoardaBytes and the Big Data Lebowski

WYSIWYG and WYSIATI

The Speed of Decision

The Data-Decision Symphony

A Decision Needle in a Data Haystack

The Big Data Collider

Dot Collectors and Dot Connectors

DQ-View: Data Is as Data Does

Data, Information, and Knowledge Management

The Algebra of Collaboration

Most organizations have a vertical orientation, which creates a division of labor between functional areas where daily operations are carried out by people who have been trained in a specific type of business activity (e.g., Product Manufacturing, Marketing, Sales, Finance, Customer Service).  However, according to the most basic enterprise arithmetic, the sum of all vertical functions is one horizontal organization.  For example, in an organization with five vertical functions, 1 + 1 + 1 + 1 + 1 = 1 (and not 5).

Other times, it seems like division is the only mathematics the enterprise understands, creating perceived organizational divides based on geography (e.g., the Boston office versus the London office), or hierarchy (e.g., management versus front-line workers), or the Great Rift known as the Business versus IT.

However, enterprise-wide initiatives, such as data quality and data governance, require a cross-functional alignment reaching horizontally across the organization’s vertical functions.  That alignment fosters a culture of collaboration combining collective ownership with shared responsibility and individual accountability, and it requires a branch of mathematics I call the Algebra of Collaboration.

For starters, as James Kakalios explained in his super book The Physics of Superheroes, “there is a trick to algebra: If one has an equation describing a true statement, such as 1 = 1, then one can add, subtract, multiply, or divide (excepting division by zero) the equation by any number we wish, and as long as we do it to both the left and right sides of the equation, the correctness of the equation is unchanged.  So if we add 2 to both sides of 1 = 1, we obtain 1 + 2 = 1 + 2 or 3 = 3, which is still a true statement.”

So, in the Algebra of Collaboration, we first establish one of the organization’s base equations, its true statements, for example, using the higher order collaborative equation that attempts to close the Great Rift otherwise known as the IT-Business Chasm:

Business = IT

Then we keep this base equation balanced by performing the same operation on both the left and right sides, for example:

Business + Data Quality + Data Governance = IT + Data Quality + Data Governance

The point is that everyone, regardless of their primary role or vertical function, must accept a shared responsibility for preventing data quality lapses and for responding appropriately to mitigate the associated business risks when issues occur.

Now, of course, as I blogged about in The Stakeholder’s Dilemma, this equation does not remain perfectly balanced at all times.  The realities of the fiscal calendar effect, conflicting interests, and changing business priorities will mean that the amount of resources (money, time, people) added to the equation by a particular stakeholder, vertical function, or group will vary.

But it’s important to remember the true statement that the base equation represents.  The trick of algebra is just one of the tricks of the collaboration trade.  Organizations that are successful with data quality and data governance view collaboration not just as a guiding principle, but also as a call to action in their daily practices.

Is your organization practicing the Algebra of Collaboration?

 

Related Posts

The Business versus IT—Tear down this wall!

The Road of Collaboration

The Collaborative Culture of Data Governance

Collaboration isn’t Brain Surgery

Finding Data Quality

Being Horizontally Vertical

The Year of the Datechnibus

Dot Collectors and Dot Connectors

No Datum is an Island of Serendip

The Three Most Important Letters in Data Governance

The Stakeholder’s Dilemma

Are you Building Bridges or Digging Moats?

Has Data Become a Four-Letter Word?

The Data Governance Oratorio

Video: Declaration of Data Governance

Data Love Song Mashup

Today is February 14 — Valentine’s Day — the annual celebration of enduring romance, where true love is publicly judged according to your willingness to purchase chocolate, roses, and extremely expensive jewelry, and privately judged in ways that nobody (and please, trust me when I say nobody) wants to see you post on Twitter, Facebook, YouTube, or your blog.

Valentine’s Day is for people in love to celebrate their love privately in whatever way works best for them.

But since your data needs love too, this blog post provides a mashup of love songs for your data.

Data Love Song Mashup

I’ve got sunshine on a cloud computing day
When it’s cold outside, I’ve got backups from the month of May
I guess you’d say, what can make me feel this way?
My data, my data, my data
Singing about my data
My data

My data’s so beautiful 
And I tell it every day
When I see your user interface
There’s not a thing that I would change
Because my data, you’re amazing
Just the way you are
You’re amazing data
Just the way you are

They say we’re young and we don’t know
We won’t find data quality issues until we grow
Well I don’t know if that is true
Because you got me, data
And data, I got you
I got you, data

Look into my eyes, and you will see
What my data means to me
Don’t tell me data quality is not worth trying for
Don’t tell me it’s not worth fighting for
You know it’s true
Everything I do, I do data quality for you

I can’t make you love data if you don’t
I can’t make your heart feel something it won’t

But there’s nothing you can do that can’t be done
Nothing you can sing that can’t be sung
Nothing you can make that can’t be made
All you need is love . . . for data
Love for data is all you need

Business people working hard all day and through the night
Their database queries searching for business insight
Some will win, some will lose
Some were born to sing the data quality blues
Oh, the need for business insight never ends
It goes on and on and on and on
Don’t stop believing
Hold on to that data loving feeling

Look at your data, I know its poor quality is showing
Look at your organization, you don’t know where it’s going
I don’t know much, but I know your data needs love too
And that may be all I need to know

Nothing compares to data quality, no worries or cares
Business regrets and decision mistakes, they’re memories made
But if you don’t continuously improve, how bittersweet that will taste
I wish nothing but the best for you
I wish nothing but the best for your data too
Don’t forget data quality, I beg, please remember I said
Sometimes quality lasts in data, but sometimes it hurts instead

 

Happy Valentine’s Day to you and yours

Happy Data Quality to you and your data

Decision Management Systems

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I discuss decision management with James Taylor, author of the new book Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics.

James Taylor is the CEO of Decision Management Solutions, and the leading expert in Decision Management Systems, which are active participants in improving business results by applying business rules, predictive analytics, and optimization technologies to address the toughest issues facing businesses today, and changing the way organizations are doing business.

James Taylor has led Decision Management efforts for leading companies in insurance, banking, health management, and telecommunications.  Decision Management Solutions works with clients to improve their business by applying analytics and business rules technology to automate and improve decisions.  Clients range from start-ups and software companies to major North American insurers, a travel company, the health management division of a major healthcare company, one of Europe’s largest banks, and several major decision management technology vendors.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

A Swift Kick in the AAS

This blog post is sponsored by the Enterprise CIO Forum and HP.

Appending the phrase “as a Service” (AAS) to almost every word (e.g., Software, Platform, Infrastructure, Data, Analytics) has become increasingly prevalent due to the world-wide-webification of IT by cloud computing and other consumerization trends.

Rick Blaisdell recently blogged about the benefits of the cloud, which include fully featured services, monthly subscription costs, 24/7 support, high availability, and financially-backed service level agreements.  “Look at the cloud,” Blaisdell recommended, “as a logical extension of your IT capabilities, and take advantage of all the benefits of cloud services.”

Judy Redman has blogged about how cloud computing is one of three IT delivery trends (along with agile development and composite applications) that are allowing IT leaders to reduce costs, deliver better applications faster, and provide results that are more aligned with, and more responsive to, the business.

And with more existing applications migrating to the cloud, it is all too easy to ponder whether these services raining down from the cloud forecast the end of the reign of the centralized IT department — and, perhaps by extension, the end of the reign of the traditional IT vendor that remains off-premises-resistant (i.e., vendors that continue to exclusively sell on-premises solutions, which they positively call enterprise-class solutions, but which their customers often come to negatively call legacy applications).

However, “cloud (or public cloud at least) is not the only enabler,” Adrian Bridgwater recently blogged, explaining how a converged infrastructure acknowledges that “existing systems need to be consolidated and brought into line in a harmonious, interconnected, and interoperable way.  This is where private clouds (and/or a mix of hybrid clouds) come to the fore and a firm manages its own internal systems in a hyper-efficient manner.  From this point, we see IT infrastructure working to a) save money, b) run parallel with strategic business objectives for profit and growth, and c) become a business enabler in its own right.”

No matter how much of it is cloud-oriented (or public/private clouded), the future of IT is definitely going to be service-oriented.

Now, of course, the role of IT has always been to deliver to the enterprise a fast and agile business-enabling service.  But perhaps what is refreshingly new about the unrelenting “as a Service” trend is that it reminds the IT department of its prime directive, and it enables the enterprise to deliver to the IT industry as a whole a (sometimes sorely needed) Swift Kick in the AAS.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Can Enterprise-Class Solutions Ever Deliver ROI?

Why does the sun never set on legacy applications?

Are Applications the La Brea Tar Pits for Data?

The Cloud Security Paradox

Are Cloud Providers the Bounty Hunters of IT?

The Partly Cloudy CIO

Shadow IT and the New Prometheus

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

The IT Pendulum and the Federated Future of IT

HoardaBytes and the Big Data Lebowski


The recent #GartnerChat on Big Data was an excellent Twitter discussion about what I often refer to as the Seven Letter Tsunami of the data management industry.  As Gartner Research explains, although the term acknowledges the exponential growth, availability, and use of information in today’s data-rich landscape, big data is about more than just data volume.  Data variety (i.e., structured, semi-structured, and unstructured data, as well as other types, such as the sensor data emanating from the Internet of Things) and data velocity (i.e., how fast data is produced and how fast data must be processed to meet demand) are also key characteristics of the big challenges associated with the big buzzword that big data has become over the last year.

Since ours is an industry infatuated with buzzwords, Timo Elliott remarked “new terms arise because of new technology, not new business problems.  Big Data came from a need to name Hadoop [and other technologies now being relentlessly marketed as big data solutions], so anybody using big data to refer to business problems is quickly going to tie themselves in definitional knots.”

To which Mark Troester responded, “the hype of Hadoop is driving pressure on people to keep everything — but they ignore the difficulty in managing it.”  John Haddad then quipped that “big data is a hoarder’s dream,” which prompted Andy Bitterer to coin the term HoardaByte for measuring big data, and then ask, “Would the real Big Data Lebowski please stand up?”

HoardaBytes

Although it’s probably no surprise that a blogger with obsessive-compulsive in the title of his blog would like Bitterer’s new term, the fact is that whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bitterly bites, our organizations have been compulsively hoarding data for a long time.

And with silos replicating data, and with new data and new types of data being created and stored on a daily basis, managing all of the data is becoming impractical.  Because we are so busy with the activity of trying to manage all of it, we are hoarding countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival.

The Big Data Lebowski

In The Big Lebowski, Jeff Lebowski (“The Dude”) is mistakenly identified as millionaire Jeffrey Lebowski (“The Big Lebowski”) in a classic data quality blunder caused by matching on person name only.  The eccentric plot is exactly what you would expect from a Coen brothers film, which, since its release in the late 1990s, has become a cult classic and inspired a religious following known as Dudeism.
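
To make the blunder concrete, here is a minimal Python sketch — my own hypothetical illustration, not anything from the film or from a real matching engine.  Deciding identity on a normalized name alone falsely merges two different Lebowskis, while requiring even one corroborating attribute (here, a made-up street address) keeps their records apart.

def normalize_name(name: str) -> str:
    # Crude normalization: lowercase, strip punctuation, expand common nicknames (Jeff -> Jeffrey).
    nicknames = {"jeff": "jeffrey"}
    tokens = [t.strip(".,").lower() for t in name.split()]
    return " ".join(nicknames.get(t, t) for t in tokens)

def match_on_name_only(a: dict, b: dict) -> bool:
    # The classic blunder: identity is decided by the person name alone.
    return normalize_name(a["name"]) == normalize_name(b["name"])

def match_on_name_and_address(a: dict, b: dict) -> bool:
    # A slightly safer rule: require a second corroborating attribute.
    return match_on_name_only(a, b) and a["address"] == b["address"]

the_dude = {"name": "Jeff Lebowski", "address": "606 Venice Blvd"}       # hypothetical record
the_big_lebowski = {"name": "Jeffrey Lebowski", "address": "Pasadena mansion"}  # hypothetical record

print(match_on_name_only(the_dude, the_big_lebowski))         # True  -- false positive: two people merged
print(match_on_name_and_address(the_dude, the_big_lebowski))  # False -- the records stay separate

The same principle applies at enterprise scale: the more independent attributes a matching rule corroborates, the fewer Dudes get mistaken for millionaires.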

Historically, a big part of the problem in our industry has been the fact that the word “data” is prevalent in the names we have given industry disciplines and enterprise information initiatives.  For example, data architecture, data quality, data integration, data migration, data warehousing, master data management, and data governance — to name but a few.

However, all this achieved was to perpetuate the mistaken identification of data management as an esoteric technical activity that played little more than a minor, supporting, and often uncredited, role within the business activities of our organizations.

But since the late 1990s, there has been a shift in the perception of data.  The real data deluge has not been the rising volume, variety, and velocity of data, but instead the rising awareness of the big impact that data has on nearly every aspect of our professional and personal lives.  In this brave new data world, companies like Google and Facebook have built business empires mostly out of our own personal data, which is why, like it or not, as individuals, we must accept that we are all data geeks now.

All of the hype about Big Data is missing the point.  The reality is that Data is Big — meaning that data has now so thoroughly pervaded mainstream culture that data has gone beyond being just a cult classic for the data management profession, and is now inspiring an almost religious following that we could call Dataism.

The Data must Abide

“The Dude abides.  I don’t know about you, but I take comfort in that,” remarked The Stranger in The Big Lebowski.

The Data must also abide.  And the Data must abide both the Business and the Individual.  The Data abides the Business if data proves useful to our business activities.  The Data abides the Individual if data protects the privacy of our personal activities.

The Data abides.  I don’t know about you, but I would take more comfort in that than in any solutions The Stranger Salesperson wants to sell me that utilize an eccentric sales pitch involving HoardaBytes and the Big Data Lebowski.

 

Related Posts