Talking Business about the Weather

Businesses of all sizes are always looking for ways to increase revenue, decrease costs, and operate more efficiently.  When I talk with midsize business owners, I hear the typical questions.  Should we hire a developer to update our website and improve our SEO rankings?  Should we invest less money in traditional advertising and invest more time in social media?  After discussing these and other business topics for a while, we drift into that standard conversational filler — talking about the weather.

But since I am always interested in analyzing data from as many different perspectives as possible, when I talk about the weather, I ask midsize business owners how much of a variable the weather plays in their business.  Does the weather affect the number of customers that visit your business on a daily basis?  Do customers purchase different items when the weather is good versus bad?

I usually receive quick responses, but when I ask whether those responses were based on analyzing sales data alongside weather data, the answer is usually no.  That is understandable, since businesses are successful when they can focus on their core competencies, and for most businesses, analytics is not a core competency.  The demands of daily operations often prevent midsize businesses from stepping back and looking at things differently, such as whether there’s a hidden connection between weather and sales.
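
For owners who want to move past a gut-feel answer, the first look does not require a data science team.  Below is a minimal sketch, in Python with pandas, of what that first look might involve; the column names and the daily figures are purely illustrative, and in practice the sales totals would come from your point-of-sale system and the weather history from a local weather data export.

```python
import pandas as pd

# Illustrative daily totals; the column names and values here are hypothetical.
sales = pd.DataFrame({
    "date": pd.date_range("2012-04-01", periods=7, freq="D"),
    "revenue": [1800, 2100, 950, 1200, 2400, 2600, 1750],
})
weather = pd.DataFrame({
    "date": pd.date_range("2012-04-01", periods=7, freq="D"),
    "high_temp_f": [61, 65, 48, 52, 70, 72, 63],
    "precip_inches": [0.0, 0.0, 0.8, 0.4, 0.0, 0.0, 0.1],
})

# Pair each day's sales with that day's weather.
daily = sales.merge(weather, on="date")

# A first, rough look at whether weather and revenue move together.
print(daily["revenue"].corr(daily["high_temp_f"]))    # warmer days vs. revenue
print(daily["revenue"].corr(daily["precip_inches"]))  # rainy days vs. revenue
```

Even a correlation this crude can turn “I think rain hurts us” into a number worth discussing.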

One of my favorite books is Freakonomics: A Rogue Economist Explores the Hidden Side of Everything by Steven Levitt and Stephen Dubner.  The book, as well as its sequel, podcast, and movie, provides good examples of one of the common challenges facing data science, and more specifically predictive analytics: its predictions often seem counterintuitive to business leaders, whose intuition is rightfully based on the business expertise that has guided their success to date.  The reality is that even organizations that pride themselves on being data-driven naturally resist counterintuitive insights found in their data.

Dubner was recently interviewed by Crysta Anderson about how organizations can find insights in their data if they are willing and able to ask good questions.  Of course, it’s not always easy to determine what a good question would be.  But sometimes something as simple as talking about the weather when you’re talking business could lead to a meaningful business insight.

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

Solvency II and Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Ken O’Connor and I discuss the Solvency II standards for data quality, and how its European insurance regulatory requirement of “complete, appropriate, and accurate” data represents common sense standards for all businesses.
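
To make those three words a bit more concrete, here is a minimal sketch of how “complete, appropriate, and accurate” could be translated into simple record-level checks.  The policy fields, reference values, and rules below are invented for illustration; they are not drawn from the Solvency II text itself.

```python
# Hypothetical policy records; field names and rules are illustrative only.
policies = [
    {"policy_id": "P001", "sum_insured": 250000, "start_date": "2011-06-01", "country": "IE"},
    {"policy_id": "P002", "sum_insured": None,   "start_date": "2011-07-15", "country": "IE"},
    {"policy_id": "P003", "sum_insured": -500,   "start_date": "2011-09-01", "country": "XX"},
]

VALID_COUNTRIES = {"IE", "GB", "FR", "DE"}  # appropriateness: values fit the business context

def assess(record):
    issues = []
    # Complete: required fields are populated.
    for field in ("policy_id", "sum_insured", "start_date", "country"):
        if record.get(field) in (None, ""):
            issues.append(f"incomplete: {field} missing")
    # Appropriate: values make sense for their intended use.
    if record.get("country") not in VALID_COUNTRIES:
        issues.append(f"inappropriate: unknown country {record.get('country')}")
    # Accurate: values are plausible descriptions of the real world.
    if record.get("sum_insured") is not None and record["sum_insured"] <= 0:
        issues.append("inaccurate: sum_insured must be positive")
    return issues

for p in policies:
    print(p["policy_id"], assess(p) or "ok")
```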

Ken O’Connor is an independent data consultant with over 30 years of hands-on experience in the field, specializing in helping organizations meet the data quality management challenges presented by data-intensive programs such as data conversions, data migrations, data population, and regulatory compliance such as Solvency II, Basel II / III, Anti-Money Laundering, the Foreign Account Tax Compliance Act (FATCA), and the Dodd–Frank Wall Street Reform and Consumer Protection Act.

Ken O’Connor also provides practical data quality and data governance advice on his popular blog at: kenoconnordata.com

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Pitching Perfect Data Quality

In my previous post, I used a baseball metaphor to explain why we should strive for a quality start to our business activities by starting them off with good data quality, thereby giving our organization a better chance to succeed.

Since it’s a beautiful week for baseball metaphors, let’s post two!  (My apologies to Ernie Banks.)

If good data quality gives our organization a better chance to succeed, then it seems logical to assume that perfect data quality would give our organization the best chance to succeed.  However, as Yogi Berra said: “If the world were perfect, it wouldn’t be.”

My previous baseball metaphor was based on a statistic that measures how well a starting pitcher performs during a game.  The best possible performance by a starting pitcher is called a perfect game: nine innings completed by retiring the minimum of 27 opposing batters without allowing any hits, walks, hit batsmen, or batters reaching base on a fielding error.

Although a lot of buzz is generated when a pitcher gets close to pitching a perfect game (usually after five perfect innings, it’s all the game’s announcers will talk about), only 20 perfect games have been thrown across the roughly 143 years of Major League Baseball history and the approximately 200,000 games played, making it one of the rarest statistical events in baseball.

When a pitcher loses the chance of pitching a perfect game, does his team forfeit the game?  No, of course not.  Because the pitcher’s goal is not pitching perfectly.  The pitcher’s (and every other player’s) goal is helping the team win the game.

This is why I have never been a fan of anyone who is pitching perfect data quality, i.e., anyone advocating data perfection as the organization’s goal.  The organization’s goal is business success.  Data quality has a role to play, but claiming business success is impossible without having perfect data quality is like claiming winning in baseball is impossible without pitching a perfect game.

 

Related Posts

DQ-View: Baseball and Data Quality

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and The Middle Way

There is No Such Thing as a Root Cause

OCDQ Radio - The Johari Window of Data Quality

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Quality Starts and Data Quality

This past week was the beginning of the 2012 Major League Baseball (MLB) season.  Baseball has long been a sport obsessed with statistics, since its data is mostly transaction data describing the statistical events of games played.  Baseball statisticians slice and dice every aspect of past games, attempting to discover trends that could predict what is likely to happen in future games.

There are too many variables involved in determining which team will win a particular game to be able to choose a single variable that predicts game results.  But a few key statistics are cited by baseball analysts as general guidelines of a team’s potential to win.

One such statistic is a quality start, which is defined as a game in which a team’s starting pitcher completes at least six innings and permits no more than three earned runs.  Of course, a so-called quality start is no guarantee that the starting pitcher’s team will win the game.  But the relative reliability of the statistic to predict a game’s result causes some baseball analysts to refer to a loss suffered by a pitcher in a quality start as a tough loss and a win earned by a pitcher in a non-quality start as a cheap win.
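
Because the definition is a simple rule, it is easy to compute from basic game lines.  Here is a small sketch, using made-up pitching lines, of how the quality start rule and the “tough loss” and “cheap win” labels could be derived:

```python
# Made-up pitcher game lines: innings pitched, earned runs allowed, and the decision.
starts = [
    {"pitcher": "A", "innings": 7.0, "earned_runs": 2, "decision": "L"},
    {"pitcher": "B", "innings": 5.0, "earned_runs": 1, "decision": "W"},
    {"pitcher": "C", "innings": 6.0, "earned_runs": 3, "decision": "W"},
]

def quality_start(g):
    # At least six innings completed and no more than three earned runs allowed.
    return g["innings"] >= 6.0 and g["earned_runs"] <= 3

def label(g):
    qs = quality_start(g)
    if qs and g["decision"] == "L":
        return "tough loss"     # quality start, but the pitcher still lost
    if not qs and g["decision"] == "W":
        return "cheap win"      # win earned without a quality start
    return "quality start" if qs else "non-quality start"

for g in starts:
    print(g["pitcher"], label(g))
```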

There are too many variables involved in determining if a particular business activity will succeed to be able to choose a single variable that predicts business results.  But data quality is one of the general guidelines of an organization’s potential to succeed.

As Henrik Liliendahl Sørensen blogged, organizations are capable of achieving success with their business activities despite bad data quality, which we could call the business equivalent of cheap wins.  And organizations are also capable of suffering failure with their business activities despite good data quality, which we could call the business equivalent of tough losses.

So just like a quality start is no guarantee of a win in baseball, good data quality is no guarantee of a success in business.

But perhaps the relative reliability of data quality to predict business results should influence us to at least strive for a quality start to our business activities by starting them off with good data quality, thereby giving our organization a better chance to succeed.

 

Related Posts

DQ-View: Baseball and Data Quality

Poor Quality Data Sucks

Fantasy League Data Quality

There is No Such Thing as a Root Cause

Data Quality: Quo Vadimus?

OCDQ Radio - The Johari Window of Data Quality

OCDQ Radio - Redefining Data Quality

OCDQ Radio - The Blue Box of Information Quality

OCDQ Radio - Studying Data Quality

OCDQ Radio - Organizing for Data Quality

Will Big Data be Blinded by Data Science?

All of the hype about Big Data is also causing quite the hullabaloo about hiring Data Scientists in order to help your organization derive business value from big data analytics.  But even though we are still in the hype and hullabaloo stages, these unrelenting trends are starting to rightfully draw the attention of businesses of all sizes.  After all, the key word in big data isn’t big, because, in our increasingly data-constructed world, big data is no longer just for big companies and high-tech firms.

And since the key word in data scientist isn’t data, in this post I want to focus on the second word in today’s hottest job title.

When I think of a scientist of any kind, I immediately think of the scientific method, which has been the standard operating procedure of scientific discovery since the 17th century.  First, you define a question, gather some initial data, and form a hypothesis, which is some idea about how to answer your question.  Next, you perform an experiment to test the hypothesis, during which more data is collected.  Then, you analyze the experimental data and evaluate your results.  Whether the experiment confirmed or contradicted your hypothesis, you do the same thing — repeat the experiment — because a hypothesis can be promoted to a theory only after repeated experimentation (including by others) consistently produces the same result.

During experimentation, failure happens just as often as, if not more often than, success.  However, both failure and success have long played an important role in scientific discovery because progress in either direction is still progress.

Therefore, experimentation is an essential component of scientific discovery — and data science is certainly no exception.

“Designed experiments,” Melinda Thielbar recently blogged, “is where we’ll make our next big leap for data science.”  I agree, but with the notable exception of A/B testing in marketing, most business activities generally don’t embrace data experimentation.
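
For readers who have never run one, an A/B test is about as simple as designed experiments get: randomly assign visitors to one of two treatments, measure an outcome, and compare.  The sketch below simulates that process with invented conversion rates, and repeats the experiment, since a single run proves very little:

```python
import random

random.seed(42)

# Invented outcome rates for two treatments; in a real experiment these are unknown.
TRUE_RATE = {"A": 0.050, "B": 0.062}

def run_experiment(visitors=10000):
    results = {"A": [0, 0], "B": [0, 0]}  # [conversions, exposures]
    for _ in range(visitors):
        group = random.choice(["A", "B"])           # random assignment
        converted = random.random() < TRUE_RATE[group]
        results[group][0] += int(converted)
        results[group][1] += 1
    return results

# Repeat the experiment: one run is suggestive, repetition is what earns trust.
for trial in range(3):
    r = run_experiment()
    rate_a = r["A"][0] / r["A"][1]
    rate_b = r["B"][0] / r["B"][1]
    print(f"trial {trial + 1}: A={rate_a:.3%}  B={rate_b:.3%}")
```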

“The purpose of science,” Tom Redman recently explained, “is to discover fundamental truths about the universe.  But we don’t run our businesses to discover fundamental truths.  We run our businesses to serve a customer, gain marketplace advantage, or make money.”  In other words, the commercial application of science has more to do with commerce than it does with science.

One example of the challenges inherent in the commercial application of science is the misconception that predictive analytics can predict what is going to happen with certainty, when what it actually does is predict some of the possible things that could happen, each with a certain probability.  Although predictive analytics can be a valuable tool for many business activities, especially decision making, as Steve Miller recently blogged, most of us are not good at using probabilities to make decisions.
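
The distinction matters for decision making.  A probabilistic prediction is a set of weighted possibilities, not a single certain answer, and using it well means comparing decisions by their expected outcomes.  The demand levels, probabilities, and payoffs below are invented purely to illustrate the arithmetic:

```python
# An invented "prediction": probabilities over possible demand levels, not a certainty.
forecast = {"low": 0.2, "medium": 0.5, "high": 0.3}

# Invented payoffs (profit) for two decisions under each demand level.
payoffs = {
    "stock_small": {"low": 10, "medium": 12, "high": 12},
    "stock_large": {"low": -5, "medium": 15, "high": 30},
}

def expected_value(decision):
    return sum(forecast[outcome] * payoffs[decision][outcome] for outcome in forecast)

for decision in payoffs:
    print(decision, expected_value(decision))
# Neither decision is guaranteed to win; the forecast only weights the possibilities.
```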

So, with apologies to Thomas Dolby, I can’t help but wonder, will big data be blinded by data science?  Will the business leaders being told to hire data scientists to derive business value from big data analytics be blind to what data science tries to show them?

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

The Data Governance Imperative

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Steve Sarsfield and I discuss how data governance is about changing the hearts and minds of your company to see the value of data quality, the characteristics of a data champion, and creating effective data quality scorecards.

Steve Sarsfield is a leading author and expert in data quality and data governance.  His book The Data Governance Imperative is a comprehensive exploration of data governance focusing on the business perspectives that are important to data champions, front-office employees, and executives.  He runs the Data Governance and Data Quality Insider, which is an award-winning and world-recognized blog.  Steve Sarsfield is the Product Marketing Manager for Data Governance and Data Quality at Talend.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

What is Weighing Down your Data?

On July 21, 1969, Neil Armstrong spoke the instantly famous words “that’s one small step for man, one giant leap for mankind” as he stepped off the ladder of the Apollo Lunar Module and became the first human being to walk on the surface of the Moon.

In addition to its many other, and more significant, scientific milestones, the Moon landing provided an excellent demonstration of three related, and often misunderstood, scientific concepts: mass, weight, and gravity.

Mass is an intrinsic property of matter, based on the atomic composition of a given object, such as your body, which means your mass would remain the same regardless of whether you were walking on the surface of the Moon or the Earth.

Weight is not an intrinsic property of matter, but is instead a gravitational force acting on matter.  Because the gravitational force of the Moon is less than the gravitational force of the Earth, you would weigh less on the Moon than you weigh on the Earth.  So, just like Neil Armstrong, your one small step on the surface of the Moon could quite literally become a giant leap.
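
The arithmetic behind the metaphor is simply weight equals mass times local gravitational acceleration.  Using standard surface gravity values and an example body mass:

```python
# Weight = mass x local gravitational acceleration (W = m * g).
EARTH_GRAVITY = 9.81   # m/s^2
MOON_GRAVITY = 1.62    # m/s^2, roughly one-sixth of Earth's

mass_kg = 80.0  # example body mass; mass itself is the same everywhere

weight_earth = mass_kg * EARTH_GRAVITY  # ~784.8 newtons
weight_moon = mass_kg * MOON_GRAVITY    # ~129.6 newtons

print(f"Weight on Earth: {weight_earth:.1f} N")
print(f"Weight on Moon:  {weight_moon:.1f} N")
print(f"Ratio: {weight_moon / weight_earth:.2f}")  # ~0.17, hence the giant leaps
```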

Using these concepts metaphorically, mass is an intrinsic property of data, and perhaps a way to represent objective data quality, whereas weight is a gravitational force acting on data, and perhaps a way to represent subjective data quality.

Since most data cannot escape the gravity of its application, most of what we refer to as data silos are actually application silos, because data and applications become tightly coupled due to the strong gravitational force that an application exerts on its data.

Now, of course, an application can exert a strong gravitational force for a strong business reason (e.g., protecting sensitive data), and not, as we often assume by default, for a weak business reason (e.g., protecting corporate political power).

Although you probably don’t view your applications as something that is weighing down your data, and you probably also resist the feeling of weightlessness that can be caused by openly sharing your data, it’s worth considering that whether your data truly enables your organization to take giant leaps, not just small steps, depends on the gravitational forces acting on it.

What is weighing down your data could also be weighing down your organization.

 

Related Posts

Data Myopia and Business Relativity

Are Applications the La Brea Tar Pits for Data?

Hell is other people’s data

My Own Private Data

No Datum is an Island of Serendip

Turning Data Silos into Glass Houses

Sharing Data

The Data Outhouse

The Good Data

Beyond a “Single Version of the Truth”

Serving IT with a Side of Hash Browns

This blog post is sponsored by the Enterprise CIO Forum and HP.

Since it’s where I started my career, I often ponder what it would be like to work in the IT department today.  This morning, instead of sitting in a cubicle with no window view other than the one Bill Gates gave us, I’m sitting in a booth by a real window, albeit one with a partially obstructed view of the parking lot, at a diner eating a two-egg omelette with a side of hash browns.

But nowadays, it’s possible that I’m still sitting amongst my fellow IT workers.  Perhaps the older gentleman to my left is verifying last night’s database load using his laptop.  Maybe the younger woman to my right is talking into her Bluetooth earpiece with a business analyst working on an ad hoc report.  And the couple in the corner could be struggling to understand the technology requirements of the C-level executive they’re meeting with, who’s now vocalizing his displeasure about sitting in the high chair.

It’s possible that everyone thinks I am updating the status of an IT support ticket on my tablet based on the mobile text alert I just received.  Of course, it’s also possible that all of us are just eating breakfast while I’m also writing this blog post about IT.

However, as Joel Dobbs recently blogged, the IT times are a-changin’ — and faster than ever before, since, thanks to the two-egg IT omelette of mobile technologies and cloud providers, IT no longer happens only in the IT department.  IT is everywhere now.

“There is a tendency to compartmentalize various types of IT,” Bruce Guptill recently blogged, “in order to make them more understandable and conform to budgeting practices.  But the core concept/theme/result of mobility really is ubiquity of IT — the same technology, services, and capabilities regardless of user and asset location.”

Regardless of how much you have embraced the consumerization of IT, some of your IT happens outside of your IT department, and some IT tasks are performed by people who not only don’t work in IT, but possibly don’t even work for your organization.

“While systems integration was once the big concern,” Judy Redman recently blogged, “today’s CIOs need to look to services integration.  Companies today need to obtain services from multiple vendors so that they can get best-of-breed solutions, cost efficiencies, and the flexibility needed to meet ever-changing and ever-more-demanding business needs.”

With its increasingly service-oriented and ubiquitous nature, it’s not too far-fetched to imagine that in the near future of IT, the patrons of a Wi-Fi-enabled diner could be your organization’s new IT department, serving your IT with a side of hash browns.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

The IT Consumerization Conundrum

Shadow IT and the New Prometheus

A Swift Kick in the AAS

The UX Factor

Are Cloud Providers the Bounty Hunters of IT?

The Cloud Security Paradox

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Our Increasingly Data-Constructed World

Last week, I joined fellow Information Management bloggers Art Petty, Mark Smith, Bruce Guptill, and co-hosts Eric Kavanagh and Jim Ericson for a DM Radio discussion about the latest trends and innovations in the information management industry.

For my contribution to the discussion, I talked about the long-running macro trend underlying many trends and innovations, namely that our world is becoming, not just more data-driven, but increasingly data-constructed.

Physicist John Archibald Wheeler contemplated how the bit is a fundamental particle, which, although insubstantial, could be considered more fundamental than matter itself.  He summarized this viewpoint in his pithy phrase “It from Bit” explaining how: “every it — every particle, every field of force, even the space-time continuum itself — derives its function, its meaning, its very existence entirely — even if in some contexts indirectly — from the answers to yes-or-no questions, binary choices, bits.”

In other words, we could say that the physical world is conceived of in, and derived from, the non-physical world of data.

Although bringing data into the real world has historically also required constructing other physical things to deliver data to us, more of the things in the physical world are becoming directly digitized.  As just a few examples, consider how we’re progressing:

  • From audio delivered via vinyl records, audio tapes, CDs, and MP3 files (and other file formats) to Web-streaming audio
  • From video delivered via movie reels, video tapes, DVDs, and MP4 files (and other file formats) to Web-streaming video
  • From text delivered via printed newspapers, magazines, and books to websites, blogs, e-books, and other electronic texts

Furthermore, we continue to see more physical tools (e.g., calculators, alarm clocks, calendars, dictionaries) transforming into apps and data on our smart phones, tablets, and other mobile devices.  Essentially, in a world increasingly constructed of an invisible and intangible substance called data (perhaps the datum should be added to the periodic table of elements?), one of the few things that we see and touch is the screen of our mobile device, which makes the invisible visible and the intangible tangible.

 

Bitrate, Lossy Audio, and Quantity over Quality

If our world is becoming increasingly data-constructed, does that mean people are becoming more concerned about data quality?

In a bit, 0.  In a word, no.  And that’s because, much to the dismay of those working in the data quality profession, most people do not care about the quality of their data unless it becomes bad enough for them to pay attention to — and complain about.

An excellent example is bitrate, which refers to the number of bits — or the amount of data — that are processed over a certain amount of time.  In his article Does Bitrate Really Make a Difference In My Music?, Whitson Gordon examined the common debate about lossless and lossy audio formats.

Using the example of ripping a track from a CD to a hard drive, a lossless format means that the track is not compressed to the point where any of its data is lost, retaining, for all intents and purposes, the same audio data quality as the original CD track.

By contrast, a lossy format compresses the track so that it takes up less space by intentionally removing some of its data, thereby reducing audio data quality.  Audiophiles often claim anything other than vinyl records sounds lousy because it is so lossy.
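
The trade-off is easy to quantify, since file size is just bitrate multiplied by duration.  Using an example four-minute track, here is a rough comparison of the uncompressed CD stream against two common lossy MP3 bitrates:

```python
def file_size_mb(bitrate_kbps, duration_seconds):
    # bits = bitrate (kilobits/sec) * 1000 * seconds; divide by 8 for bytes, 1e6 for MB
    return bitrate_kbps * 1000 * duration_seconds / 8 / 1_000_000

track_seconds = 4 * 60  # an example four-minute track

# Uncompressed CD audio: 44,100 samples/sec x 16 bits x 2 channels, roughly 1,411 kbps.
cd_kbps = 1411
for name, kbps in [("CD (uncompressed)", cd_kbps), ("MP3 320 kbps", 320), ("MP3 128 kbps", 128)]:
    print(f"{name:18s} ~ {file_size_mb(kbps, track_seconds):5.1f} MB")
```

The four-minute track shrinks from roughly 42 MB as an uncompressed CD stream to under 4 MB as a 128 kbps MP3, which is exactly the quantity-over-quality bargain most listeners happily accept.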

However, like truth, beauty, and art, data quality can be said to be in the eyes — or the ears — of the beholder.  So, if your favorite music sounds good enough to you in MP3 file format, then not only do you no longer need those physical vinyl records, audio tapes, and CDs, but you will also not pay any further attention to audio data quality.

Another, and less recent, example is the videotape format war waged during the 1970s and 1980s between Betamax and VHS, when Betamax was widely believed to provide superior video data quality.

But a blank Betamax tape allowed users to record up to two hours of high-quality video, whereas a VHS tape allowed users to record up to four hours of slightly lower quality video.  Consumers consistently chose quantity over quality, especially since lower quality also meant a lower price.  Betamax tapes and machines remained more expensive based on the assumption that consumers would pay a premium for higher quality video.

The VHS victory demonstrated how people often choose quantity over quality, so it doesn’t always pay to have better data quality.

 

Redefining Structure in a Data-Constructed World

Another side effect of our increasingly data-constructed world is that it is challenging the traditional data management notion that data has to be structured before it can be used — especially within many traditional notions of business intelligence.

Physicist Niels Bohr suggested that understanding the structure of the atom requires changing our definition of understanding.

Since a lot of the recent Big Data craze consists of unstructured or semi-structured data, perhaps understanding how much structure data truly requires for business applications (e.g., sentiment analysis of social networking data) requires changing our definition of structuring.  At the very least, we have to accept the fact that the relational data model is no longer our only option.
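
As a small illustration of how little structure some of these applications actually need, the sketch below scores the sentiment of a semi-structured social post: a few named fields plus free text that fits no relational schema.  The post and the keyword lists are invented, and the scoring is deliberately crude:

```python
import json

# An invented, semi-structured social post: some named fields, plus free text with no schema.
raw = '{"user": "@example", "ts": "2012-03-01T09:30:00Z", "text": "Love the new app, but checkout is awful and slow"}'
post = json.loads(raw)

POSITIVE = {"love", "great", "awesome", "fast"}
NEGATIVE = {"awful", "slow", "broken", "hate"}

words = {w.strip(",.!?").lower() for w in post["text"].split()}
score = len(words & POSITIVE) - len(words & NEGATIVE)

print(post["user"], "sentiment score:", score)  # +1 positive, -2 negative, so -1 overall
```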

Although I often blog about how data and the real world are not the same thing, as more physical things, as well as more aspects of our everyday lives, become directly digitized, it is becoming more difficult to differentiate physical reality from digital reality.

 

Related Posts

HoardaBytes and the Big Data Lebowski

Magic Elephants, Data Psychics, and Invisible Gorillas

Big Data el Memorioso

The Big Data Collider

Information Overload Revisited

Dot Collectors and Dot Connectors

WYSIWYG and WYSIATI

Plato’s Data

The Data Cold War

A Farscape Analogy for Data Quality

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • A Brave New Data World — A discussion about how data, data quality, data-driven decision making, and metadata quality no longer reside exclusively within the esoteric realm of data management — basically, everyone is a data geek now.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

Data Myopia and Business Relativity

Since how data quality is defined has a significant impact on how data quality is perceived, measured, and managed, in this post I examine the two most prevalent perspectives on defining data quality, real-world alignment and fitness for the purpose of use, which respectively represent what I refer to as the danger of data myopia and the challenge of business relativity.

Real-World Alignment: The Danger of Data Myopia

Whether it’s an abstract description of real-world entities (i.e., master data) or an abstract description of real-world interactions (i.e., transaction data) among entities, data is an abstract description of reality.  The creation and maintenance of these abstract descriptions shapes the organization’s perception of the real world, which I philosophically pondered in my post Plato’s Data.

The inconvenient truth is that the real world is not the same thing as the digital worlds captured within our databases.

And, of course, creating and maintaining these digital worlds is no easy task, which is exactly the danger inherent in the real-world alignment definition of data quality — when the organization’s data quality efforts are focused on minimizing the digital distance between data and the constantly changing real world that data attempts to describe, it can lead to a hyper-focus on the data in isolation, otherwise known as data myopia.

Even if we create and maintain perfect real-world alignment, what value does high-quality data possess independent of its use?

Real-world alignment reflects the perspective of the data provider, and its advocates argue that providing a trusted source of data to the organization can satisfy any and all business requirements, i.e., high-quality data should be fit to serve as the basis for every possible use.  Therefore, in theory, real-world alignment provides an objective data foundation independent of the subjective uses defined by the organization’s many data consumers.

However, providing the organization with a single system of record, a single version of the truth, a single view, a golden copy, or a consolidated repository of trusted data has long been the rallying cry and siren song of enterprise data warehousing (EDW), and more recently, of master data management (MDM).  Although these initiatives can provide significant business value, it is usually poor data quality that undermines the long-term success and sustainability of EDW and MDM implementations.

Perhaps the enterprise needs a Ulysses pact to protect it from believing in EDW or MDM as a miracle exception for data quality?

A significant challenge for the data provider perspective on data quality is that it is difficult to make a compelling business case on the basis of trusted data without direct connections to the specific business needs of data consumers, whose business, data, and technical requirements are often in conflict with one another.

In other words, real-world alignment does not necessarily guarantee business-world alignment.

So, if using real-world alignment as the definition of data quality has inherent dangers, we might be tempted to conclude that the fitness for the purpose of use definition of data quality is the better choice.  Unfortunately, that is not necessarily the case.

Fitness for the Purpose of Use: The Challenge of Business Relativity

[Image: M. C. Escher’s 1953 lithograph Relativity]

In M. C. Escher’s famous 1953 lithograph Relativity, we observe multiple, and conflicting, perspectives of reality, yet from the individual perspective of each person everything must appear normal, since they are all casually going about their daily activities.

I have always thought this is an apt analogy for the multiple business perspectives on data quality that exist within every organization.

Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder, or when data quality is defined as fitness for the purpose of use — the eyes of the user.

Most data has multiple uses and multiple users.  Data of sufficient quality for one use or user may not be of sufficient quality for other uses and users.  These multiple, and often conflicting, perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data to support their own business activities.

Therefore, the user (i.e., data consumer) perspective establishes a relative business context for data quality.

Whereas the real-world alignment definition of data quality can cause a data-myopic focus, the business-world alignment goal of the fitness for the purpose of use definition must contend with the daunting challenge of business relativity.  Most data has multiple data consumers, each with their own relative business context for data quality, making it difficult to balance the diverse data needs and divergent data quality perspectives within the conflicting, and rather Escher-like, reality of the organization.
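
A minimal sketch of business relativity in practice: the same customer record can pass one data consumer’s fitness rules and fail another’s.  The record, the consumers, and their rules below are all invented:

```python
# One customer record, two data consumers with different fitness-for-use rules (all invented).
record = {"name": "Pat Smith", "email": None, "postal_code": "02134", "birth_date": None}

def fit_for_direct_mail(r):
    # The catalog team only needs a name and a deliverable postal code.
    return bool(r["name"]) and bool(r["postal_code"])

def fit_for_email_campaign(r):
    # The email marketing team cannot do anything without an email address.
    return bool(r["name"]) and bool(r["email"])

print("Direct mail:    ", "fit" if fit_for_direct_mail(record) else "unfit")
print("Email campaign: ", "fit" if fit_for_email_campaign(record) else "unfit")
# Same data, same record, but fit for one purpose and unfit for another.
```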

The data consumer perspective on data quality is often the root cause of the data silo problem, the bane of successful enterprise data management prevalent in most organizations, where each data consumer maintains their own data silo, customized to be fit for the purpose of their own use.  Organizational culture and politics also play significant roles since data consumers legitimately fear that losing their data silos would revert the organization to a one-size-fits-all data provider perspective on data quality.

So, clearly the fitness for the purpose of use definition of data quality is not without its own considerable challenges to overcome.

How does your organization define data quality?

As I stated at the beginning of this post, how data quality is defined has a significant impact on how data quality is perceived, measured, and managed.  I have witnessed the data quality efforts of an organization struggle with, and at times fail because of, either the danger of data myopia or the challenge of business relativity — or, more often than not, some combination of both.

Although some would define real-world alignment as data quality and fitness for the purpose of use as information quality, I have found adding the nuance of data versus information only further complicates an organization’s data quality discussions.

But for now, I will just conclude a rather long (sorry about that) post by asking for reader feedback on this perennial debate.

How does your organization define data quality?  Please share your thoughts and experiences by posting a comment below.

The UX Factor

This blog post is sponsored by the Enterprise CIO Forum and HP.

In his book The Most Human Human, Brian Christian explained that “UX — short for User Experience — refers to the experience a given user has using a piece of software or technology, rather than the purely technical capacities of that device.”

But since its inception, the computer industry has been primarily concerned with technical capacities.  Computer advancements have followed the oft-cited Moore’s Law, a trend first described by Intel co-founder Gordon Moore in 1965, which states that the number of transistors that can be placed inexpensively on an integrated circuit, thereby increasing processing speed and memory capacity, doubles approximately every two years.
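
The compounding implied by that trend is easy to underestimate: doubling every two years works out to roughly a 32-fold increase per decade.  The sketch below starts from the roughly 2,300 transistors of the 1971 Intel 4004, a commonly cited reference point used here only for illustration:

```python
start_year, start_count = 1971, 2300  # Intel 4004, a commonly cited reference point

def transistors(year):
    doublings = (year - start_year) // 2   # one doubling every two years
    return start_count * 2 ** doublings

for year in (1971, 1981, 1991, 2001, 2011):
    print(year, f"{transistors(year):,}")
# A decade of doubling every two years is a 2**5 = 32x increase.
```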

However, as Christian explained, for a while in the computer industry, “an arms race between hardware and software created the odd situation that computers were getting exponentially faster but not faster at all to use, as software made ever-larger demands on systems resources, at a rate that matched and sometimes outpaced hardware improvements.”  This was sometimes called “Andy and Bill’s Law,” referring to Andy Grove of Intel and Bill Gates of Microsoft.  “What Andy giveth, Bill taketh away.”

But these advancements in computational power, along with increased network bandwidth, parallel processing frameworks (e.g., Hadoop), scalable and distributed models (e.g., cloud computing), and other advancements (e.g., in-memory technology) are making powerful technical capacities so much more commonplace, and so much less expensive, that the computer industry is responding to consumers demanding that the primary concern be user experience — hence the so-called Consumerization of IT.

“As computing technology moves increasingly toward mobile devices,” Christian noted, “product development becomes less about the raw computing horsepower and more about the overall design of the product and its fluidity, reactivity, and ease of use.”

David Snow and Alex Bakker have recently blogged about the challenges and opportunities facing enterprises and vendors with respect to the Bring Your Own Device (BYOD) movement, where more employees, and employers, are embracing mobile devices.

Although the old mantra of function over form is not getting replaced by form over function, form factor, interface design, and the many other aspects of User Experience are becoming the unrelenting UX Factor of the continuing consumerization trend.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

The Diderot Effect of New Technology

A Swift Kick in the AAS

Shadow IT and the New Prometheus

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

Are Cloud Providers the Bounty Hunters of IT?

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Commendable Comments (Part 12)

Since I officially launched this blog on March 13, 2009, that makes today the Third Blogiversary of OCDQ Blog!

So, absolutely without question, there is no better way to commemorate this milestone than to make this the 12th entry in my ongoing series expressing my gratitude to my readers for their truly commendable comments on my blog posts.

 

Commendable Comments

On Big Data el Memorioso, Mark Troester commented:

“I think this helps illustrate that one size does not fit all.

You can’t take a singular approach to how you design for big data.  It’s all about identifying relevance and understanding that relevance can change over time.

There are certain situations where it makes sense to leverage all of the data, and now with high performance computing capabilities that include in-memory, in-DB and grid, it's possible to build and deploy rich models using all data in a short amount of time. Not only can you leverage rich models, but you can deploy a large number of models that leverage many variables so that you get optimal results.

On the other hand, there are situations where you need to filter out the extraneous information and the more intelligent you can be about identifying the relevant information the better.

The traditional approach is to grab the data, cleanse it, and land it somewhere before processing or analyzing the data.  We suggest that you leverage analytics up front to determine what data is relevant as it streams in, with relevance based on your organizational knowledge or context.  That helps you determine what data should be acted upon immediately, where it should be stored, etc.

And, of course, there are considerations about using visual analytic techniques to help you determine relevance and guide your analysis, but that’s an entire subject just on its own!”

On Data Governance Frameworks are like Jigsaw Puzzles, Gabriel Marcan commented:

“I agree (and like) the jigsaw puzzles metaphor.  I would like to make an observation though:

Can you really construct Data Governance one piece at a time?

I would argue you need to put together sets of pieces simultaneously, and to ensure early value, you might want to piece together the interesting / easy pieces first.

Hold on, that sounds like the typical jigsaw strategy anyway . . . :-)”

On Data Governance Frameworks are like Jigsaw Puzzles, Doug Newdick commented:

“I think that there are a number of more general lessons here.

In particular, the description of the issues with data governance sounds very like the issues with enterprise architecture.  In general, there are very few eureka moments in solving the business and IT issues plaguing enterprises.  These solutions are usually 10% inspiration, 90% perspiration in my experience.  What looks like genius or a sudden breakthrough is usually the result of a lot of hard work.

I also think that there is a wider Myth of the Framework at play too.

The myth is that if we just select the right framework then everything else will fall into place.  In reality, the selection of the framework is just the start of the real work that produces the results.  Frameworks don’t solve your problems, people solve your problems by the application of brain-power and sweat.

All frameworks do is take care of some of the heavy-lifting, i.e., the mundane foundational research and thinking activity that is not specific to your situation.

Unfortunately the myth of the framework is why many organizations think that choosing TOGAF will immediately solve their IT issues and are then disappointed when this doesn’t happen, when a more sensible approach might have garnered better long-term success.”

On Data Quality: Quo Vadimus?, Richard Jarvis commented:

“I agree with everything you’ve said, but there’s a much uglier truth about data quality that should also be discussed — the business benefit of NOT having a data quality program.

The unfortunate reality is that in a tight market, the last thing many decision makers want to be made public (internally or externally) is the truth.

In a company with data quality principles ingrained in day-to-day processes, and reporting handled independently, it becomes much harder to hide or reinterpret your falling market share.  Without these principles though, you’ll probably be able to pick your version of the truth from a stack of half a dozen, then spend your strategy meeting discussing which one is right instead of what you’re going to do about it.

What we’re talking about here is the difference between a Politician — who will smile at the camera and proudly announce 0.1% growth was a fantastic result given X, Y, and Z factors — and a Statistician who will endeavor to describe reality with minimal personal bias.

And the larger the organization, the more internal politics plays a part.  I believe a lot of the reluctance in investing in data quality initiatives could be traced back to this fear of being held truly accountable, regardless of it being in the best interests of the organization.  To build a data quality-centric culture, the change must be driven from the CEO down if it’s to succeed.”

On Data Quality: Quo Vadimus?, Peter Perera commented:

“The question: ‘Is Data Quality a Journey or a Destination?’ suggests that it is one or the other.

I agree with another comment that data quality is neither . . . or, I suppose, it could be both (the journey is the destination and the destination is the journey. They are one and the same.)

The quality of data (or anything for that matter) is something we experience.

Quality only radiates when someone is in the act of experiencing the data, and usually only when it is someone that matters.  This radiation decays over time, ranging from seconds or less to years or more.

The only problem with viewing data quality as radiation is that radiation can be measured by an instrument, but there is no such instrument to measure data quality.

We tend to confuse data qualities (which can be measured) and data quality (which cannot).

In the words of someone whose name I cannot recall: “Quality is not job one.  Being totally %@^#&$*% amazing is job one.”  The only thing I disagree with here is that being amazing is characterized as a job.

Data quality is not something we do to data.  It’s not a business initiative or project or job.  It’s not a discipline.  We need to distinguish between the pursuit (journey) of being amazing and actually being amazing (destination — but certainly not a final one).  To be amazing requires someone to be amazed.  We want data to be continuously amazing . . . to someone that matters, i.e., someone who uses and values the data a whole lot for an end that makes a material difference.

Come to think of it, the only prerequisite for data quality is being alive because that is the only way to experience it.  If you come across some data and have an amazed reaction to it and can make a difference using it, you cannot help but experience great data quality.  So if you are amazing people all the time with your data, then you are doing your data quality job very well.”

On Data Quality and Miracle Exceptions, Gordon Hamilton commented:

“Nicely delineated argument, Jim.  Successfully starting a data quality program seems to be a balance between getting started somewhere and determining where best to start.  The data quality problem is like a two-edged sword without a handle that is inflicting the death of a thousand cuts.

Data quality is indeed difficult to get a handle on.”

And since they generated so much great banter, please check out all of the commendable comments received by the blog posts There is No Such Thing as a Root Cause and You only get a Return from something you actually Invest in.

 

Thank You for Three Awesome Years

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last three years.  Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between December 2011 and March 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog for the last three years. Your readership is deeply appreciated.

 

Related Posts

Commendable Comments (Part 11)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

Data Quality and Big Data

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This is Part 2 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss data quality and big data, including if data quality matters less in larger data sets, if statistical outliers represent business insights or data quality issues, statistical sampling errors versus measurement calibration errors, mistaking signal for noise (i.e., good data for bad data), and whether or not the principles and practices of true “data scientists” will truly be embraced by an organization’s business leaders.

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was first to extend quality principles to data and information in the late 80s.  Since then he has crystallized a body of tools, techniques, roadmaps and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.  The most recent, Data Driven: Profiting from Your Most Important Business Asset (Harvard Business Press, 2008) was a Library Journal best buy of 2008.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995.  Tom holds a Ph.D. in statistics from Florida State University. He holds two patents.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Driven

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This is Part 1 of 2 from my recent discussion with Tom Redman.  In this episode, Tom and I discuss concepts from one of my favorite data quality books, which is his most recent book: Data Driven: Profiting from Your Most Important Business Asset.

Our discussion includes viewing data as an asset, an organization’s hierarchy of data needs, a simple model for culture change, and attempting to achieve the “single version of the truth” being marketed as a goal of master data management (MDM).

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was first to extend quality principles to data and information in the late 80s.  Since then he has crystallized a body of tools, techniques, roadmaps and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995. Tom holds a Ph.D. in statistics from Florida State University.  He holds two patents.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Quality and Miracle Exceptions

“Reading superhero comic books with the benefit of a Ph.D. in physics,” James Kakalios explained in The Physics of Superheroes, “I have found many examples of the correct description and application of physics concepts.  Of course, the use of superpowers themselves involves direct violations of the known laws of physics, requiring a deliberate and willful suspension of disbelief.”

“However, many comics need only a single miracle exception — one extraordinary thing you have to buy into — and the rest that follows as the hero and the villain square off would be consistent with the principles of science.”

“Data Quality is all about . . .”

It is essential to foster a marketplace of ideas about data quality in which a diversity of viewpoints is freely shared without bias, where everyone is invited to get involved in discussions and debates and have an opportunity to hear what others have to offer.

However, one of my biggest pet peeves about the data quality industry is that when I listen to analysts, vendors, consultants, and other practitioners discuss data quality challenges, I am often required to make a miracle exception for data quality.  In other words, I am given one extraordinary thing I have to buy into in order to be willing to buy their solution to all of my data quality problems.

These superhero comic book style stories usually open with a miracle exception telling me that “data quality is all about . . .”

Sometimes, the miracle exception is purchasing technology from the right magic quadrant.  Other times, the miracle exception is either following a comprehensive framework, or following the right methodology from the right expert within the right discipline (e.g., data modeling, business process management, information quality management, agile development, data governance, etc.).

But I am especially irritated by individuals who bash vendors for selling allegedly only reactive data cleansing tools, while selling their allegedly only proactive defect prevention methodology, as if we could avoid cleaning up the existing data quality issues, or we could shut down and restart our organizations, so that before another single datum is created or business activity is executed, everyone could learn how to “do things the right way” so that “the data will always be entered right, the first time, every time.”

Although these and other miracle exceptions do correctly describe the application of data quality concepts in isolation, by doing so, they also oversimplify the multifaceted complexity of data quality, requiring a deliberate and willful suspension of disbelief.

Miracle exceptions certainly make for more entertaining stories and more effective sales pitches, but oversimplifying complexity for the purposes of explaining your approach, or, even worse and sadly more common, preaching at people that your approach definitively solves their data quality problems, is nothing less than applying the principle of deus ex machina to data quality.

Data Quality and deus ex machina

Deus ex machina is a plot device whereby a seemingly unsolvable problem is suddenly and abruptly solved with the contrived and unexpected intervention of some new event, character, ability, or object.

This technique is often used in the marketing of data quality software and services, where the problem of poor data quality can seemingly be solved by a new event (e.g., creating a data governance council), a new character (e.g., hiring an expert consultant), a new ability (e.g., aligning data quality metrics with business insight), or a new object (e.g., purchasing a new data quality tool).

Now, don’t get me wrong.  I do believe that various technologies and methodologies from numerous disciplines, as well as several core principles (e.g., communication, collaboration, and change management), are all important variables in the data quality equation, but I don’t believe that any particular variable can be taken in isolation and deified as the God Particle of data quality physics.

Data Quality is Not about One Extraordinary Thing

Data quality isn’t all about technology, nor is it all about methodology.  And data quality isn’t all about data cleansing, nor is it all about defect prevention.  Data quality is not about only one thing — no matter how extraordinary any one of those things may seem.

Battling the dark forces of poor data quality doesn’t require any superpowers, but it does require doing the hard daily work of continuously improving your data quality.  Data quality does not have a miracle exception, so please stop believing in one.

And for the love of high-quality data everywhere, please stop trying to sell us one.