Word of Mouth has become Word of Data

In a previous post about overcoming information asymmetry, I discussed one of the ways that customers are changing the balance of power in the retail industry.  During last week’s Mid-Market Smarter Commerce Tweet Chat, the first question was:

Why does measuring social media matter for the retail industry today?

My response was: Word of Mouth has become Word of Data.  In this blog post, I want to explain what I meant by that.

Historically, information reached customers in one of two ways: through advertising or through word of mouth.  The latter was usually words coming from the influential mouths of family and friends, but sometimes from strangers with relevant experience or expertise.  Either way, those words were considered more credible than advertising based on the assumption that the mouths saying them didn’t stand to gain anything personally from sharing their opinions about a company, product, or service.

The biggest challenge facing word of mouth was that you either had to be there to hear the words when they were spoken, or you needed a large enough network of people who could pass those words along.  The latter was like playing the children’s game of broken telephone, since relying on verbally transmitted information about any subject, and perhaps especially about a purchasing decision, was dubious when the information arrived via one or more intermediaries.

But the rise of social networking services, like Twitter, Facebook, and Google Plus, has changed the game, especially now that our broken telephones have been replaced with smartphones.  Not only is our social network larger (albeit still mostly comprised of intermediate connections), but, more important, our conversations are essentially being transcribed — our words no longer just leave our mouths, but are also exchanged in short bursts of social data via tweets, status updates, online reviews, and blog posts.

And it could be argued that our social data has a more active social life than we do, since all of our data interacts with the data from other users within and across our social networks, participating in conversations that keep on going long after we have logged out.  Influential tweets get re-tweeted.  Meaningful status updates and blog posts receive comments.  Votes determine which online reviews are most helpful.  This ongoing conversation enriches the information customers have available to them.

Although listening to customers has always been important, gathering customer feedback used to be a challenge.  But nowadays, customers provide their feedback to retailers, and share their experiences with other customers, via social media.  Word of mouth has become word of data.  The digital mouths of customers speak volumes.  The voice of the customer has become empowered by social media, changing the balance of power in the retail industry, and putting customers in control of the conversation.

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

More Tethered by the Untethered Enterprise?

This blog post is sponsored by the Enterprise CIO Forum and HP.

A new term I have been hearing more frequently lately is the Untethered Enterprise.  Like many new terms, definitions vary, but for me at least, it conjures up images of cutting the cords and wires that tether the enterprise to a specific physical location, and tether the business activities of its employees to specific time frames during specific days of the week.

There was a time, not too long ago, when the vast majority of the business activities of the enterprise were conducted over hard-wired desk phones and Ethernet-cabled desktop PCs, by employees working between the hours of 9 AM and 5 PM, Monday through Friday, within the office spaces of the organization.  That was how, when, and where business got done.

Then came the first generation of mobile phones — the ones that only made phone calls.  Laptop computers followed, initially supplementing desktop PCs, but typically only for employees whose jobs required them to work regularly outside the office, such as traveling salespeople.  Eventually, laptops became the primary work computer, with docking stations allowing them to connect to keyboards and monitors while working in the office, and providing most employees with the option of taking their work home with them.  Then the next generations of mobile phones brought text messaging, e-mail, and, as Wi-Fi networks became more prevalent, full Internet access, which completed the education of the mobile phone, graduating it to a smartphone.

These smartphones are now supplemented by either a laptop or a tablet, or sometimes both.  These devices are either provided by the enterprise or, with the burgeoning Bring Your Own Device (BYOD) movement, are the employees’ own personal smartphones, laptops, and tablets, used for business purposes.  Either way, enabled by the growing availability of cloud-based services, many employees of most organizations are now capable of conducting business anywhere at any time.  And beyond a capability, some enterprises foster the expectation that their employees demonstrate a willingness to conduct business anywhere at any time.

I acknowledge its potential for increasing productivity and better supporting the demands of today’s fast-paced business world, but I can’t help but wonder if the enterprise and its employees will feel more tethered by the untethered enterprise.  When we can no longer unplug because there’s nothing left to unplug, our always precarious work-life balance seems to surrender to the pervasive work-is-life feeling enabled by the untethered enterprise.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

The Diffusion of the Consumerization of IT

Serving IT with a Side of Hash Browns

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

The UX Factor

A Swift Kick in the AAS

Shadow IT and the New Prometheus

The Diderot Effect of New Technology

Are Cloud Providers the Bounty Hunters of IT?

The IT Pendulum and the Federated Future of IT

Information Asymmetry versus Empowered Customers

Information asymmetry is a term from economics describing how one party involved in a transaction typically has more or better information than the other party.  Perhaps the easiest example of information asymmetry is retail sales, where historically the retailer has always had more or better information than the customer about a product that is about to be purchased.

Generally speaking, information asymmetry is advantageous for the retailer, allowing them to manipulate the customer into purchasing products that benefit the retailer’s goals (e.g., maximizing profit margins or unloading excess inventory) more than the customer’s goals (e.g., paying a fair price or buying the product that best suits their needs).  I don’t mean to demonize the retail industry, but for a long time, I’m pretty sure its unofficial motto was: “An uninformed customer is the best customer.”

Let’s consider the example of purchasing a high-definition television (HDTV) since it demonstrates how information asymmetry is not always about holding back useful information, but also bombarding customers with useless information.  In this example, it’s about bombarding customers with useless technical jargon, such as refresh rate, resolution, and contrast ratio.

To an uninformed customer, it certainly sounds like it makes sense that the HDTV with a 240Hz refresh rate, 1080p resolution, and 2,000,000:1 contrast ratio is better than the one with a 120Hz refresh rate, 720p resolution, and 1,000,000:1 contrast ratio.

After all, 240 > 120, 1080 > 720, and 2,000,000 > 1,000,000, right?  Yes — but what do any of those numbers actually mean?

The reality is that refresh rate, resolution, and contrast ratio are just three examples of useless HDTV specifications because they essentially provide no meaningful information about the video quality of the television.  This information is advantageous to only one party involved in the transaction — the retailer — since it appears to justify the higher price of an allegedly better product.

But nowadays fewer customers are falling for these tricks.  Performing a quick Internet search, either before going shopping or on their mobile phone while at the store, is balancing out some of the information asymmetry in retail sales and empowering customers to make better purchasing decisions.  With the increasing availability of broadband Internet and mobile connectivity, today’s empowered customer arrives at the retail front lines armed and ready to do battle with information asymmetry.

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

The Data Quality Placebo

Inspired by a recent Boing Boing blog post

Are you suffering from persistent and annoying data quality issues?  Or are you suffering from the persistence of data quality tool vendors and consultants annoying you with sales pitches about how you must be suffering from persistent data quality issues?

Either way, the Data Division of Prescott Pharmaceuticals (trusted makers of gastroflux, datamine, selectium, and qualitol) is proud to present the perfect solution to all of your real and/or imaginary data quality issues — The Data Quality Placebo.

Simply take two capsules (made with an easy-to-swallow coating) every morning and you will be guaranteed to experience:

“Zero Defects with Zero Side Effects”™

(Legal Disclaimer: Zero Defects with Zero Side Effects may be the result of Zero Testing, which itself is probably just a side effect of The Prescott Promise: “We can promise you that we will never test any of our products on animals because . . . we never test any of our products.”)

How Data Cleansing Saves Lives

When it comes to data quality best practices, it’s often argued, and sometimes quite vehemently, that proactive defect prevention is far superior to reactive data cleansing.  Advocates of defect prevention sometimes admit that data cleansing is a necessary evil.  However, at least in my experience, most of the time they conveniently, and ironically, cleanse (i.e., drop) the word necessary.

Therefore, I thought I would share a story about how data cleansing saves lives, which I read about in the highly recommended book Space Chronicles: Facing the Ultimate Frontier by Neil deGrasse Tyson.  “Soon after the Hubble Space Telescope was launched in April 1990, NASA engineers realized that the telescope’s primary mirror—which gathers and reflects the light from celestial objects into its cameras and spectrographs—had been ground to an incorrect shape.  In other words, the two-billion dollar telescope was producing fuzzy images.  That was bad.  As if to make lemonade out of lemons, though, computer algorithms came to the rescue.  Investigators at the Space Telescope Science Institute in Baltimore, Maryland, developed a range of clever and innovative image-processing techniques to compensate for some of Hubble’s shortcomings.”

In other words, since it would be three years before Hubble’s faulty optics could be repaired during a 1993 space shuttle mission, data cleansing allowed astrophysicists to make good use of Hubble despite the bad data quality of its early images.
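
Tyson’s account doesn’t name the specific algorithms, but iterative image deconvolution is a representative example of this kind of data cleansing: given a model of how the flawed mirror blurred incoming light (the point spread function), an algorithm can recover much of the lost detail.  Below is a minimal sketch using scikit-image’s Richardson-Lucy implementation; the star field and point spread function are made up for illustration, not derived from Hubble data.

```python
import numpy as np
from scipy.signal import convolve2d
from skimage.restoration import richardson_lucy

# Stand-in "true" image: a dark sky containing a few point sources (stars).
true_image = np.zeros((64, 64))
true_image[16, 16] = true_image[32, 40] = true_image[50, 22] = 1.0

# A Gaussian point spread function (PSF) standing in for the blur
# introduced by the incorrectly ground mirror.
x = np.arange(-3, 4)
psf = np.exp(-(x[:, None] ** 2 + x[None, :] ** 2) / 2.0)
psf /= psf.sum()

# The fuzzy image the telescope actually records.
blurred = convolve2d(true_image, psf, mode="same", boundary="wrap")

# Iterative Richardson-Lucy deconvolution "cleanses" the blur.
restored = richardson_lucy(blurred, psf, 30)

print("mean error before cleansing:", np.abs(blurred - true_image).mean())
print("mean error after cleansing: ", np.abs(restored - true_image).mean())
```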

So, data cleansing algorithms saved Hubble’s fuzzy images — but how did this data cleansing actually save lives?

“Turns out,” Tyson explained, “maximizing the amount of information that could be extracted from a blurry astronomical image is technically identical to maximizing the amount of information that can be extracted from a mammogram.  Soon the new techniques came into common use for detecting early signs of breast cancer.”

“But that’s only part of the story.  In 1997, for Hubble’s second servicing mission, shuttle astronauts swapped in a brand-new, high-resolution digital detector—designed to the demanding specifications of astrophysicists whose careers are based on being able to see small, dim things in the cosmos.  That technology is now incorporated in a minimally invasive, low-cost system for doing breast biopsies, the next stage after mammograms in the early diagnosis of cancer.”

Even though defect prevention was eventually implemented to prevent data quality issues in Hubble’s images of outer space, those interim data cleansing algorithms are still being used today to help save countless human lives here on Earth.

So, at least in this particular instance, we have to admit that data cleansing is a necessary good.

The Diffusion of the Consumerization of IT

This blog post is sponsored by the Enterprise CIO Forum and HP.

On a previous post about the consumerization of IT, Paul Calento commented: “Clearly, it’s time to move IT out of a discrete, defined department and out into the field, even more than already.  Likewise, solutions used to power an organization need to do the same thing.  Problem is, though, that it’s easy to say that embedding IT makes sense (it does), but there’s little experience with managing it (like reporting and measurement).  Services integration is a goal, but cross-department, cross-business-unit integration remains a thorn in the side of many attempts.”

Embedding IT does make sense, and not only is it easier said than done, let alone done well, but part of the problem within many organizations is that IT became partially self-embedded within some business units while the IT department was resisting the consumerization of IT, treating it like a fad and not an innovation.  And now those business units are resisting the efforts of the redefined IT department because they fear losing the IT capabilities that consumerization has already given them.

This growing IT challenge brings to mind the Diffusion of Innovations theory developed by Everett Rogers, which describes the rate at which innovations (e.g., new ideas or technology trends) spread within cultures, such as organizations, across five adopter categories, starting with the Innovators and Early Adopters, progressing through the Early and Late Majority, and trailed by the Laggards.

A related concept, Crossing the Chasm, was developed by Geoffrey Moore to describe the critical phenomenon that occurs when enough of the Early Adopters have embraced the innovation that the arrival of the Early Majority becomes all but certain, even though mainstream adoption of the innovation is still far from guaranteed.

From my perspective, traditional IT departments are just now crossing the chasm of the diffusion of the consumerization of IT, and are conflicting with the business units that crossed the chasm long ago through their direct adoption of cloud computing, SaaS, and mobility solutions not provided by the IT department.  This divergence, caused by the IT department and some business units being on different sides of the chasm, has damaged, perhaps irreparably, some aspects of the IT-Business partnership.

The longer this divergence lasts, the more difficult it will be for an IT department that has finally crossed the chasm to redefine its role and remain a relevant partner to the business units that, perhaps for the first time in the organization’s history, were ahead of the information technology adoption curve.  Additionally, even communication and collaboration across business units are negatively affected when different business units cross the IT consumerization chasm at different times, which often, as Paul Calento noted, complicates the organization’s attempts to integrate cross-business-unit IT services.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Serving IT with a Side of Hash Browns

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

The UX Factor

A Swift Kick in the AAS

Shadow IT and the New Prometheus

The Diderot Effect of New Technology

Are Cloud Providers the Bounty Hunters of IT?

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Data Quality and the Q Test

In psychology, there’s something known as the Q Test, which asks you to use one of your fingers to trace an upper case letter Q on your forehead.  Before reading this blog post any further, please stop and perform the Q Test on your forehead right now.

 

Essentially, there are only two ways you can complete the Q Test, differentiated by how you trace the tail of the Q.  Most people start by tracing a letter O, and then complete the Q by tracing its tail either toward their right eye or toward their left eye.

If you trace the tail of the Q toward your right eye, you’re imagining what a letter Q would look like from your perspective.  But if you trace the tail of the Q toward your left eye, you’re imagining what it would look like from the perspective of another person.

Basically, the point of the Q Test is to determine whether or not you have a natural tendency to consider the perspective of others.

Although considering the perspective of others is a positive in most circumstances, if you traced the letter Q with its tail toward your left eye, psychologists say that you failed the Q Test because it reveals a negative: you’re a good liar.  The reason is that you have to be good at considering the perspective of others in order to be good at deceiving them with a believable lie.

So, as I now consider your perspective, dear reader, I bet you’re wondering: What does the Q Test have to do with data quality?

Like truth, beauty, and art, data quality can be said to be in the eyes of the beholder, or when data quality is defined, as it most often is, as fitness for the purpose of use — the eyes of the user.  But since most data has both multiple uses and users, data fit for the purpose of one use or user may not be fit for the purpose of other uses and users.  However, these multiple perspectives are considered irrelevant from the perspective of an individual user, who just needs quality data fit for the purpose of their own use.
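
To make that point concrete, here is a minimal sketch (the customer record and the rules are hypothetical) showing how the same record can be fit for the purpose of one user and unfit for the purpose of another:

```python
# A single customer record, and two hypothetical users of it.
record = {"name": "Jane Smith", "email": "jane@example.com",
          "postal_code": None, "last_purchase": "2012-03-14"}

# The email marketing team only needs a deliverable email address.
def fit_for_email_campaign(rec):
    return bool(rec["email"]) and "@" in rec["email"]

# The direct mail team needs a postal code to print a mailing label.
def fit_for_direct_mail(rec):
    return bool(rec["postal_code"])

print("Fit for email campaign:", fit_for_email_campaign(record))  # True
print("Fit for direct mail:   ", fit_for_direct_mail(record))     # False
```

The same record is quality data to one team and a data quality issue to the other, which is exactly why a single, shared definition of data quality is so hard to agree on.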

The good news is that when it comes to data quality, most of us pass the Q Test, which means we’re not good liars.  The bad news is that since most of us pass the Q Test, we’re often only concerned about our own perspective about data quality, which is why so many organizations struggle to define data quality standards.

At the next discussion about your organization’s data quality standards, try inviting the participants to perform the Q Test.

 

Related Posts

The Point of View Paradox

You Say Potato and I Say Tater Tot

Data Myopia and Business Relativity

Beyond a “Single Version of the Truth”

DQ-BE: Single Version of the Time

Data and the Liar’s Paradox

The Fourth Law of Data Quality

Plato’s Data

Once Upon a Time in the Data

The Idea of Order in Data

Hell is other people’s data

Song of My Data

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Two Flaws in the “Fail Faster” Philosophy

There are many who advocate that the key to success, especially with innovation, is what’s known as the “fail faster” philosophy, which says that not only should we embrace new ideas and try new things without being overly concerned with failure, but, more importantly, we should fail as efficiently as possible in order to expedite learning valuable lessons from our failures.

However, I have often experienced what I see as two fundamental flaws in the “fail faster” philosophy:

  1. It requires that you define failure
  2. It requires that you admit when you have failed

Most people — myself included — often fail both of these requirements.  Most people do not define failure, but instead assume that they will be successful (even though they conveniently do not define success either).  But even when people define failure, they often refuse to admit when they have failed.  In the face of failure, most people either redefine failure or extend the deadline (perhaps we should call it the fail line?) for when they will have to admit that they have failed.

We are often regaled with stories of persistence in spite of repeated failure, such as Thomas Edison’s famous remark:

“Many of life’s failures are people who did not realize how close they were to success when they gave up.”

Edison also remarked that he didn’t invent one way to make a lightbulb; instead, he invented more than 1,000 ways not to make a lightbulb.  Each of those failed prototypes for a commercially viable lightbulb was instructive and absolutely essential to his eventual success.  But what if Edison had refused to define and admit failure?  How would he have known when to abandon one prototype and try another?  How would he have been able to learn valuable lessons from his repeated failures?

Josh Linkner recently blogged about failure being the dirty little secret of so-called overnight success, citing several examples, including Rovio (makers of the Angry Birds video game), Dyson vacuum cleaners, and WD-40.

Although these are definitely inspiring success stories, my concern is that often the only failure stories we hear are about people and companies that became famous for eventually succeeding.  In other words, we often hear eventually successful stories, and we almost never hear, or simply choose to ignore, the more common, and perhaps more useful, cautionary tales of abject failure.

It seems we have become so obsessed with telling stories that we have relegated both failure and success to the genre of fiction, which I fear is preventing us from learning any fact-based, and therefore truly valuable, lessons about failure and success.

 

Related Posts

The Winning Curve

Persistence

Mistake Driven Learning

The Fragility of Knowledge

The Wisdom of Failure

Talking Business about the Weather

Businesses of all sizes are always looking for ways to increase revenue, decrease costs, and operate more efficiently.  When I talk with midsize business owners, I hear the typical questions.  Should we hire a developer to update our website and improve our SEO rankings?  Should we invest less money in traditional advertising and invest more time in social media?  After discussing these and other business topics for a while, we drift into that standard conversational filler — talking about the weather.

But since I am always interested in analyzing data from as many different perspectives as possible, when I talk about the weather, I ask midsize business owners how much of a variable the weather plays in their business.  Does the weather affect the number of customers that visit your business on a daily basis?  Do customers purchase different items when the weather is good versus bad?

I usually receive quick responses, but when I ask if those responses were based on analyzing sales data alongside weather data, the answer is usually no, which is understandable since businesses are successful when they can focus on their core competencies, and for most businesses, analytics is not a core competency.  The demands of daily operations often prevent midsize businesses from stepping back and looking at things differently, like whether or not there’s a hidden connection between weather and sales.
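
For a business that already keeps daily sales totals, looking for that hidden connection doesn’t require a data science team.  Here is a minimal sketch using pandas, assuming two hypothetical files, daily_sales.csv and daily_weather.csv, each keyed by date:

```python
import pandas as pd

# Hypothetical inputs: one row per day in each file.
# daily_sales.csv:   date, revenue
# daily_weather.csv: date, avg_temp_f, precipitation_in
sales = pd.read_csv("daily_sales.csv", parse_dates=["date"])
weather = pd.read_csv("daily_weather.csv", parse_dates=["date"])

# Join sales and weather on the calendar date.
df = sales.merge(weather, on="date", how="inner")

# How strongly does daily revenue move with temperature and rain?
print(df[["revenue", "avg_temp_f", "precipitation_in"]].corr()["revenue"])

# Compare average revenue on rainy days versus dry days.
df["rainy"] = df["precipitation_in"] > 0.1
print(df.groupby("rainy")["revenue"].mean())
```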

One of my favorite books is Freakonomics: A Rogue Economist Explores the Hidden Side of Everything by Steven Levitt and Stephen Dubner.  The book, as well as its sequel, podcast, and movie, provides good examples of one of the common challenges facing data science, and more specifically predictive analytics: its predictions often seem counterintuitive to business leaders, whose intuition is rightfully based on the business expertise that has guided their success to date.  The reality is that even organizations that pride themselves on being data-driven naturally resist any counterintuitive insights found in their data.

Dubner was recently interviewed by Crysta Anderson about how organizations can find insights in their data if they are willing and able to ask good questions.  Of course, it’s not always easy to determine what a good question would be.  But sometimes something as simple as talking about the weather when you’re talking business could lead to a meaningful business insight.

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

Solvency II and Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Ken O’Connor and I discuss the Solvency II standards for data quality, and how its European insurance regulatory requirement of “complete, appropriate, and accurate” data represents common sense standards for all businesses.

Ken O’Connor is an independent data consultant with over 30 years of hands-on experience in the field, specializing in helping organizations meet the data quality management challenges presented by data-intensive programs such as data conversions, data migrations, data population, and regulatory compliance such as Solvency II, Basel II / III, Anti-Money Laundering, the Foreign Account Tax Compliance Act (FATCA), and the Dodd–Frank Wall Street Reform and Consumer Protection Act.

Ken O’Connor also provides practical data quality and data governance advice on his popular blog at: kenoconnordata.com

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Pitching Perfect Data Quality

In my previous post, I used a baseball metaphor to explain why we should strive for a quality start to our business activities by starting them off with good data quality, thereby giving our organization a better chance to succeed.

Since it’s a beautiful week for baseball metaphors, let’s post two!  (My apologies to Ernie Banks.)

If good data quality gives our organization a better chance to succeed, then it seems logical to assume that perfect data quality would give our organization the best chance to succeed.  However, as Yogi Berra said: “If the world were perfect, it wouldn’t be.”

My previous baseball metaphor was based on a statistic that measured how well a starting pitcher performs during a game.  The best possible performance of a starting pitcher is called a perfect game, when nine innings are perfectly completed by retiring the minimum of 27 opposing batters without allowing any hits, walks, hit batsmen, or batters reaching base due to a fielding error.

Although a lot of buzz is generated when a pitcher gets close to pitching a perfect game (e.g., usually after five perfect innings, it’s all the game’s announcers will talk about), in the 143 years of Major League Baseball history, during which approximately 200,000 games have been played, there have been only 20 perfect games, making it one of the rarest statistical events in baseball.

When a pitcher loses the chance of pitching a perfect game, does his team forfeit the game?  No, of course not.  Because the pitcher’s goal is not pitching perfectly.  The pitcher’s (and every other player’s) goal is helping the team win the game.

This is why I have never been a fan of anyone who is pitching perfect data quality, i.e., anyone advocating data perfection as the organization’s goal.  The organization’s goal is business success.  Data quality has a role to play, but claiming business success is impossible without having perfect data quality is like claiming winning in baseball is impossible without pitching a perfect game.

 

Related Posts

DQ-View: Baseball and Data Quality

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and The Middle Way

There is No Such Thing as a Root Cause

OCDQ Radio - The Johari Window of Data Quality

Data Quality and Miracle Exceptions

Data Quality: Quo Vadimus?

Quality Starts and Data Quality

This past week was the beginning of the 2012 Major League Baseball (MLB) season.  Since its data is mostly transaction data describing the statistical events of games played, baseball has long been a sport obsessed with statistics.  Baseball statisticians slice and dice every aspect of past games attempting to discover trends that could predict what is likely to happen in future games.

There are too many variables involved in determining which team will win a particular game to be able to choose a single variable that predicts game results.  But a few key statistics are cited by baseball analysts as general guidelines of a team’s potential to win.

One such statistic is a quality start, which is defined as a game in which a team’s starting pitcher completes at least six innings and permits no more than three earned runs.  Of course, a so-called quality start is no guarantee that the starting pitcher’s team will win the game.  But the relative reliability of the statistic to predict a game’s result causes some baseball analysts to refer to a loss suffered by a pitcher in a quality start as a tough loss and a win earned by a pitcher in a non-quality start as a cheap win.
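
Because the quality start rule is so simple, a toy sketch of the classification, including the tough loss and cheap win cases, fits in a few lines (the pitching lines below are made up for illustration):

```python
def is_quality_start(innings_pitched, earned_runs):
    """A quality start: at least 6 innings pitched and at most 3 earned runs."""
    return innings_pitched >= 6 and earned_runs <= 3

def classify(innings_pitched, earned_runs, team_won):
    quality = is_quality_start(innings_pitched, earned_runs)
    if quality and not team_won:
        return "tough loss"
    if not quality and team_won:
        return "cheap win"
    return "quality start, win" if quality else "non-quality start, loss"

# Made-up pitching lines for illustration.
print(classify(innings_pitched=7, earned_runs=2, team_won=False))  # tough loss
print(classify(innings_pitched=5, earned_runs=6, team_won=True))   # cheap win
```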

There are too many variables involved in determining if a particular business activity will succeed to be able to choose a single variable that predicts business results.  But data quality is one of the general guidelines of an organization’s potential to succeed.

As Henrik Liliendahl Sørensen blogged, organizations are capable of achieving success with their business activities despite bad data quality, which we could call the business equivalent of cheap wins.  And organizations are also capable of suffering failure with their business activities despite good data quality, which we could call the business equivalent of tough losses.

So just like a quality start is no guarantee of a win in baseball, good data quality is no guarantee of success in business.

But perhaps the relative reliability of data quality to predict business results should influence us to at least strive for a quality start to our business activities by starting them off with good data quality, thereby giving our organization a better chance to succeed.

 

Related Posts

DQ-View: Baseball and Data Quality

Poor Quality Data Sucks

Fantasy League Data Quality

There is No Such Thing as a Root Cause

Data Quality: Quo Vadimus?

OCDQ Radio - The Johari Window of Data Quality

OCDQ Radio - Redefining Data Quality

OCDQ Radio - The Blue Box of Information Quality

OCDQ Radio - Studying Data Quality

OCDQ Radio - Organizing for Data Quality

Will Big Data be Blinded by Data Science?

All of the hype about Big Data is also causing quite the hullabaloo about hiring Data Scientists in order to help your organization derive business value from big data analytics.  But even though we are still in the hype and hullabaloo stages, these unrelenting trends are starting to rightfully draw the attention of businesses of all sizes.  After all, the key word in big data isn’t big, because, in our increasingly data-constructed world, big data is no longer just for big companies and high-tech firms.

And since the key word in data scientist isn’t data, in this post I want to focus on the second word in today’s hottest job title.

When I think of a scientist of any kind, I immediately think of the scientific method, which has been the standard operating procedure of scientific discovery since the 17th century.  First, you define a question, gather some initial data, and form a hypothesis, which is some idea about how to answer your question.  Next, you perform an experiment to test the hypothesis, during which more data is collected.  Then, you analyze the experimental data and evaluate your results.  Whether the experiment confirmed or contradicted your hypothesis, you do the same thing: repeat the experiment.  A hypothesis can only be promoted to a theory after repeated experimentation (including by others) consistently produces the same result.

During experimentation, failure happens just as often as, if not more often than, success.  However, both failure and success have long played an important role in scientific discovery because progress in either direction is still progress.

Therefore, experimentation is an essential component of scientific discovery — and data science is certainly no exception.

“Designed experiments,” Melinda Thielbar recently blogged, “is where we’ll make our next big leap for data science.”  I agree, but with the notable exception of A/B testing in marketing, most business activities generally don’t embrace data experimentation.
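
A/B testing is a designed experiment in miniature: randomly split customers, show each group a different version, and test whether the observed difference in conversion is larger than chance alone would explain.  Here is a minimal sketch of a two-proportion z-test using only the standard library; the visitor and conversion counts are made up:

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Made-up results: version A converted 200 of 5,000 visitors,
# version B converted 260 of 5,000 visitors.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```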

“The purpose of science,” Tom Redman recently explained, “is to discover fundamental truths about the universe.  But we don’t run our businesses to discover fundamental truths.  We run our businesses to serve a customer, gain marketplace advantage, or make money.”  In other words, the commercial application of science has more to do with commerce than it does with science.

One example of the challenges inherent in the commercial application of science is the misconception that predictive analytics can predict what is going to happen with certainty, when what it actually does is predict some of the possible things that could happen, each with a certain probability.  Although predictive analytics can be a valuable tool for many business activities, especially decision making, as Steve Miller recently blogged, most of us are not good at using probabilities to make decisions.

So, with apologies to Thomas Dolby, I can’t help but wonder, will big data be blinded by data science?  Will the business leaders being told to hire data scientists to derive business value from big data analytics be blind to what data science tries to show them?

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

The Data Governance Imperative

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, Steve Sarsfield and I discuss how data governance is about changing the hearts and minds of your company to see the value of data quality, the characteristics of a data champion, and creating effective data quality scorecards.

Steve Sarsfield is a leading author and expert in data quality and data governance.  His book The Data Governance Imperative is a comprehensive exploration of data governance focusing on the business perspectives that are important to data champions, front-office employees, and executives.  He runs the Data Governance and Data Quality Insider, which is an award-winning and world-recognized blog.  Steve Sarsfield is the Product Marketing Manager for Data Governance and Data Quality at Talend.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

What is Weighing Down your Data?

On July 21, 1969, Neil Armstrong spoke the instantly famous words “that’s one small step for man, one giant leap for mankind” as he stepped off the ladder of the Apollo Lunar Module and became the first human being to walk on the surface of the Moon.

In addition to its many other, and more significant, scientific milestones, the Moon landing provided an excellent demonstration of three related, and often misunderstood, scientific concepts: mass, weight, and gravity.

Mass is an intrinsic property of matter, based on the atomic composition of a given object, such as your body, which means your mass would remain the same regardless of whether you were walking on the surface of the Moon or the Earth.

Weight is not an intrinsic property of matter, but is instead a gravitational force acting on matter.  Because the gravitational force of the Moon is less than the gravitational force of the Earth, you would weigh less on the Moon than you weigh on the Earth.  So, just like Neil Armstrong, your one small step on the surface of the Moon could quite literally become a giant leap.
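
To put rough numbers on the physics before stretching it into a metaphor: weight is mass multiplied by the local gravitational acceleration (W = m × g), so the same astronaut weighs only about one-sixth as much on the Moon.  A quick sketch using standard approximate values:

```python
# Weight = mass x gravitational acceleration (W = m * g).
G_EARTH = 9.81  # m/s^2, approximate surface gravity of Earth
G_MOON = 1.62   # m/s^2, approximate surface gravity of the Moon

mass_kg = 80.0  # the astronaut's mass is the same on both bodies

weight_earth = mass_kg * G_EARTH  # ~785 newtons
weight_moon = mass_kg * G_MOON    # ~130 newtons

print(f"Weight on Earth: {weight_earth:.0f} N")
print(f"Weight on Moon:  {weight_moon:.0f} N")
print(f"Ratio: {weight_moon / weight_earth:.2%} of Earth weight")  # ~16.5%
```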

Using these concepts metaphorically, mass is an intrinsic property of data, and perhaps a way to represent objective data quality, whereas weight is a gravitational force acting on data, and perhaps a way to represent subjective data quality.

Since most data cannot escape the gravity of its application, most of what we refer to as data silos are actually application silos, because data and applications become tightly coupled due to the strong gravitational force that an application exerts on its data.

Now, of course, an application can exert a strong gravitational force for a strong business reason (e.g., protecting sensitive data), and not, as we often assume by default, for a weak business reason (e.g., protecting corporate political power).

Although you probably don’t view your applications as something that is weighing down your data, and you probably also resist the feeling of weightlessness that can be caused by openly sharing your data, it’s worth considering that whether your data truly enables your organization to take giant leaps, not just small steps, depends on the gravitational forces acting on your data.

What is weighing down your data could also be weighing down your organization.

 

Related Posts

Data Myopia and Business Relativity

Are Applications the La Brea Tar Pits for Data?

Hell is other people’s data

My Own Private Data

No Datum is an Island of Serendip

Turning Data Silos into Glass Houses

Sharing Data

The Data Outhouse

The Good Data

Beyond a “Single Version of the Truth”