Jim Harris

My name is Jim Harris.  I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.


Entries in Humor (73)

Tuesday, April 23, 2013

The Laugh-In Effect of Big Data

Although I am an advocate for data science and big data done right, lately I have been sounding the Anti-Hype Horn with blog posts offering a contrarian’s view of unstructured data, forewarning you about the flying monkeys of big data, cautioning you against performing Cargo Cult Data Science, and inviting you to ponder the perils of the Infinite Inbox.

The hype of big data has resulted in a lot of people and vendors extolling its virtues with stories about how Internet companies, political campaigns, and new technologies have profited, or otherwise benefited, from big data.  These stories are served up as alleged business cases for investing in big data and data science.  Although some of these stories are fluff pieces, many of them accurately, and in some cases comprehensively, describe a real-world application of big data and data science.  However, these messages most often lack a critically important component — applicability to your specific business.  In Made to Stick: Why Some Ideas Survive and Others Die, Chip Heath and Dan Heath explained that “an accurate but useless idea is still useless.  If a message can’t be used to make predictions or decisions, it is without value, no matter how accurate or comprehensive it is.”

Rowan & Martin’s Laugh-In was an American sketch comedy television series, which aired from 1968 to 1973.  One of the recurring characters portrayed by Arte Johnson was Wolfgang the German soldier, who would often comment on the previous comedy sketch by saying (in a heavy and long-drawn-out German accent): “Very interesting . . . but stupid!”

From now on whenever someone shares another interesting story masquerading as a solid business case for big data that lacks any applicability beyond the specific scenario in the story, a common phenomenon I call The Laugh-In Effect of Big Data, my unapologetic response will resoundingly be: “Very interesting . . . but stupid!”

 

Related Posts

On Philosophy, Science, and Data

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

Big Data and the Infinite Inbox

HoardaBytes and the Big Data Lebowski

Will Big Data be Blinded by Data Science?

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Big Data el Memorioso

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

Dot Collectors and Dot Connectors

The Wisdom of Crowds, Friends, and Experts

A Contrarian’s View of Unstructured Data

The Flying Monkeys of Big Data

Cargo Cult Data Science

A Statistically Significant Resolution for 2013

Speed Up Your Data to Slow Down Your Decisions

Rage against the Machines Learning

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Swimming in Big Data

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Tuesday, February 5, 2013

Big Data and the Infinite Inbox

Occasionally it’s necessary to temper the unchecked enthusiasm accompanying the peak of inflated expectations associated with any hype cycle.  This may be especially true for big data, and especially now since, as Svetlana Sicular of Gartner recently blogged, big data is falling into the trough of disillusionment and “to minimize the depth of the fall, companies must be at a high enough level of analytical and enterprise information management maturity combined with organizational support of innovation.”

I fear the fall may feel bottomless for those who fell hard for the hype and believe the Big Data Psychic capable of making better, if not clairvoyant, predictions, when in fact “our predictions may be more prone to failure in the era of big data,” as Nate Silver explained in his book The Signal and the Noise: Why Most Predictions Fail but Some Don’t.  “There isn’t any more truth in the world than there was before the Internet.  Most of the data is just noise, as most of the universe is filled with empty space.”

Proposing the 3Ss (Small, Slow, Sure) as a counterpoint to the 3Vs (Volume, Velocity, Variety), Stephen Few recently blogged about the slow data movement.  “Data is growing in volume, as it always has, but only a small amount of it is useful.  Data is being generated and transmitted at an increasing velocity, but the race is not necessarily for the swift; slow and steady will win the information race.  Data is branching out in ever-greater variety, but only a few of these new choices are sure.”

Big data requires us to revisit information overload, a term that was originally about not the increasing amount of information but the increasing access to information.  As Clay Shirky stated, “It’s not information overload, it’s filter failure.”

As Silver noted, the Internet (like the printing press before it) was a watershed moment in our increased access to information, but its data deluge didn’t increase the amount of truth in the world.  And in today’s world, where many of us strive on a daily basis to prevent email filter failure and achieve what Merlin Mann called Inbox Zero, I find unfiltered enthusiasm about big data to be rather ironic, since big data is essentially enabling the data-driven decision making equivalent of the Infinite Inbox.

Imagine logging into your email every morning and discovering: You currently have (∞) Unread Messages.

However, I’m sure most of it probably would be spam, which you obviously wouldn’t have any trouble quickly filtering (after all, infinity minus spam must be a back of the napkin calculation), allowing you to only read the truly useful messages.  Right?

 

Related Posts

HoardaBytes and the Big Data Lebowski

OCDQ Radio - Data Quality and Big Data

Open MIKE Podcast — Episode 05: Defining Big Data

Will Big Data be Blinded by Data Science?

Data Silence

Magic Elephants, Data Psychics, and Invisible Gorillas

The Graystone Effects of Big Data

Information Overload Revisited

Exercise Better Data Management

A Tale of Two Datas

A Statistically Significant Resolution for 2013

It’s Not about being Data-Driven

Big Data, Sporks, and Decision Frames

Big Data: Structure and Quality

Darth Vader, Big Data, and Predictive Analytics

Big Data, Predictive Analytics, and the Ideal Chronicler

The Big Data Theory

Swimming in Big Data

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Tuesday, January 22, 2013

Popeye, Spinach, and Data Quality

As a kid, one of my favorite cartoons was Popeye the Sailor, who was empowered by eating spinach to take on many daunting challenges, such as battling his brawny nemesis Bluto for the affections of his love interest Olive Oyl, often kidnapped by Bluto.

I am reading the book The Half-life of Facts: Why Everything We Know Has an Expiration Date by Samuel Arbesman, who explained, while examining how a novel fact, even a wrong one, spreads and persists, that one of the strangest examples of the spread of an error is related to Popeye the Sailor.  “Popeye, with his odd accent and improbable forearms, used spinach to great effect, a sort of anti-Kryptonite.  It gave him his strength, and perhaps his distinctive speaking style.  But why did Popeye eat so much spinach?  What was the reason for his obsession with such a strange food?”

The truth begins over fifty years before the comic strip made its debut.  “Back in 1870,” Arbesman explained, “Erich von Wolf, a German chemist, examined the amount of iron within spinach, among many other green vegetables.  In recording his findings, von Wolf accidentally misplaced a decimal point when transcribing data from his notebook, changing the iron content in spinach by an order of magnitude.  While there are actually only 3.5 milligrams of iron in a 100-gram serving of spinach, the accepted fact became 35 milligrams.  Once this incorrect number was printed, spinach’s nutritional value became legendary.  So when Popeye was created, studio executives recommended he eat spinach for his strength, due to its vaunted health properties, and apparently Popeye helped increase American consumption of spinach by a third!”

“This error was eventually corrected in 1937,” Arbesman continued, “when someone rechecked the numbers.  But the damage had been done.  It spread and spread, and only recently has gone by the wayside, no doubt helped by Popeye’s relative obscurity today.  But the error was so widespread, that the British Medical Journal published an article discussing this spinach incident in 1981, trying its best to finally debunk the issue.”

“Ultimately, the reason these errors spread,” Arbesman concluded, “is because it’s a lot easier to spread the first thing you find, or the fact that sounds correct, than to delve deeply into the literature in search of the correct fact.”

What “spinach” has your organization been falsely consuming because of a data quality issue that was not immediately obvious, and which may have led to a long, and perhaps ongoing, history of data-driven decisions based on poor quality data?

Popeye said “I yam what I yam!”  Your organization yams what your data yams, so you had better make damn sure it’s correct.
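One way to go looking for this kind of “spinach” is a plausibility check that flags a recorded value sitting almost exactly a power of ten away from an independently sourced reference value.  The following is only a minimal sketch in Python: the function name, the tolerance, and the assumption that a trusted reference value is available are all mine, for illustration, and are not taken from Arbesman’s book.

```python
import math


def decimal_shift(recorded: float, reference: float, tol: float = 0.05) -> int:
    """Return the power-of-ten shift between a recorded and a reference value.

    Returns 0 when the values agree, or when they differ by something other
    than a clean power of ten (within the tolerance, measured in log10 units).
    """
    if recorded <= 0 or reference <= 0:
        raise ValueError("both values must be positive")
    log_ratio = math.log10(recorded / reference)
    shift = round(log_ratio)
    # Only report a shift when the ratio is close to an exact power of ten.
    if shift != 0 and abs(log_ratio - shift) <= tol:
        return shift
    return 0


# Milligrams of iron per 100-gram serving of spinach, as given in the quote above.
print(decimal_shift(recorded=35.0, reference=3.5))  # prints 1 (one decimal place off)
```

A check like this only helps if someone actually goes back to a second source, which is exactly the step Arbesman says people tend to skip.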

 

Related Posts

The Family Circus and Data Quality

Can Data Quality avoid the Dustbin of History?

Retroactive Data Quality

Spartan Data Quality

Pirates of the Computer: The Curse of the Poor Data Quality

The Tooth Fairy of Data Quality

The Dumb and Dumber Guide to Data Quality

Darth Data

Occurred, a data defect has . . .

The Data Quality Placebo

Data Quality is People!

DQ-View: The Five Stages of Data Quality

DQ-BE: Data Quality Airlines

Wednesday Word: Quality-ish

The Five Worst Elevator Pitches for Data Quality

Shining a Social Light on Data Quality

The Poor Data Quality Jar

Data Quality and #FollowFriday the 13th

Dilbert, Data Quality, Rabbits, and #FollowFriday

Data Love Song Mashup

Wednesday, August 1, 2012

Exercise Better Data Management

Recently on Twitter, Daragh O Brien and I discussed his proposed concept.  “After Big Data,” Daragh tweeted, “we will inevitably begin to see the rise of MOData as organizations seek to grab larger chunks of data and digest it.  What is MOData?  It’s MO’Data, as in MOre Data. Or Morbidly Obese Data.  Only good data quality and data governance will determine which.”

Daragh asked if MO’Data will be the Big Data Killer.  I said only if MO’Data doesn’t include MO’BusinessInsight, MO’DataQuality, and MO’DataPrivacy (i.e., more business insight, more data quality, and more data privacy).

“But MO’Data is about more than just More Data,” Daragh replied.  “It’s about avoiding Morbidly Obese Data that clogs data insight and data quality, etc.”

I responded that More Data becomes Morbidly Obese Data only if we don’t exercise better data management practices.

Agreeing with that point, Daragh replied, “Bring on MOData and the Pilates of Data Quality and Data Governance.”

To slightly paraphrase lines from one of my favorite movies — Airplane! — the Cloud is getting thicker and the Data is getting laaaaarrrrrger.  Surely I know well that growing data volumes is a serious issue — but don’t call me Shirley.

Whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bites, the truth is we were consuming way more than our recommended daily allowance of data long before the data management industry took a tip from McDonald’s and put the word “big” in front of its signature sandwich.  (Oh great . . . now I’m actually hungry for a Big Mac.)

But nowadays with silos replicating data, as well as new data, and new types of data, being created and stored on a daily basis, our data is resembling the size of Bob Parr in retirement, making it seem like not even Mr. Incredible in his prime possessed the super strength needed to manage all of our data.  Those were references to the movie The Incredibles, where Mr. Incredible was a superhero who, after retiring into civilian life under the alias of Bob Parr, elicits the observation from his superhero costume tailor: “My God, you’ve gotten fat.”  Yes, I admit not even Helen Parr (aka Elastigirl) could stretch that far for a big data joke.

 

A Healthier Approach to Big Data

Although Daragh’s concerns about morbidly obese data are valid, no superpowers (or other miracle exceptions) are needed to manage all of our data.  In fact, it’s precisely when we are so busy trying to manage all of our data that we hoard countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival.  It’s like we are trying to lose weight by eating more and exercising less, i.e., consuming more data and exercising less data quality and data governance.  As Daragh said, only good data quality and data governance will determine whether we get more data or morbidly obese data.

Losing weight requires a healthy approach to both diet and exercise.  A healthy approach to diet includes carefully choosing the food you consume and carefully controlling your portion size.  A healthy approach to exercise includes a commitment to exercise on a regular basis at a sufficient intensity level without going overboard by spending several hours a day, every day, at the gym.

Swimming is a great form of exercise, but swimming in big data without having a clear business objective before you jump into the pool is like telling your boss that you didn’t get any work done because you decided to spend all day working out at the gym.

Carefully choosing the data you consume and carefully controlling your data portion size is becoming increasingly important since big data is forcing us to revisit information overload.  However, the main reason that traditional data management practices often become overwhelmed by big data is that traditional data management practices are not always the right approach.

We need to acknowledge that some big data use cases differ considerably from traditional ones.  Data modeling is still important and data quality still matters, but how much data modeling and data quality is needed before big data can be effectively used for business purposes will vary.  In order to move the big data discussion forward, we have to stop fiercely defending our traditional perspectives about structure and quality.  We also have to stop fiercely defending our traditional perspectives about analytics, since there will be some big data use cases where depth and detailed analysis may not be necessary to provide business insight.

 

Better than Big or More

Jim Ericson explained that your data is big enough.  Rich Murnane explained that bigger isn’t better, better is better.  Although big data may indeed be followed by more data, that doesn’t necessarily mean we require more data management in order to prevent more data from becoming morbidly obese data.  I think that we just need to exercise better data management.

 

Related Posts

OCDQ Radio - Saving Private Data

OCDQ Radio - The Blue Box of Information Quality

Quality is the Higgs Field of Data

Are you turning Ugly Data into Cute Information?

Big Data Lessons from Orbitz

The Graystone Effects of Big Data

Will Big Data be Blinded by Data Science?

Our Increasingly Data-Constructed World

OCDQ Radio - Data Quality and Big Data

Magic Elephants, Data Psychics, and Invisible Gorillas

HoardaBytes and the Big Data Lebowski

Big Data el Memorioso

Information Overload Revisited

The Big Data Collider

OCDQ Radio - Big Data and Big Analytics

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Sometimes it’s Okay to be Shallow

Big Data: Structure and Quality

The Big Data Theory

Swimming in Big Data

Why Can’t We Predict the Weather?

Tuesday, July 17, 2012

DQ-View: The Five Stages of Data Quality

Data Quality (DQ) View is a regular OCDQ segment.  Each DQ-View is a brief video discussion of a key data quality concept.

In my experience, all organizations cycle through five stages while coming to terms with the daunting challenges of data quality, which are somewhat similar to The Five Stages of Grief.  So, in this short video, I explain The Five Stages of Data Quality:

  1. Denial — Our organization is well-managed and highly profitable.  We consistently meet, or exceed, our business goals.  We obviously understand the importance of high-quality data.  Data quality issues can’t possibly be happening to us.
  2. Anger — We’re now in the midst of a financial reporting scandal, and facing considerable fines in the wake of a regulatory compliance failure.  How can this be happening to us?  Why do we have data quality issues?  Who is to blame for this?
  3. Bargaining — Okay, we may have just overreacted a little bit.  We’ll purchase a data quality tool, approve a data cleansing project, implement defect prevention, and initiate data governance.  That will fix all of our data quality issues — right?
  4. Depression — Why, oh why, do we keep having data quality issues?  Why does this keep happening to us?  Maybe we should just give up, accept our doomed fate, and not bother doing anything at all about data quality and data governance.
  5. Acceptance — We can’t fight the truth anymore.  We accept that we have to do the hard daily work of continuously improving our data quality and continuously implementing our data governance principles, policies, and procedures.

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

You can also watch a regularly updated page of my videos by clicking on this link: OCDQ Videos

 

Related Posts

Posts related to the Denial Stage of Data Quality:

Data Quality and Chicken Little Syndrome

The Illusion-of-Quality Effect

Perception Filters and Data Quality

“Some is not a number and soon is not a time”

The Data Quality Wager

Posts related to the Anger Stage of Data Quality:

Jack Bauer and Enforcing Data Governance Policies

Beware the Data Governance Ides of March

Aristotle, Data Governance, and Lead Rulers

Why isn’t our data quality worse?

Don’t Do Less Bad; Do Better Good

Posts related to the Bargaining Stage of Data Quality:

Data Quality and Miracle Exceptions

Do you believe in Magic (Quadrants)?

Which came first, the Data Quality Tool or the Business Need?

The Technology Carousel

The Stakeholder’s Dilemma

Posts related to the Depression Stage of Data Quality:

There is No Such Thing as a Root Cause

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and the Bystander Effect

Posts related to the Acceptance Stage of Data Quality:

You only get a Return from something you actually Invest in

Data Governance Frameworks are like Jigsaw Puzzles

The HedgeFoxian Hypothesis

Finding Data Quality

Data Quality: Quo Vadimus?

Tuesday, July 10, 2012

Shining a Social Light on Data Quality

Last week, when I published my blog post Lightning Strikes the Cloud, I unintentionally demonstrated three important things about data quality.

The first thing I demonstrated was that even an obsessive-compulsive data quality geek is capable of data defects, since I initially published the post with the title Lightening Strikes the Cloud.  This is an excellent example of the difference between validity and accuracy caused by the Cupertino Effect: although lightening is valid (i.e., a correctly spelled word), it isn’t contextually accurate.
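For the programmers in the audience, the distinction between validity and accuracy can be sketched in a few lines of Python.  This is only an illustration under my own assumptions: the tiny word list stands in for a real spell checker’s dictionary, and both function names are hypothetical.

```python
# Hypothetical word list standing in for a spell checker's dictionary.
DICTIONARY = {"lightning", "lightening", "strikes", "the", "cloud"}


def is_valid(word: str) -> bool:
    """Validity: the word is a correctly spelled dictionary word."""
    return word.lower() in DICTIONARY


def is_accurate(word: str, intended: str) -> bool:
    """Accuracy: the word is the one the context actually calls for."""
    return word.lower() == intended.lower()


published = "Lightening"
intended = "Lightning"
print(is_valid(published), is_accurate(published, intended))  # prints: True False
```

The validity check passes, which is precisely why a spell checker never complained.  Only a check against what was actually meant, which in practice usually requires a human reader, catches the defect.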

The second thing I demonstrated was the value of shining a social light on data quality — the value of using collaborative tools like social media to crowd-source data quality improvements.  Thankfully, Julian Schwarzenbach quickly noticed my error on Twitter.  “Did you mean lightning?  The concept of lightening clouds could be worth exploring further,” Julian humorously tweeted.  “Might be interesting to consider what happens if the cloud gets so light that it floats away.”  To which I replied that if the cloud gets so light that it floats away, it could become Interstellar Computing or, as Julian suggested, the start of the Intergalactic Net, which I suppose is where we will eventually have to store all of that big data we keep hearing so much about these days.

The third thing I demonstrated was the potential dark side of data cleansing, since the only remaining trace of my data defect is a broken URL.  This is an example of not providing a well-documented audit trail, which is necessary within an organization to communicate data quality issues and resolutions.
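A correction that keeps its own history avoids this problem.  The sketch below, in Python, is one hypothetical way to do it: the AuditedField class, its field names, and the reason text are all assumptions of mine, not a description of how any blogging platform or data quality tool actually works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedField:
    """A value that remembers every correction made to it (hypothetical helper)."""

    value: str
    history: list = field(default_factory=list)

    def correct(self, new_value: str, reason: str) -> None:
        # Record what changed, why, and when, before overwriting the value.
        self.history.append(
            {
                "old": self.value,
                "new": new_value,
                "reason": reason,
                "at": datetime.now(timezone.utc).isoformat(),
            }
        )
        self.value = new_value


title = AuditedField("Lightening Strikes the Cloud")
title.correct("Lightning Strikes the Cloud", reason="spelling error reported on Twitter")
print(title.value)
print(title.history[0]["old"])
```

With the old value, the new value, the reason, and the timestamp all retained, the cleansed data can still explain what happened to it.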

Communication and collaboration are essential to finding our way with data quality.  And social media can help us by providing more immediate and expanded access to our collective knowledge, experience, and wisdom, and by shining a social light that illuminates the shadows cast upon data quality issues when a perception filter or bystander effect gets the better of our individual attention or undermines our collective best intentions — which, as I recently demonstrated, occasionally happens to all of us.

 

Related Posts

Data Quality and the Cupertino Effect

Are you turning Ugly Data into Cute Information?

The Importance of Envelopes

The Algebra of Collaboration

Finding Data Quality

The Wisdom of the Social Media Crowd

Perception Filters and Data Quality

Data Quality and the Bystander Effect

The Family Circus and Data Quality

Data Quality and the Q Test

Metadata, Data Quality, and the Stroop Test

The Three Most Important Letters in Data Governance

Sunday, June 17, 2012

The Family Circus and Data Quality

Like many young intellectuals, the only part of the Sunday newspaper I read growing up was the color comics section, and one of my favorite comic strips was The Family Circus created by cartoonist Bil Keane.  One of the recurring themes of the comic strip was a set of invisible gremlins that the children used to shift blame for any misdeeds, including Ida Know, Not Me, and Nobody.

Although I no longer read any section of the newspaper on any day of the week, this Sunday morning I have been contemplating how this same set of invisible gremlins is also used by many people throughout most organizations to shift blame for any incidents when poor data quality negatively impacted business activities, especially since, when investigating the root cause, you often find that Ida Know owns the data, Not Me is accountable for data governance, and Nobody takes responsibility for data quality.

 

Related Posts

The Third Law of Data Quality

The Data Governance Oratorio

Data Quality and the Bystander Effect

Shared Responsibility

The Algebra of Collaboration

Collaboration isn’t Brain Surgery

The Three Most Important Letters in Data Governance

Video: Declaration of Data Governance

The Collaborative Culture of Data Governance

The Role of Data Quality Monitoring in Data Governance

Thursday, May 17, 2012

The Data Quality Placebo

Inspired by a recent Boing Boing blog post

Are you suffering from persistent and annoying data quality issues?  Or are you suffering from the persistence of data quality tool vendors and consultants annoying you with sales pitches about how you must be suffering from persistent data quality issues?

Either way, the Data Division of Prescott Pharmaceuticals (trusted makers of gastroflux, datamine, selectium, and qualitol) is proud to present the perfect solution to all of your real and/or imaginary data quality issues — The Data Quality Placebo.

Simply take two capsules (made with an easy-to-swallow coating) every morning and you will be guaranteed to experience:

“Zero Defects with Zero Side Effects”™

(Legal Disclaimer: Zero Defects with Zero Side Effects may be the result of Zero Testing, which itself is probably just a side effect of The Prescott Promise: “We can promise you that we will never test any of our products on animals because . . . we never test any of our products.”)

Tuesday, February 14, 2012

Data Love Song Mashup

Today is February 14 — Valentine’s Day — the annual celebration of enduring romance, where true love is publicly judged according to your willingness to purchase chocolate, roses, and extremely expensive jewelry, and privately judged in ways that nobody (and please, trust me when I say nobody) wants to see you post on Twitter, Facebook, YouTube, or your blog.

Valentine’s Day is for people in love to celebrate their love privately in whatever way works best for them.

But since your data needs love too, this blog post provides a mashup of love songs for your data.

 

Data Love Song Mashup

I’ve got sunshine on a cloud computing day
When it’s cold outside, I’ve got backups from the month of May
I guess you’d say, what can make me feel this way?
My data, my data, my data
Singing about my data
My data

My data’s so beautiful 
And I tell it every day
When I see your user interface
There’s not a thing that I would change
Because my data, you’re amazing
Just the way you are
You’re amazing data
Just the way you are

They say we’re young and we don’t know
We won’t find data quality issues until we grow
Well I don’t know if that is true
Because you got me, data
And data, I got you
I got you, data

Look into my eyes, and you will see
What my data means to me
Don’t tell me data quality is not worth trying for
Don’t tell me it’s not worth fighting for
You know it’s true
Everything I do, I do data quality for you

I can’t make you love data if you don’t
I can’t make your heart feel something it won’t

But there’s nothing you can do that can’t be done
Nothing you can sing that can’t be sung
Nothing you can make that can’t be made
All you need is love . . . for data
Love for data is all you need

Business people working hard all day and through the night
Their database queries searching for business insight
Some will win, some will lose
Some were born to sing the data quality blues
Oh, the need for business insight never ends
It goes on and on and on and on
Don’t stop believing
Hold on to that data loving feeling

Look at your data, I know its poor quality is showing
Look at your organization, you don’t know where it’s going
I don’t know much, but I know your data needs love too
And that may be all I need to know

Nothing compares to data quality, no worries or cares
Business regrets and decision mistakes, they’re memories made
But if you don’t continuously improve, how bittersweet that will taste
I wish nothing but the best for you
I wish nothing but the best for your data too
Don’t forget data quality, I beg, please remember I said
Sometimes quality lasts in data, but sometimes it hurts instead

 

Happy Valentine’s Day to you and yours

Happy Data Quality to you and your data

Related Posts

Maybe you’re just not that into your data?

I’m Bringing DQ Sexy Back

People

Over the Data Governance Rainbow

Council Data Governance

I’m Gonna Data Profile (500 Records)

A Record Named Duplicate

You Can’t Always Get the Data You Want

Data Quality is such a Rush

Data Quality Mondegreens

DQ-View: MetaData makes BettahMusic

The Data-Decision Symphony

Monday, January 30, 2012

HoardaBytes and the Big Data Lebowski

The recent #GartnerChat on Big Data was an excellent Twitter discussion about what I often refer to as the Seven Letter Tsunami of the data management industry.  As Gartner Research explains, although the term acknowledges the exponential growth, availability, and use of information in today’s data-rich landscape, big data is about more than just data volume.  Data variety (i.e., structured, semi-structured, and unstructured data, as well as other types, such as the sensor data emanating from the Internet of Things) and data velocity (i.e., how fast data is produced and how fast data must be processed to meet demand) are also key characteristics of the big challenges associated with the big buzzword that big data has become over the last year.

Since ours is an industry infatuated with buzzwords, Timo Elliott remarked “new terms arise because of new technology, not new business problems.  Big Data came from a need to name Hadoop [and other technologies now being relentlessly marketed as big data solutions], so anybody using big data to refer to business problems is quickly going to tie themselves in definitional knots.”

To which Mark Troester responded, “the hype of Hadoop is driving pressure on people to keep everything — but they ignore the difficulty in managing it.”  John Haddad then quipped that “big data is a hoarders dream,” which prompted Andy Bitterer to coin the term HoardaByte for measuring big data, and then asking, “Would the real Big Data Lebowski please stand up?”

 

HoardaBytes

Although it’s probably no surprise that a blogger with obsessive-compulsive in the title of his blog would like Bitterer’s new term, the fact is that whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bitterly bites, our organizations have been compulsively hoarding data for a long time.

And with silos replicating data as well as new data, and new types of data, being created and stored on a daily basis, managing all of the data is not only becoming impractical, but because we are too busy with the activity of trying to manage all of it, we are hoarding countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival.

 

The Big Data Lebowski

In The Big Lebowski, Jeff Lebowski (“The Dude”) is, in a classic data quality blunder caused by matching on person name only, mistakenly identified as millionaire Jeffrey Lebowski (“The Big Lebowski”) in an eccentric plot expected from a Coen brothers film, which, since its release in the late 1990s, has become a cult classic and inspired a religious following known as Dudeism.
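The blunder is easy to reproduce.  In this minimal Python sketch, the Person record, the alias field, and the decision to store both men under the full name Jeffrey Lebowski are assumptions made purely for illustration; real matching rules would compare far more attributes than these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    alias: str  # hypothetical second attribute used to tell people apart


def match_on_name(a: Person, b: Person) -> bool:
    """Naive rule: two records match when the normalized names are equal."""
    return a.name.strip().lower() == b.name.strip().lower()


def match_on_name_and_alias(a: Person, b: Person) -> bool:
    """Stricter rule: the name and one more attribute must both agree."""
    return match_on_name(a, b) and a.alias.strip().lower() == b.alias.strip().lower()


dude = Person(name="Jeffrey Lebowski", alias="The Dude")
big = Person(name="Jeffrey Lebowski", alias="The Big Lebowski")

print(match_on_name(dude, big))            # prints True: the false match
print(match_on_name_and_alias(dude, big))  # prints False: two different people
```

Had the thugs in the film compared even one more attribute, The Dude might have kept his rug.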

Historically, a big part of the problem in our industry has been the fact that the word “data” is prevalent in the names we have given industry disciplines and enterprise information initiatives.  For example, data architecture, data quality, data integration, data migration, data warehousing, master data management, and data governance — to name but a few.

However, all this achieved was to perpetuate the mistaken identification of data management as an esoteric technical activity that played little more than a minor, supporting, and often uncredited, role within the business activities of our organizations.

But since the late 1990s, there has been a shift in the perception of data.  The real data deluge has not been the rising volume, variety, and velocity of data, but instead the rising awareness of the big impact that data has on nearly every aspect of our professional and personal lives.  In this brave new data world, companies like Google and Facebook have built business empires mostly out of our own personal data, which is why, like it or not, as individuals, we must accept that we are all data geeks now.

All of the hype about Big Data is missing the point.  The reality is that Data is Big — meaning that data has now so thoroughly pervaded mainstream culture that data has gone beyond being just a cult classic for the data management profession, and is now inspiring an almost religious following that we could call Dataism.

 

The Data must Abide

“The Dude abides.  I don’t know about you, but I take comfort in that,” remarked The Stranger in The Big Lebowski.

The Data must also abide.  And the Data must abide both the Business and the Individual.  The Data abides the Business if data proves useful to our business activities.  The Data abides the Individual if data protects the privacy of our personal activities.

The Data abides.  I don’t know about you, but I would take more comfort in that than in any solutions The Stranger Salesperson wants to sell me that utilize an eccentric sales pitch involving HoardaBytes and the Big Data Lebowski.

 

Related Posts

Big Data el Memorioso

Dot Collectors and Dot Connectors

OCDQ Radio - Big Data and Big Analytics

DQ-View: Data Is as Data Does

OCDQ Radio - So Long 2011, and Thanks for All the . . .

Neither the I Nor the T is Magic

Information Overload Revisited

The Big Data Collider

The Speed of Decision

The Data-Decision Symphony

A Decision Needle in a Data Haystack

OCDQ Radio - Good-Enough Data for Fast-Enough Decisions

Friday
Jan 13, 2012

Scary Calendar Effects

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, recorded on the first of three occurrences of Friday the 13th in 2012, I discuss scary calendar effects.

In other words, I discuss how schedules, deadlines, and other date-related aspects can negatively affect enterprise initiatives such as data quality, master data management, and data governance.

Please Beware: This episode concludes with the OCDQ Radio Theater production of Data Quality and Friday the 13th.

 

Related Posts

Data Quality and #FollowFriday the 13th

The Moirae, Deadlines and Working within Limits

The Fiscal Calendar Effect

Eternal September and Tacit Knowledge

“What is is the was of what shall be”

 

Thursday
Dec 29, 2011

So Long 2011, and Thanks for All the . . .

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Don’t Panic!  Welcome to the mostly harmless OCDQ Radio 2011 Year in Review episode.  During this approximately 42-minute episode, I recap the data-related highlights of 2011 in a series of sometimes serious, sometimes funny, segments, as well as make wacky and wildly inaccurate data-related predictions about 2012.

Special thanks to my guests Jarrett Goldfedder, who discusses Big Data, Nicola Askham, who discusses Data Governance, and Daragh O Brien, who discusses Data Privacy.  Additional thanks to Rich Murnane and Dylan Jones.  And Deep Thanks to that frood Douglas Adams, who always knew where his towel was, and who wrote The Hitchhiker’s Guide to the Galaxy.

 

Friday
May 13, 2011

Data Quality and #FollowFriday the 13th

As Alice Hardy arrived at her desk at Crystal Lake Insurance, it seemed like a normal Friday morning.  Her thoughts about her weekend camping trip were interrupted by an eerie sound emanating from one of the adjacent cubicles:

Da da da, ta ta ta.  Da da da, ta ta ta.

“What’s that sound?” Alice wondered out loud.

“Sorry, am I typing too loud again?” responded Tommy Jarvis from another adjacent cubicle.  “Can you come take a look at something for me?”

“Sure, I’ll be right over,” Alice replied as she quickly circumnavigated their cluster of cubicles, puzzled and unsettled to find the other desks unoccupied with their computers turned off, wondering, to herself this time, where did that eerie sound come from?  Where are the other data counselors today?

“What’s up?” she casually asked upon entering Tommy’s cubicle, trying, as always, to conceal her discomfort about being alone in the office with the one colleague who always gave her the creeps.  Visiting his cubicle required a constant vigilance in order to avoid making prolonged eye contact, not only with Tommy Jarvis, but also with the horrifying hockey mask hanging above his computer screen like some possessed demon spawn from a horror movie.

“I’m analyzing the Date of Death in the life insurance database,” Tommy explained.  “And I’m receiving really strange results.  First of all, there are no NULLs, which indicates all of our policyholders are dead, right?  And if that wasn’t weird enough, there are only 12 unique values: January 13, 1978, February 13, 1981, March 13, 1987, April 13, 1990, May 13, 2011, June 13, 1997, July 13, 2001, August 13, 1971, September 13, 2002, October 13, 2006, November 13, 2009, and December 13, 1985.”

“That is strange,” said Alice.  “All of our policyholders can’t be dead.  And why is Date of Death always the 13th of the month?”

“It’s not just always the 13th of the month,” Tommy responded, almost cheerily.  “It’s always a Friday the 13th.”

“Well,” Alice slowly, and nervously, replied.  “I have a life insurance policy with Crystal Lake Insurance.  Pull up my policy.”

After a few quick, loud, pounding keystrokes, Tommy ominously read aloud the results now displaying on his computer screen, just below the hockey mask that Alice could swear was staring at her.  “Date of Death: May 13, 2011 . . . Wait, isn’t that today?”

Da da da, ta ta ta.  Da da da, ta ta ta.

“Did you hear that?” asked Alice.  “Hear what?” responded Tommy with a devilish grin.

“Never mind,” replied Alice quickly while trying to focus her attention on only the computer screen.  “Are you sure you pulled up the right policy?  I don’t recognize the name of the Primary Beneficiary . . . Who the hell is Jason Voorhees?”

“How the hell could you not know who Jason Voorhees is?” asked Tommy, with anger sharply crackling throughout his words.  “Jason Voorhees is now rightfully the sole beneficiary of every life insurance policy ever issued by Crystal Lake Insurance.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“What?  That’s impossible!” Alice screamed.  “This has to be some kind of sick data quality joke.”

“It’s a data quality masterpiece!” Tommy retorted with rage.  “I just finished implementing my data machete, er I mean, my data matching solution.  From now on, Crystal Lake Insurance will never experience another data quality issue.”

“There’s just one last thing that I need to take care of.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“And what’s that?” Alice asked, smiling nervously while quickly backing away into the hallway—and preparing to run for her life.

Da da da, ta ta ta.  Da da da, ta ta ta.

“Real-world alignment,” replied Tommy.  Rising to his feet, he put on the hockey mask, and pulled an actual machete out of the bottom drawer of his desk.  “Your Date of Death is entered as May 13, 2011.  Therefore, I must ensure real-world alignment.”

Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Data Quality.

The End.

 

(Note — You can also listen to the OCDQ Radio Theater production of this DQ-Tale in the Scary Calendar Effects episode.)
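For the curious, the profiling Tommy describes (count the NULLs, count the distinct values, then look for a pattern in them) can be sketched in a few lines. The 12 dates below are the ones read out in the story. Everything else, including the `profile` helper, the made-up row count, and the idea that the column arrives as a plain Python list, is an assumption for illustration only:

```python
from collections import Counter
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# The 12 distinct Date of Death values from the story.
distinct_dates = [
    date(1978, 1, 13), date(1981, 2, 13), date(1987, 3, 13),
    date(1990, 4, 13), date(2011, 5, 13), date(1997, 6, 13),
    date(2001, 7, 13), date(1971, 8, 13), date(2002, 9, 13),
    date(2006, 10, 13), date(2009, 11, 13), date(1985, 12, 13),
]

# Hypothetical column extract: every policy carries one of those 12 values.
policy_dates_of_death = distinct_dates * 3  # made-up row count


def profile(values):
    """Summarize a date column: NULL count, distinct count, and day patterns."""
    non_null = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "days_of_month": Counter(v.day for v in non_null),
        # date.weekday() returns 0 for Monday through 6 for Sunday
        "weekdays": Counter(WEEKDAY_NAMES[v.weekday()] for v in non_null),
    }


report = profile(policy_dates_of_death)
print(report["nulls"], report["distinct"])
print(sorted(report["days_of_month"]))
print(sorted(report["weekdays"]))
```

A column with zero NULLs, a dozen distinct values, and a single day of the week is the kind of result that should send you to the data owner rather than to the bottom drawer of your desk.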

 

#FollowFriday Recommendations

Today is #FollowFriday, the day when Twitter users recommend other users you should follow, so here are some great tweeps who provide non-horrifying tweets about Data Quality, Data Governance, Master Data Management, and Business Intelligence:

(Please Note: This is by no means a comprehensive list, is listed in no particular order whatsoever, and no offense is intended to any of my tweeps not listed below.  I hope that everyone has a great #FollowFriday and an even greater weekend.)

 

Related Posts

Spartan Data Quality

Pirates of the Computer: The Curse of the Poor Data Quality

The Tell-Tale Data

Data Quality is People!

Twitter, Data Governance, and a #ButteredCat #FollowFriday

#FollowFriday Spotlight: @PhilSimon

#FollowFriday Spotlight: @hlsdk

#FollowFriday Spotlight: @DataQualityPro

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

Tuesday
Mar 22, 2011

Retroactive Data Quality

As I, and many others, have blogged about many times before, the proactive approach to data quality, i.e., defect prevention, is highly recommended over the reactive approach to data quality, i.e., data cleansing.

However, reactive data quality still remains the most common approach because “let’s wait and see if something bad happens” is typically much easier to sell strategically than “let’s try to predict the future by preventing something bad before it happens.”

Of course, when something bad does happen (and it always does), it is often too late to do anything about it.  So imagine if we could somehow travel back in time and prevent specific business-impacting occurrences of poor data quality from happening.

This would appear to be the best of both worlds since we could reactively wait and see if something bad happens, and if (when) it does, then we could travel back in time and proactively prevent just that particular bad thing from happening to our data quality.

This approach is known as Retroactive Data Quality—and it has been (somewhat successfully) implemented at least three times.

 

Flux Capacitated Data Quality

In 1985, Dr. Emmett “Doc” Brown turned a modified DeLorean DMC-12 into a retroactive data quality machine that, when accelerated to 88 miles per hour, created a time displacement window using its flux capacitor (according to Doc, it’s what makes time travel possible) powered by 1.21 gigawatts of electricity, which could be provided by either a nuclear reaction or a lightning strike.

On October 26, 1985, Doc sent data quality expert Marty McFly back in time to November 5, 1955 to prevent a few data defects in the original design of the flux capacitor, which inadvertently triggered some severe data defects in 2015, requiring Doc and Marty to travel back to 1955, then 1885, before traveling Back to the Future of a defect-free 1985, when the flux capacitor is destroyed.

 

Quantum Data Quality

In 1989, theorizing a data steward could time travel within his own database, Dr. Sam Beckett launched a retroactive data quality project called Quantum Data Quality, stepped into its Quantum Leap data accelerator—and vanished.

He awoke to find himself trapped in the past, stewarding data that was not his own, and driven by an unknown force to change data quality for the better.  His only guide on this journey was Al, a subject matter expert from his own time, who appeared in the form of a hologram only Sam could see and hear.  And so, Dr. Beckett found himself leaping from database to database, putting data right that once went wrong, and hoping each time that his next leap would be the leap home to his own database—but Sam never returned home.

 

Data Quality Slingshot Effect

The slingshot effect is caused by traveling in a starship at an extremely high warp factor toward a sun.  After allowing the gravitational pull to accelerate it to even faster speeds, the starship will then break away from the sun, which creates the so-called slingshot effect that transports the starship through time.

In 2267, Captain Gene Roddenberry will begin a Star Trek, commanding a starship using the slingshot effect to travel back in time to September 8, 1966 to launch a retroactive data quality initiative that has the following charter:

“Data: the final frontier.  These are the voyages of the starship Quality.  Its continuing mission: To explore strange, new databases; To seek out new data and new corporations; To boldly go where no data quality has gone before.”

 

Retroactive Data Quality Log, Supplemental

It is understandable if many of you doubt the viability of time travel as an approach to improving your data quality.  After all, whenever Doc and Marty, or Sam and Al, or Captain Roddenberry and the crew of the starship Quality, travel back in time and prevent specific business-impacting occurrences of poor data quality from happening, how do we prove they were successful?  Within the resulting altered timeline, there would be no traces of the data quality issues after they were retroactively resolved.

“Great Scott!”  It will always be more difficult to sell the business benefits of defect prevention than to sell data cleansing after a CxO responds “Oh, boy!” the next time poor data quality negatively impacts business performance.

Nonetheless, you must continue your mission to engage your organization in a proactive approach to data quality.  “Make It So!”

 

Related Posts

Groundhog Data Quality Day

What Data Quality Technology Wants

To Our Data Perfectionists

Finding Data Quality

MacGyver: Data Governance and Duct Tape

What going to the dentist taught me about data quality

Microwavable Data Quality

A Tale of Two Q’s

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

Friday
Mar 11, 2011

Twitter, Data Governance, and a #ButteredCat #FollowFriday

I have previously blogged in defense of Twitter, the pithy platform for social networking that I use perhaps a bit too frequently, and which many people argue is incompatible with meaningful communication (Twitter that is, not me, hopefully).

Whether it is a regularly scheduled meeting of the minds, like the Data Knights Tweet Jam, or simply a spontaneous supply of trenchant thoughts, Twitter quite often facilitates discussions that deliver practical knowledge or thought-provoking theories.

However, occasionally the discussions center around more curious concepts, such as a paradox involving a buttered cat, which thankfully Steve Sarsfield, Mark Horseman, and Daragh O Brien can help me attempt to explain (remember I said attempt):

So, basically . . . successful data governance is all about Buttered Cats, Breaded CxOs, and Beer-Battered Data Quality Managers working together to deliver Bettered Data to the organization . . . yeah, that all sounded perfectly understandable to me.

But just in case you don’t have your secret decoder ring, let’s decipher the message (remember: “Be sure to drink your Ovaltine”):

  • Buttered Cats – metaphor for combining the top-down and bottom-up approaches to data governance
  • Breaded CxOs – metaphor for executive sponsors, especially ones providing bread (i.e., funding, not lunch—maybe both)
  • Beer-Battered Data Quality Managers – metaphor (and possibly also a recipe) for data stewardship
  • Bettered Data – metaphor for the corporate asset thingy that data governance helps you manage

(For more slightly less cryptic information, check out my previous post/poll: Data Governance and the Buttered Cat Paradox)

 

#FollowFriday Recommendations

Today is #FollowFriday, the day when Twitter users recommend other users you should follow, so here are some great tweeps for mostly non-buttered-cat tweets about Data Quality, Data Governance, Master Data Management, and Business Intelligence:

(Please Note: This is by no means a comprehensive list, is listed in no particular order whatsoever, and no offense is intended to any of my tweeps not listed below.  I hope that everyone has a great #FollowFriday and an even greater weekend.)

 

Related Posts

#FollowFriday Spotlight: @PhilSimon

#FollowFriday Spotlight: @hlsdk

#FollowFriday Spotlight: @DataQualityPro

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

The Wisdom of the Social Media Crowd

Social Karma (Part 7) – Twitter