Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.


Entries in Humor (64)

Monday
Jan 30, 2012

HoardaBytes and the Big Data Lebowski

The recent #GartnerChat on Big Data was an excellent Twitter discussion about what I often refer to as the Seven Letter Tsunami of the data management industry.  As Gartner Research explains, although the term acknowledges the exponential growth, availability, and use of information in today’s data-rich landscape, big data is about more than just data volume.  Data variety (i.e., structured, semi-structured, and unstructured data, as well as other types, such as the sensor data emanating from the Internet of Things) and data velocity (i.e., how fast data is produced and how fast data must be processed to meet demand) are also key characteristics of the big challenges associated with the big buzzword that big data has become over the last year.

Since ours is an industry infatuated with buzzwords, Timo Elliott remarked “new terms arise because of new technology, not new business problems.  Big Data came from a need to name Hadoop [and other technologies now being relentlessly marketed as big data solutions], so anybody using big data to refer to business problems is quickly going to tie themselves in definitional knots.”

To which Mark Troester responded, “the hype of Hadoop is driving pressure on people to keep everything — but they ignore the difficulty in managing it.”  John Haddad then quipped that “big data is a hoarders dream,” which prompted Andy Bitterer to coin the term HoardaByte for measuring big data, and then to ask, “Would the real Big Data Lebowski please stand up?”

 

HoardaBytes

Although it’s probably no surprise that a blogger with obsessive-compulsive in the title of his blog would like Bitterer’s new term, the fact is that whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bitterly bites, our organizations have been compulsively hoarding data for a long time.

And with silos replicating data, and with new data and new types of data being created and stored on a daily basis, managing all of the data is becoming impractical.  Worse, because we are too busy with the activity of trying to manage all of it, we are hoarding countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival.

 

The Big Data Lebowski

In The Big Lebowski, Jeff Lebowski (“The Dude”) is, in a classic data quality blunder caused by matching on person name only, mistakenly identified as millionaire Jeffrey Lebowski (“The Big Lebowski”) in an eccentric plot expected from a Coen brothers film, which, since its release in the late 1990s, has become a cult classic and inspired a religious following known as Dudeism.

Historically, a big part of the problem in our industry has been the fact that the word “data” is prevalent in the names we have given industry disciplines and enterprise information initiatives.  For example, data architecture, data quality, data integration, data migration, data warehousing, master data management, and data governance — to name but a few.

However, all this achieved was to perpetuate the mistaken identification of data management as an esoteric technical activity that played little more than a minor, supporting, and often uncredited, role within the business activities of our organizations.

But since the late 1990s, there has been a shift in the perception of data.  The real data deluge has not been the rising volume, variety, and velocity of data, but instead the rising awareness of the big impact that data has on nearly every aspect of our professional and personal lives.  In this brave new data world, companies like Google and Facebook have built business empires mostly out of our own personal data, which is why, like it or not, as individuals, we must accept that we are all data geeks now.

All of the hype about Big Data is missing the point.  The reality is that Data is Big — meaning that data has now so thoroughly pervaded mainstream culture that data has gone beyond being just a cult classic for the data management profession, and is now inspiring an almost religious following that we could call Dataism.

 

The Data must Abide

“The Dude abides.  I don’t know about you, but I take comfort in that,” remarked The Stranger in The Big Lebowski.

The Data must also abide.  And the Data must abide both the Business and the Individual.  The Data abides the Business if data proves useful to our business activities.  The Data abides the Individual if data protects the privacy of our personal activities.

The Data abides.  I don’t know about you, but I would take more comfort in that than in any solutions The Stranger Salesperson wants to sell me that utilize an eccentric sales pitch involving HoardaBytes and the Big Data Lebowski.

 

Related Posts

Big Data el Memorioso

Dot Collectors and Dot Connectors

OCDQ Radio - Big Data and Big Analytics

DQ-View: Data Is as Data Does

OCDQ Radio - So Long 2011, and Thanks for All the . . .

Neither the I Nor the T is Magic

Information Overload Revisited

The Big Data Collider

The Speed of Decision

The Data-Decision Symphony

A Decision Needle in a Data Haystack

OCDQ Radio - Good-Enough Data for Fast-Enough Decisions

Friday
Jan 13, 2012

Scary Calendar Effects

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, recorded on the first of three occurrences of Friday the 13th in 2012, I discuss scary calendar effects.

In other words, I discuss how schedules, deadlines, and other date-related aspects can negatively affect enterprise initiatives such as data quality, master data management, and data governance.

Please Beware: This episode concludes with the OCDQ Radio Theater production of Data Quality and Friday the 13th.

 

Scary Calendar Effects


 

Related Posts

Data Quality and #FollowFriday the 13th

The Moirae, Deadlines and Working within Limits

The Fiscal Calendar Effect

Eternal September and Tacit Knowledge

“What is is the was of what shall be”

 


 

Thursday
Dec 29, 2011

So Long 2011, and Thanks for All the . . .

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Don’t Panic!  Welcome to the mostly harmless OCDQ Radio 2011 Year in Review episode.  During this approximately 42-minute episode, I recap the data-related highlights of 2011 in a series of sometimes serious, sometimes funny, segments, as well as make wacky and wildly inaccurate data-related predictions about 2012.

Special thanks to my guests Jarrett Goldfedder, who discusses Big Data, Nicola Askham, who discusses Data Governance, and Daragh O Brien, who discusses Data Privacy.  Additional thanks to Rich Murnane and Dylan Jones.  And Deep Thanks to that frood Douglas Adams, who always knew where his towel was, and who wrote The Hitchhiker’s Guide to the Galaxy.

 

So Long 2011, and Thanks for All the . . .


 


Friday
May 13, 2011

Data Quality and #FollowFriday the 13th

As Alice Hardy arrived at her desk at Crystal Lake Insurance, it seemed like a normal Friday morning.  Her thoughts about her weekend camping trip were interrupted by an eerie sound emanating from one of the adjacent cubicles:

Da da da, ta ta ta.  Da da da, ta ta ta.

“What’s that sound?” Alice wondered out loud.

“Sorry, am I typing too loud again?” responded Tommy Jarvis from another adjacent cubicle.  “Can you come take a look at something for me?”

“Sure, I’ll be right over,” Alice replied as she quickly circumnavigated their cluster of cubicles, puzzled and unsettled to find the other desks unoccupied with their computers turned off, wondering, to herself this time, where did that eerie sound come from?  Where are the other data counselors today?

“What’s up?” she casually asked upon entering Tommy’s cubicle, trying, as always, to conceal her discomfort about being alone in the office with the one colleague that always gave her the creeps.  Visiting his cubicle required a constant vigilance in order to avoid making prolonged eye contact, not only with Tommy Jarvis, but also with the horrifying hockey mask hanging above his computer screen like some possessed demon spawn from a horror movie.

“I’m analyzing the Date of Death in the life insurance database,” Tommy explained.  “And I’m receiving really strange results.  First of all, there are no NULLs, which indicates all of our policyholders are dead, right?  And if that wasn’t weird enough, there are only 12 unique values: January 13, 1978, February 13, 1981, March 13, 1987, April 13, 1990, May 13, 2011, June 13, 1997, July 13, 2001, August 13, 1971, September 13, 2002, October 13, 2006, November 13, 2009, and December 13, 1985.”

“That is strange,” said Alice.  “All of our policyholders can’t be dead.  And why is Date of Death always the 13th of the month?”

“It’s not just always the 13th of the month,” Tommy responded, almost cheerily.  “It’s always a Friday the 13th.”

“Well,” Alice slowly, and nervously, replied.  “I have a life insurance policy with Crystal Lake Insurance.  Pull up my policy.”

After a few, quick, loud pounding keystrokes, Tommy ominously read aloud the results now displaying on his computer screen, just below the hockey mask that Alice could swear was staring at her.  “Date of Death: May 13, 2011 . . . Wait, isn’t that today?”

Da da da, ta ta ta.  Da da da, ta ta ta.

“Did you hear that?” asked Alice.  “Hear what?” responded Tommy with a devilish grin.

“Never mind,” replied Alice quickly while trying to focus her attention on only the computer screen.  “Are you sure you pulled up the right policy?  I don’t recognize the name of the Primary Beneficiary . . . Who the hell is Jason Voorhees?”

“How the hell could you not know who Jason Voorhees is?” asked Tommy, with anger sharply crackling throughout his words.  “Jason Voorhees is now rightfully the sole beneficiary of every life insurance policy ever issued by Crystal Lake Insurance.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“What?  That’s impossible!” Alice screamed.  “This has to be some kind of sick data quality joke.”

“It’s a data quality masterpiece!” Tommy retorted with rage.  “I just finished implementing my data machete, er I mean, my data matching solution.  From now on, Crystal Lake Insurance will never experience another data quality issue.”

“There’s just one last thing that I need to take care of.”

Da da da, ta ta ta.  Da da da, ta ta ta.

“And what’s that?” Alice asked, smiling nervously while quickly backing away into the hallway—and preparing to run for her life.

Da da da, ta ta ta.  Da da da, ta ta ta.

“Real-world alignment,” replied Tommy.  Rising to his feet, he put on the hockey mask, and pulled an actual machete out of the bottom drawer of his desk.  “Your Date of Death is entered as May 13, 2011.  Therefore, I must ensure real-world alignment.”

Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Da da da, ta ta ta.  Data Quality.

The End.

(Or will it be continued on Friday, January 13, 2012?)
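For any data counselors who would rather verify Tommy's profiling results from a safe distance, here is a minimal sketch in Python.  It assumes the twelve distinct Date of Death values are exactly the ones he read aloud; the names DATES_OF_DEATH, is_friday_the_13th, and profile are hypothetical helpers invented for this illustration, not part of any real insurance system or data quality tool.

```python
from datetime import date

# The twelve distinct Date of Death values Tommy reported, as (year, month, day).
DATES_OF_DEATH = [
    (1978, 1, 13), (1981, 2, 13), (1987, 3, 13), (1990, 4, 13),
    (2011, 5, 13), (1997, 6, 13), (2001, 7, 13), (1971, 8, 13),
    (2002, 9, 13), (2006, 10, 13), (2009, 11, 13), (1985, 12, 13),
]


def is_friday_the_13th(d: date) -> bool:
    # date.weekday() returns 0 for Monday through 6 for Sunday, so Friday is 4.
    return d.day == 13 and d.weekday() == 4


def profile(values):
    """Return (distinct_count, all_friday_the_13th) for (year, month, day) tuples."""
    distinct = {date(*v) for v in values}
    return len(distinct), all(is_friday_the_13th(d) for d in distinct)


if __name__ == "__main__":
    distinct_count, all_scary = profile(DATES_OF_DEATH)
    print("Distinct values:", distinct_count)
    print("Every value is a Friday the 13th:", all_scary)
    print("Sequel date is a Friday the 13th:", is_friday_the_13th(date(2012, 1, 13)))
```

A column with no NULLs and only a dozen distinct values, all sharing one suspicious pattern, is the kind of result that a basic profiling check like this should flag long before anyone reaches for a machete.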

 

#FollowFriday Recommendations

Today is #FollowFriday, the day when Twitter users recommend other users you should follow, so here are some great tweeps who provide non-horrifying tweets about Data Quality, Data Governance, Master Data Management, and Business Intelligence:

(Please Note: This is by no means a comprehensive list, is listed in no particular order whatsoever, and no offense is intended to any of my tweeps not listed below.  I hope that everyone has a great #FollowFriday and an even greater weekend.)

 

Related Posts

Spartan Data Quality

Pirates of the Computer: The Curse of the Poor Data Quality

The Tell-Tale Data

Data Quality is People!

Twitter, Data Governance, and a #ButteredCat #FollowFriday

#FollowFriday Spotlight: @PhilSimon

#FollowFriday Spotlight: @hlsdk

#FollowFriday Spotlight: @DataQualityPro

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

Tuesday
Mar 22, 2011

Retroactive Data Quality

As I, and many others, have blogged about many times before, the proactive approach to data quality, i.e., defect prevention, is highly recommended over the reactive approach to data quality, i.e., data cleansing.

However, reactive data quality still remains the most common approach because “let’s wait and see if something bad happens” is typically much easier to sell strategically than “let’s try to predict the future by preventing something bad before it happens.”

Of course, when something bad does happen (and it always does), it is often too late to do anything about it.  So imagine if we could somehow travel back in time and prevent specific business-impacting occurrences of poor data quality from happening.

This would appear to be the best of both worlds since we could reactively wait and see if something bad happens, and if (when) it does, then we could travel back in time and proactively prevent just that particular bad thing from happening to our data quality.

This approach is known as Retroactive Data Quality—and it has been (somewhat successfully) implemented at least three times.

 

Flux Capacitated Data Quality

In 1985, Dr. Emmett “Doc” Brown turned a modified DeLorean DMC-12 into a retroactive data quality machine that, when accelerated to 88 miles per hour, created a time displacement window using its flux capacitor (according to Doc, it’s what makes time travel possible) powered by 1.21 gigawatts of electricity, which could be provided by either a nuclear reaction or a lightning strike.

On October 25, 1985, Doc sent data quality expert Marty McFly back in time to November 5, 1955 to prevent a few data defects in the original design of the flux capacitor, which inadvertently triggered some severe data defects in 2015, requiring Doc and Marty to travel back to 1955, then 1885, before traveling Back to the Future of a defect-free 1985—when the flux capacitor was destroyed.

 

Quantum Data Quality

In 1989, theorizing a data steward could time travel within his own database, Dr. Sam Beckett launched a retroactive data quality project called Quantum Data Quality, stepped into its Quantum Leap data accelerator—and vanished.

He awoke to find himself trapped in the past, stewarding data that was not his own, and driven by an unknown force to change data quality for the better.  His only guide on this journey was Al, a subject matter expert from his own time, who appeared in the form of a hologram only Sam could see and hear.  And so, Dr. Beckett found himself leaping from database to database, putting data right that once went wrong, and hoping each time that his next leap would be the leap home to his own database—but Sam never returned home.

 

Data Quality Slingshot Effect

The slingshot effect is caused by traveling in a starship at an extremely high warp factor toward a sun.  After allowing the gravitational pull to accelerate it to even faster speeds, the starship will then break away from the sun, which creates the so-called slingshot effect that transports the starship through time.

In 2267, Captain Gene Roddenberry will begin a Star Trek, commanding a starship using the slingshot effect to travel back in time to September 8, 1966 to launch a retroactive data quality initiative that has the following charter:

“Data: the final frontier.  These are the voyages of the starship Quality.  Its continuing mission: To explore strange, new databases; To seek out new data and new corporations; To boldly go where no data quality has gone before.”

 

Retroactive Data Quality Log, Supplemental

It is understandable if many of you doubt the viability of time travel as an approach to improving your data quality.  After all, whenever Doc and Marty, or Sam and Al, or Captain Roddenberry and the crew of the starship Quality, travel back in time and prevent specific business-impacting occurrences of poor data quality from happening, how do we prove they were successful?  Within the resulting altered timeline, there would be no traces of the data quality issues after they were retroactively resolved.

“Great Scott!”  It will always be more difficult to sell the business benefits of defect prevention than it is to sell data cleansing once a CxO responds “Oh, boy!” the next time poor data quality negatively impacts business performance.

Nonetheless, you must continue your mission to engage your organization in a proactive approach to data quality.  “Make It So!”

 

Related Posts

Groundhog Data Quality Day

What Data Quality Technology Wants

To Our Data Perfectionists

Finding Data Quality

MacGyver: Data Governance and Duct Tape

What going to the dentist taught me about data quality

Microwavable Data Quality

A Tale of Two Q’s

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

Friday
Mar 11, 2011

Twitter, Data Governance, and a #ButteredCat #FollowFriday

I have previously blogged in defense of Twitter, the pithy platform for social networking that I use perhaps a bit too frequently, and which many people argue is incompatible with meaningful communication (Twitter that is, not me—hopefully).

Whether it is a regularly scheduled meeting of the minds, like the Data Knights Tweet Jam, or simply a spontaneous supply of trenchant thoughts, Twitter quite often facilitates discussions that deliver practical knowledge or thought-provoking theories.

However, occasionally the discussions center around more curious concepts, such as a paradox involving a buttered cat, which thankfully Steve Sarsfield, Mark Horseman, and Daragh O Brien can help me attempt to explain (remember I said attempt):

So, basically . . . successful data governance is all about Buttered Cats, Breaded CxOs, and Beer-Battered Data Quality Managers working together to deliver Bettered Data to the organization . . . yeah, that all sounded perfectly understandable to me.

But just in case you don’t have your secret decoder ring, let’s decipher the message (remember: “Be sure to drink your Ovaltine”):

  • Buttered Cats – metaphor for combining the top-down and bottom-up approaches to data governance
  • Breaded CxOs – metaphor for executive sponsors, especially ones providing bread (i.e., funding, not lunch—maybe both)
  • Beer-Battered Data Quality Managers – metaphor (and possibly also a recipe) for data stewardship
  • Bettered Data – metaphor for the corporate asset thingy that data governance helps you manage

(For more slightly less cryptic information, check out my previous post/poll: Data Governance and the Buttered Cat Paradox)

 

#FollowFriday Recommendations

Today is #FollowFriday, the day when Twitter users recommend other users you should follow, so here are some great tweeps for mostly non-buttered-cat tweets about Data Quality, Data Governance, Master Data Management, and Business Intelligence:

(Please Note: This is by no means a comprehensive list, is listed in no particular order whatsoever, and no offense is intended to any of my tweeps not listed below.  I hope that everyone has a great #FollowFriday and an even greater weekend.)

 

Related Posts

#FollowFriday Spotlight: @PhilSimon

#FollowFriday Spotlight: @hlsdk

#FollowFriday Spotlight: @DataQualityPro

#FollowFriday and The Three Tweets

Dilbert, Data Quality, Rabbits, and #FollowFriday

Twitter, Meaningful Conversations, and #FollowFriday

The Fellowship of #FollowFriday

The Wisdom of the Social Media Crowd

Social Karma (Part 7) – Twitter

Saturday
Feb 12, 2011

Spartan Data Quality

My recent Twitter conversation with Dylan Jones, Henrik Liliendahl Sørensen, and Daragh O Brien was sparked by the blog post Case study with Data blogs, from 300 to 1000, which included a list of the top 500 data blogs ranked by influence.

Data Quality Pro was ranked #57, Liliendahl on Data Quality was ranked #87, The DOBlog was a glaring omission, and I was proud OCDQ Blog was ranked #33 – at least until, being the data quality geeks we are, we noticed that it was also ranked #165.

In other words, there was an ironic data quality issue—a data quality blog was listed twice (i.e., a duplicate record in the list)!
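For the data quality geeks in the audience, here is a minimal sketch in Python of how a duplicate like this could be caught before publishing a ranked list.  The ranks are the ones mentioned above, but the exact way the duplicate entry was spelled is an assumption made for illustration, and find_duplicate_entries is a hypothetical helper, not the method used by the authors of the list.

```python
from collections import defaultdict


def find_duplicate_entries(ranked):
    """Map each normalized name that appears more than once to its list of ranks."""
    ranks_by_name = defaultdict(list)
    for rank, name in ranked:
        # Normalize case and whitespace so trivial variations still collide.
        key = " ".join(name.lower().split())
        ranks_by_name[key].append(rank)
    return {name: ranks for name, ranks in ranks_by_name.items() if len(ranks) > 1}


# Ranks come from the list discussed above; the spelling variation is hypothetical.
sample = [
    (33, "OCDQ Blog"),
    (57, "Data Quality Pro"),
    (87, "Liliendahl on Data Quality"),
    (165, "OCDQ  blog"),
]

if __name__ == "__main__":
    print(find_duplicate_entries(sample))
```

Exact matching on a normalized key only catches the easy cases; entries that differ by more than case or spacing would need fuzzy matching.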

Hilarity ensued, including some epic Photoshopping by Daragh, leading, quite inevitably, to the writing of this Data Quality Tale, which is obviously loosely based on the epic movie 300—and perhaps also the epically terrible comedy Meet the Spartans.  Enjoy!

 

Spartan Data Quality

In 1989, an alliance of Data Geeks, led by the Spartans, an unrivaled group of data quality warriors, battled against an invading data deluge in the mountain data center of Thermopylae, caused by the complexities of the Greco-Persian Corporate Merger.

Although they were vastly outnumbered, the Data Geeks overcame epic data quality challenges in one of the most famous enterprise data management initiatives in history—The Data Integration of Thermopylae.

This is their story.

Leonidas, leader of the Spartans, espoused an enterprise data management approach known as Spartan Data Quality, defined by its ethos of collaboration amongst business, data, and technology experts, collectively and affectionately known as Data Geeks.

Therefore, Leonidas was chosen as the Thermopylae Project Lead.  However, Xerxes, the new Greco-Persian CIO, believed that the data integration project was pointless, Spartan Data Quality was a fool’s errand, and the technology-only Persian approach, known as Magic Beans, should be implemented instead.  Xerxes saw the Thermopylae project as an unnecessary sacrifice.

“There will be no glory in your sacrifice,” explained Xerxes.  “I will erase even the memory of Sparta from the database log files!  Every bit and byte of Data Geek tablespace shall be purged.  Every data quality historian and every data blogger shall have their Ethernet cables pulled out, and their network connections cut from the Greco-Persian mainframe.  Why, uttering the very name of Sparta, or Leonidas, will be punishable by employee termination!  The corporate world will never know you existed at all!”

“The corporate world will know,” replied Leonidas, “that Data Geeks stood against a data deluge, that few stood against many, and before this battle was over, a CIO blinded by technology saw what it truly takes to manage data as a corporate asset.”

Addressing his small army of 300 Data Geeks, Leonidas declared: “Gather round!  No retreat, no surrender.  That is Spartan law.  And by Spartan law we will stand and fight.  And together, united by our collaboration, our communication, our transparency, and our trust in each other, we shall overcome this challenge.”

“A new Information Age has begun.  An age of data-driven business decisions, an age of data-empowered consumers, an age of a world connected by a web of linked data.  And all will know, that 300 Data Geeks gave their last breath to defend it!”

“But there will be so many data defects, they will blot out the sun!” exclaimed Xerxes.

“Then we will fight poor data quality in the shade,” Leonidas replied, with a sly smile.

“This is madness!” Xerxes nervously responded as the new servers came on-line in the data center of Thermopylae.

“Madness?  No,” Leonidas calmly said as the first wave of the data deluge descended upon them.  “THIS . . . IS . . . DATA !!!”

 

Related Posts

Pirates of the Computer: The Curse of the Poor Data Quality

Video: Oh, the Data You’ll Show!

The Quest for the Golden Copy (Part 1)

The Quest for the Golden Copy (Part 2)

The Quest for the Golden Copy (Part 3)

The Quest for the Golden Copy (Part 4)

‘Twas Two Weeks Before Christmas

My Own Private Data

The Tell-Tale Data

Data Quality is People!

Tuesday
Feb 8, 2011

DQ-BE: Dear Valued Customer

Data Quality By Example (DQ-BE) is an OCDQ regular segment that provides examples of data quality key concepts.

The term “valued customer” is bandied about quite frequently and is often at the heart of enterprise data management initiatives such as Customer Data Integration (CDI), 360° Customer View, and Customer Master Data Management (MDM).

The role of data quality in these initiatives is an important, but sometimes mistakenly overlooked, consideration.

For example, the Service Contract Renewal Notice (shown above) I recently received exemplifies the impact of poor data quality on Customer Relationship Management (CRM) since one of my service providers wants me—as a valued customer—to purchase a new service contract for one of my laptop computers.

Let’s give them props for generating a 100% accurate residential postal address, since how could I even consider renewing my service contract if I don’t receive the renewal notice in the mail?  Let’s also acknowledge that my Customer ID is 100% accurate, since that is the “unique identifier” under which I have purchased all of my products and services from this company.

However, the biggest data quality mistake is that the name of their “Valued Customer” is not INDEPENDENT CONSULTANT.  (And they get bonus negative points for writing it in ALL CAPS).

The moral of the story is that if you truly value your customers, then you should truly value your customer data quality.

At the very least—get your customer’s name right.

 

Related Posts

Customer Incognita

Identifying Duplicate Customers

Adventures in Data Profiling (Part 7) – Customer Name

The Quest for the Golden Copy (Part 3) – Defining “Customer”

‘Tis the Season for Data Quality

The Seven Year Glitch

DQ-IRL (Data Quality in Real Life)

Data Quality, 50023

Once Upon a Time in the Data

The Semantic Future of MDM

Wednesday
Feb 2, 2011

DQ-View: The Poor Data Quality Blizzard

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

DQ-View: New Data Resolutions

DQ-View: From Data to Decision

DQ View: Achieving Data Quality Happiness

Data Quality is not a Magic Trick

DQ-View: The Cassandra Effect

DQ-View: Is Data Quality the Sun?

DQ-View: Designated Asker of Stupid Questions

Video: Oh, the Data You’ll Show!

Monday
Jan 17, 2011

Occurred, a data defect has . . .

Inspired by: The 404 Error Page of Adham Dannaway


Wednesday
Jan 12, 2011

Wordless Wednesday: January 12, 2011

Photo via an illustration from The Economist


Tuesday
Dec 28, 2010

I’m Gonna Data Profile (500 Records)

While researching my blog post (to be published on December 31) about the best data quality blog posts of the year, I re-read the great post Profound Profiling by Daragh O Brien, which recounted how he found data profiling cropping up in conversations and presentations he’d made this year, even where the topic of the day wasn’t “Information Quality” and shared his thoughts on the profound business benefits of data profiling for organizations seeking to manage risk and ensure compliance.

And I noticed that I had actually commented on this blog post . . . with song lyrics . . .

 

I’m Gonna Data Profile (500 Records) *

When I wake up, well I know I’m gonna be,
I’m gonna be the one who profiles early and often for you
When I go out, yeah I know I’m gonna be
I’m gonna be the one who goes along with data
If I get drunk, well I know I’m gonna be
I’m gonna be the one who gets drunk on managing risk for you
And if I haver up, yeah I know I’m gonna be
I’m gonna be the one who’s havering about how: “It’s the Information, Stupid!”

But I would profile 500 records
And I would profile 500 more
Just to be the one who profiles a thousand records
To deliver the profound business benefits of data profiling to your door

da da da da – ta ta ta ta
da da da da – ta ta ta ta – data!
da da da da – ta ta ta ta
da da da da – ta ta ta ta – data profiling!

When I’m working, yes I know I’m gonna be
I’m gonna be the one who’s working hard to ensure compliance for you
And when the money, comes in for the work I do
I’ll pass almost every penny on to improving data for you
When I come home (When I come home), well I know I’m gonna be
I’m gonna be the one who comes back home with data quality
And if I grow-old, (When I grow-old) well I know I’m gonna be
I’m gonna be the one who’s growing old with information quality

But I would profile 500 records
And I would profile 500 more
Just to be the one who profiles a thousand records
To deliver the profound business benefits of data profiling to your door

da da da da – ta ta ta ta
da da da da – ta ta ta ta – data!
da da da da – ta ta ta ta
da da da da – ta ta ta ta – data profiling!

When I’m lonely, well I know I’m gonna be
I’m gonna be the one who’s lonely without data profiling to do
And when I’m dreaming, well I know I’m gonna dream
I’m gonna dream about the time when I’m data profiling for you
When I go out (When I go out), well I know I’m gonna be
I’m gonna be the one who goes along with data
And when I come home (When I come home), yes I know I’m gonna be
I’m gonna be the one who comes back home with data quality
I’m gonna be the one who’s coming home with information quality

But I would profile 500 records
And I would profile 500 more
Just to be the one who profiles a thousand records
To deliver the profound business benefits of data profiling to your door

da da da da – ta ta ta ta
da da da da – ta ta ta ta – data!
da da da da – ta ta ta ta
da da da da – ta ta ta ta – data profiling!

___________________________________________________________________________________________________________________

* Based on the 1988 song I’m Gonna Be (500 Miles) by The Proclaimers.

 

Data Quality Music (DQ-Songs)

Over the Data Governance Rainbow

A Record Named Duplicate

New Time Human Business

People

You Can’t Always Get the Data You Want

A spoonful of sugar helps the number of data defects go down

Data Quality is such a Rush

I’m Bringing DQ Sexy Back

Imagining the Future of Data Quality

The Very Model of a Modern DQ General

Sunday
Dec 12, 2010

Can Data Quality avoid the Dustbin of History?

After reading two blog posts about the 2011 predictions for data management by Steve Sarsfield and Henrik Liliendahl Sørensen, I was pondering writing a 2011 prediction post of my own—and then I read this recent Dilbert comic strip.

What if Dogbert is right and the only things that matter are social networks, games, and phones?  What implications does this have for the data management industry, and more specifically, the data quality profession?  How can data quality practitioners avoid being cast into the Dustbin of History in 2011 and beyond?

Perhaps we need to create a social network for data?  Let’s call it DataTweetBook.  Although we would be allowed to follow any data with a public profile, data would have to approve our friend requests—you know, in order to respect data’s privacy.

(Quick Side Bar Question: Do you think that your organization’s data would accept your friend request—or block you?)

Next, we would partner with Zynga and create DataVille and Data Quality Wars, which would be online games exclusive to the DataTweetBook platform.  These games would include fun challenges, like “consolidate duplicates in your contact database” and “design a user interface that prevents data quality issues from happening.”  You and your data can even ask other people and data in your social network for help with completing tasks, such as “ask postal reference data to validate your mailing addresses.”

Of course, we would then need to create iPhone and Android apps for DataTweetBook, DataVille, and Data Quality Wars, so that everyone can access the new social network and games on their mobile phones.  And eventually, we would start a bidding war between Apple and Google over the exclusive rights to make an integrated mobile device, either iDataPad or DataGoogler.

So that’s my 2011 prognostication for the data quality industry—it’s going to be all about social networks, games, and phones.

 

Related Posts

Dilbert, Data Quality, Rabbits, and #FollowFriday

Comic Relief: Dilbert on Project Management

Comic Relief: Dilbert to the Rescue

Thursday
Dec 9, 2010

‘Tis the Season for Data Quality

‘Tis the season for getting holiday greeting cards, and not only from family and friends, since many companies also like to mail season’s greetings to their customers, employees, and business partners.

I do appreciate the sentiment, but I mostly just check the envelopes for data quality issues with the name and/or postal address.

I have never made it through an entire holiday season without receiving at least one incorrectly addressed greeting card, and this year was no exception.  In the above image, I have highlighted that I apparently live in the town of Ankely, Pennsylvania.

I actually live in the town of Ankeny, Iowa.

The United States postal abbreviations for Pennsylvania and Iowa are PA and IA, respectively.  Additionally, the town name is only off by one character (L instead of N in the fifth position of a six-character string).  Therefore, the data matching algorithms provided by most data quality tools would consider these relatively minor discrepancies to be highly probable matches.
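To make the "off by one character" point concrete, here is a minimal sketch in Python using Levenshtein edit distance.  This is only one of several string comparison techniques, and it is an assumption for illustration rather than the algorithm any particular data quality tool uses; the function names levenshtein and similarity are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0 (1.0 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a.upper(), b.upper()) / longest


if __name__ == "__main__":
    print(levenshtein("Ankely", "Ankeny"), similarity("Ankely", "Ankeny"))
    print(levenshtein("PA", "IA"), similarity("PA", "IA"))
```

Both the town and the state abbreviation differ by a single substitution, but one edit in a six-character town name leaves a much higher similarity than one edit in a two-character state code, which is why short fields are usually weighted or validated differently in a matching rule.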

And although Pennsylvania and Iowa are approximately 900 miles away from each other, since my street address and ZIP code (both intentionally blurred out in the image) were correct, the post office was able to successfully deliver the greeting card to me.

However, the really funny thing is that this greeting card was sent to me by a . . . (wait for it) . . . data quality tool vendor!

So apparently ‘tis the season for data quality . . . data quality issues, that is :-)

 

‘Tis the Season for Sharing Data Quality Stories

Have you encountered any seasonal data quality issues?  If so, please share your story by posting a comment below.

Monday
Oct 25, 2010

Data Quality Industry: Problem Solvers or Enablers?

This morning I had the following Twitter conversation with Andy Bitterer of Gartner Research and ANALYSTerical, sparked by my previous post about Data Quality Magic, the one and only source of which I posited comes from the people involved:

 

What Say You?

Although Andy and I were just joking around, there is some truth beneath these tweets.  After all, according to Gartner research, “the market for data quality tools was worth approximately $727 million in software-related revenue as of the end of 2009, and is forecast to experience a compound annual growth rate (CAGR) of 12% during the next five years.” 

So I thought I would open this up to a good-natured debate. 

Do you think the data quality industry (software vendors, consultants, analysts, and conferences) is working harder to solve the problem of poor data quality or perpetuate the profitability of its continued existence?

All perspectives on this debate are welcome without bias.  Therefore, please post a comment below.

(Please Note: Comments advertising your products and services (or bashing your competitors) will NOT be approved.)

 

Related Posts

Which came first, the Data Quality Tool or the Business Need?

Do you believe in Magic (Quadrants)?

Can Enterprise-Class Solutions Ever Deliver ROI?

Promoting Poor Data Quality

The Once and Future Data Quality Expert

Imagining the Future of Data Quality