Jim Harris

My name is Jim Harris, I am the Blogger-in-Chief of OCDQ Blog, and an independent consultant, speaker, and freelance writer for hire.

My Services Contact Me
Search OCDQ Blog
Recent Comments
Wednesday
Aug012012

Exercise Better Data Management

Recently on Twitter, Daragh O Brien and I discussed his proposed concept.  “After Big Data,” Daragh tweeted, “we will inevitably begin to see the rise of MOData as organizations seek to grab larger chunks of data and digest it.  What is MOData?  It’s MO’Data, as in MOre Data. Or Morbidly Obese Data.  Only good data quality and data governance will determine which.”

Daragh asked if MO’Data will be the Big Data Killer.  I said only if MO’Data doesn’t include MO’BusinessInsight, MO’DataQuality, and MO’DataPrivacy (i.e., more business insight, more data quality, and more data privacy).

“But MO’Data is about more than just More Data,” Daragh replied.  “It’s about avoiding Morbidly Obese Data that clogs data insight and data quality, etc.”

I responded that More Data becomes Morbidly Obese Data only if we don’t exercise better data management practices.

Agreeing with that point, Daragh replied, “Bring on MOData and the Pilates of Data Quality and Data Governance.”

To slightly paraphrase lines from one of my favorite movies — Airplane! — the Cloud is getting thicker and the Data is getting laaaaarrrrrger.  Surely I know well that growing data volumes is a serious issue — but don’t call me Shirley.

Whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bites, the truth is we were consuming way more than our recommended daily allowance of data long before the data management industry took a tip from McDonald’s and put the word “big” in front of its signature sandwich.  (Oh great . . . now I’m actually hungry for a Big Mac.)

But nowadays with silos replicating data, as well as new data, and new types of data, being created and stored on a daily basis, our data is resembling the size of Bob Parr in retirement, making it seem like not even Mr. Incredible in his prime possessed the super strength needed to manage all of our data.  Those were references to the movie The Incredibles, where Mr. Incredible was a superhero who, after retiring into civilian life under the alias of Bob Parr, elicits the observation from this superhero costume tailor: “My God, you’ve gotten fat.”  Yes, I admit not even Helen Parr (aka Elastigirl) could stretch that far for a big data joke.

 

A Healthier Approach to Big Data

Although Daragh’s concerns about morbidly obese data are valid, no superpowers (or other miracle exceptions) are needed to manage all of our data.  In fact, it’s precisely when we are so busy trying to manage all of our data that we hoard countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival.  It’s like we are trying to lose weight by eating more and exercising less, i.e., consuming more data and exercising less data quality and data governance.  As Daragh said, only good data quality and data governance will determine whether we get more data or morbidly obese data.

Losing weight requires a healthy approach to both diet and exercise.  A healthy approach to diet includes carefully choosing the food you consume and carefully controlling your portion size.  A healthy approach to exercise includes a commitment to exercise on a regular basis at a sufficient intensity level without going overboard by spending several hours a day, every day, at the gym.

Swimming is a great form of exercise, but swimming in big data without having a clear business objective before you jump into the pool is like telling your boss that you didn’t get any work done because you decided to spend all day working out at the gym.

Carefully choosing the data you consume and carefully controlling your data portion size is becoming increasingly important since big data is forcing us to revisit information overload.  However, the main reason that traditional data management practices often become overwhelmed by big data is because traditional data management practices are not always the right approach.

We need to acknowledge that some big data use cases differ considerably from traditional ones.  Data modeling is still important and data quality still matters, but how much data modeling and data quality is needed before big data can be effectively used for business purposes will vary.  In order to move the big data discussion forward, we have to stop fiercely defending our traditional perspectives about structure and quality.  We also have to stop fiercely defending our traditional perspectives about analytics, since there will be some big data use cases where depth and detailed analysis may not be necessary to provide business insight.

 

Better than Big or More

Jim Ericson explained that your data is big enough.  Rich Murnane explained that bigger isn’t better, better is better.  Although big data may indeed be followed by more data that doesn’t necessarily mean we require more data management in order to prevent more data from becoming morbidly obese data.  I think that we just need to exercise better data management.

 

Related Posts

OCDQ Radio - Saving Private Data

OCDQ Radio - The Blue Box of Information Quality

Quality is the Higgs Field of Data

Are you turning Ugly Data into Cute Information?

Big Data Lessons from Orbitz

The Graystone Effects of Big Data

Will Big Data be Blinded by Data Science?

Our Increasingly Data-Constructed World

OCDQ Radio - Data Quality and Big Data

Magic Elephants, Data Psychics, and Invisible Gorillas

HoardaBytes and the Big Data Lebowski

Big Data el Memorioso

Information Overload Revisited

The Big Data Collider

OCDQ Radio - Big Data and Big Analytics

What Magic Tricks teach us about Data Science

What Mozart for Babies teaches us about Data Science

Sometimes it’s Okay to be Shallow

Big Data: Structure and Quality

The Big Data Theory

Swimming in Big Data

Why Can’t We Predict the Weather?

Tuesday
Jul242012

Demystifying Master Data Management

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, special guest John Owens and I attempt to demystify master data management (MDM) by explaining the three types of data (Transaction, Domain, Master) and the four master data entities (Party, Product, Location, Asset), as well as, and perhaps the most important concept of all, the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).

John Owens is a thought leader, consultant, mentor, and writer in the worlds of business and data modelling, data quality, and master data management (MDM).  He has built an international reputation as a highly innovative specialist in these areas and has worked in and led multi-million dollar projects in a wide range of industries around the world.

John Owens has a gift for identifying the underlying simplicity in any enterprise, even when shrouded in complexity, and bringing it to the surface.  He is the creator of the Integrated Modelling Method (IMM), which is used by business and data analysts around the world.  Later this year, John Owens will be formally launching the IMM Academy, which will provide high quality resources, training, and mentoring for business and data analysts at all levels.

You can also follow John Owens on Twitter and connect with John Owens on Linkedin.  And if you’re looking for a MDM course, consider the online course from John Owens, which you can find by clicking on this link: MDM Online Course (Affiliate Link)

 

Demystifying Master Data Management

Additional listening options:

 

Related Posts

Choosing Your First Master Data Domain

Lycanthropy, Silver Bullets, and Master Data Management

Voyage of the Golden Records

The Quest for the Golden Copy (Part 1)

The Quest for the Golden Copy (Part 2)

The Quest for the Golden Copy (Part 3)

The Quest for the Golden Copy (Part 4)

How Social can MDM get?

Will Social MDM be the New Spam?

More Thoughts about Social MDM

Is Social MDM going the Wrong Way?

The Semantic Future of MDM

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

Sunday
Jul222012

Commendable Comments (Part 13)

Welcome to the 400th Obsessive-Compulsive Data Quality (OCDQ) blog post!  I am commemorating this milestone with the 13th entry in my ongoing series for expressing gratitude to my readers for their truly commendable comments on my blog posts.

 

Commendable Comments

On Will Big Data be Blinded by Data Science?, Meta Brown commented:

“Your concern is well-founded. Knowing how few businesses make really good use of the small data they’ve had around all along, it’s easy to imagine that they won’t do any better with bigger data sets.

I wrote some hints for those wallowing into the big data mire in my post, Better than Brute Force: Big Data Analytics Tips. But the truth is that many organizations won’t take advantage of the ideas that you are presenting, or my tips, especially as the datasets grow larger. That’s partly because they have no history in scientific methods, and partly because the data science movement is driving employers to search for individuals with heroically large skill sets.

Since few, if any, people truly meet these expectations, those hired will have real human limitations, and most often they will be people who know much more about data storage and manipulation than data analysis and applications.”

On Will Big Data be Blinded by Data Science?, Mike Urbonas commented:

“The comparison between scientific inquiry and business decision making is a very interesting and important one. Successfully serving a customer and boosting competitiveness and revenue does require some (hopefully unique) insights into customer needs. Where do those insights come from?

Additionally, scientists also never stop questioning and improving upon fundamental truths, which I also interpret as not accepting conventional wisdom — obviously an important trait of business managers.

I recently read commentary that gave high praise to the manager utilizing the scientific method in his or her decision-making process. The author was not a technologist, but rather none other than Peter Drucker, in writings from decades ago.

I blogged about Drucker’s commentary, data science, the scientific method vs. business decision making, and I’d value your and others’ input: Business Managers Can Learn a Lot from Data Scientists.”

On Word of Mouth has become Word of Data, Vish Agashe commented:

“I would argue that listening to not only customers but also business partners is very important (and not only in retail but in any business). I always say that, even if as an organization you are not active in the social world, assume that your customers, suppliers, employees, competitors are active in the social world and they will talk about you (as a company), your people, products, etc.

So it is extremely important to tune in to those conversations and evaluate its impact on your business. A dear friend of mine ventured into the restaurant business a few years back. He experienced a little bit of a slowdown in his business after a great start. He started surveying his customers, brought in food critiques to evaluate if the food was a problem, but he could not figure out what was going on. I accidentally stumbled upon Yelp.com and noticed that his restaurant’s rating had dropped and there were some complaints recently about services and cleanliness (nothing major though).

This happened because he had turnover in his front desk staff. He was able to address those issues and was able to reach out to customers who had bad experience (some of them were frequent visitors). They were able to go back and comment and give newer ratings to his business. This helped him with turning the corner and helped with the situation.

This was a big learning moment for me about the power of social media and the need for monitoring it.”

On Data Quality and the Bystander Effect, Jill Wanless commented:

“Our organization is starting to develop data governance processes and one of the processes we have deliberately designed is to get to the root cause of data quality issues.

We’ve designed it so that the errors that are reported also include the userid and the system where the data was generated. Errors are then filtered by function and the business steward responsible for that function is the one who is responsible for determining and addressing the root cause (which of course may require escalation to solve).

The business steward for the functional area has the most at stake in the data and is typically the most knowledgeable as to the process or system that may be triggering the error. We have yet to test this as we are currently in the process of deploying a pilot stewardship program.

However, we are very confident that it will help us uncover many of the causes of the data quality problems and with lots of PLAN, DO, CHECK, and ACT, our goal is to continuously improve so that our need for stewardship eventually (many years away no doubt) is reduced.”

On The Return of the Dumb Terminal, Prashanta Chandramohan commented:

“I can’t even imagine what it’s like to use this iPad I own now if I am out of network for an hour. Supposedly the coolest thing to own and a breakthrough innovation of this decade as some put it, it’s nothing but a dumb terminal if I do not have 3G or Wi-Fi connectivity.

Putting most of my documents, notes, to-do’s, and bookmarked blogs for reading later (e.g., Instapaper) in the cloud, I am sure to avoid duplicating data and eliminate installing redundant applications.

(Oops! I mean the apps! :) )

With cloud-based MDM and Data Quality tools starting to linger, I can’t wait to explore and utilize the advantages these return of dumb terminals bring to our enterprise information management field.”

On Big Data Lessons from Orbitz, Dylan Jones commented:

“The fact is that companies have always done predictive marketing, they’re just getting smarter at it.

I remember living as a student in a fairly downtrodden area that because of post code analytics meant I was bombarded with letterbox mail advertising crisis loans to consolidate debts and so on. When I got my first job and moved to a new area all of a sudden I was getting loans to buy a bigger car. The companies were clearly analyzing my wealth based on post code lifestyle data.

Fast forward and companies can do way more as you say.

Teresa Cottam (Global Telecoms Analyst) has cited the big telcos as a major driver in all this, they now consider themselves data companies so will start to offer more services to vendors to track our engagement across the entire communications infrastructure (Read more here: http://bit.ly/xKkuX6).

I’ve just picked up a shiny new Mac this weekend after retiring my long suffering relationship with Windows so it will be interesting to see what ads I get served!”

And please check out all of the commendable comments received on the blog post: Data Quality and Chicken Little Syndrome.

 

Thank You for Your Comments and Your Readership

You are Awesome — which is why receiving your comments has been the most rewarding aspect of my blogging experience over the last 400 posts.  Even if you have never posted a comment, you are still awesome — feel free to tell everyone I said so.

This entry in the series highlighted commendable comments on blog posts published between April 2012 and June 2012.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please continue commenting and stay tuned for future entries in the series.

Thank you for reading the Obsessive-Compulsive Data Quality blog.  Your readership is deeply appreciated.

 

Related Posts

Commendable Comments (Part 12) – The Third Blogiversary of OCDQ Blog

Commendable Comments (Part 11)

Commendable Comments (Part 10) – The 300th OCDQ Blog Post

730 Days and 264 Blog Posts Later – The Second Blogiversary of OCDQ Blog

OCDQ Blog Bicentennial – The 200th OCDQ Blog Post

Commendable Comments (Part 9)

Commendable Comments (Part 8)

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5) – The 100th OCDQ Blog Post

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

Thursday
Jul192012

The Cloud is shifting our Center of Gravity

This blog post is sponsored by the Enterprise CIO Forum and HP.

Since more organizations are embracing cloud computing and cloud-based services, and some analysts are even predicting that personal clouds will soon replace personal computers, the cloudy future of our data has been weighing on my mind.

I recently discovered the website DataGravity.org, which contains many interesting illustrations and formulas about data gravity, a concept which Dave McCrory blogged about in his December 2010 post Data Gravity in the Clouds.

“Consider data as if it were a planet or other object with sufficient mass,” McCrory wrote.  “As data accumulates (builds mass) there is a greater likelihood that additional services and applications will be attracted to this data.  This is the same effect gravity has on objects around a planet.  As the mass or density increases, so does the strength of gravitational pull.  As things get closer to the mass, they accelerate toward the mass at an increasingly faster velocity.”

In my blog post What is Weighing Down your Data?, I explained the often misunderstood difference between mass, which is an intrinsic property of matter based on atomic composition, and weight, which is a gravitational force acting on matter.  By using these concepts metaphorically, we could say that mass is an intrinsic property of data, representing objective data quality, and weight is a gravitational force acting on data, representing subjective data quality.

I used a related analogy in my blog post Quality is the Higgs Field of Data.  By using data, we give data its quality, i.e., its mass.  We give data mass so that it can become the basic building blocks of what matters to us.

Historically, most of what we referred to as data silos were actually application silos because data and applications became tightly coupled due to the strong gravitational force that legacy applications exerted, preventing most data from achieving the escape velocity needed to free itself from an application.  But the laudable goal of storing your data in one easily accessible place, and then building services and applications around your data, is one of the fundamental value propositions of cloud computing.

With data accumulating in the cloud, as McCrory explained, although “services and applications have their own gravity, data is the most massive and dense, therefore it has the most gravity.  Data, if large enough, can be virtually impossible to move.”

The cloud is shifting our center of gravity because of the data gravitational field emitted by the massive amount of data being stored in the cloud.  The information technology universe, business world, and our personal (often egocentric) solar systems are just beginning to feel the effects of this massive gravitational shift.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

Quality is the Higgs Field of Data

What is Weighing Down your Data?

A Swift Kick in the AAS

Lightning Strikes the Cloud

The Partly Cloudy CIO

Are Cloud Providers the Bounty Hunters of IT?

The Cloud Security Paradox

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Good, the Bad, and the Secure

The Return of the Dumb Terminal

The UX Factor

Sometimes all you Need is a Hammer

Shadow IT and the New Prometheus

The Diffusion of the Consumerization of IT

Tuesday
Jul172012

DQ-View: The Five Stages of Data Quality

Data Quality (DQ) View is an OCDQ regular segment. Each DQ-View is a brief video discussion of a data quality key concept.

In my experience, all organizations cycle through five stages while coming to terms with the daunting challenges of data quality, which are somewhat similar to The Five Stages of Grief.  So, in this short video, I explain The Five Stages of Data Quality:

  1. Denial — Our organization is well-managed and highly profitable.  We consistently meet, or exceed, our business goals.  We obviously understand the importance of high-quality data.  Data quality issues can’t possibly be happening to us.
  2. Anger — We’re now in the midst of a financial reporting scandal, and facing considerable fines in the wake of a regulatory compliance failure.  How can this be happening to us?  Why do we have data quality issues?  Who is to blame for this?
  3. Bargaining — Okay, we may have just overreacted a little bit.  We’ll purchase a data quality tool, approve a data cleansing project, implement defect prevention, and initiate data governance.  That will fix all of our data quality issues — right?
  4. Depression — Why, oh why, do we keep having data quality issues?  Why does this keep happening to us?  Maybe we should just give up, accept our doomed fate, and not bother doing anything at all about data quality and data governance.
  5. Acceptance — We can’t fight the truth anymore.  We accept that we have to do the hard daily work of continuously improving our data quality and continuously implementing our data governance principles, policies, and procedures.

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

You can also watch a regularly updated page of my videos by clicking on this link: OCDQ Videos

 

Related Posts

Posts related to the Denial Stage of Data Quality:

Data Quality and Chicken Little Syndrome

The Illusion-of-Quality Effect

Perception Filters and Data Quality

“Some is not a number and soon is not a time”

The Data Quality Wager

Posts related to the Anger Stage of Data Quality:

Jack Bauer and Enforcing Data Governance Policies

Beware the Data Governance Ides of March

Aristotle, Data Governance, and Lead Rulers

Why isn’t our data quality worse?

Don’t Do Less Bad; Do Better Good

Posts related to the Bargaining Stage of Data Quality:

Data Quality and Miracle Exceptions

Do you believe in Magic (Quadrants)?

Which came first, the Data Quality Tool or the Business Need?

The Technology Carousel

The Stakeholder’s Dilemma

Posts related to the Depression Stage of Data Quality:

There is No Such Thing as a Root Cause

The Dichotomy Paradox, Data Quality and Zero Defects

The Asymptote of Data Quality

To Our Data Perfectionists

Data Quality and the Bystander Effect

Posts related to the Acceptance Stage of Data Quality:

You only get a Return from something you actually Invest in

Data Governance Frameworks are like Jigsaw Puzzles

The HedgeFoxian Hypothesis

Finding Data Quality

Data Quality: Quo Vadimus?

Thursday
Jul122012

Quality is the Higgs Field of Data

Recently on Twitter, Daragh O Brien replied to my David Weinberger quote “The atoms of data hook together only because they share metadata,” by asking “So, is Quality Data the Higgs Boson of Information Management?”

I responded that Quality is the Higgs Boson of Data and Information since Quality gives Data and Information their Mass (i.e., their Usefulness).

“Now that is profound,” Daragh replied.

“That’s cute and all,” Brian Panulla interjected, “but you can’t measure Quality.  Mass is objective.  It’s more like Weight — a mass in context.”

I agreed with Brian’s great point since in a previous post I explained the often misunderstood difference between mass, an intrinsic property of matter based on atomic composition, and weight, a gravitational force acting on matter.

Using these concepts metaphorically, mass is an intrinsic property of data, representing objective data quality, whereas weight is a gravitational force acting on data, thereby representing subjective data quality.

But my previous post didn’t explain where matter theoretically gets its mass, and since this scientific mystery was radiating in the cosmic background of my Twitter banter with Daragh and Brian, I decided to use this post to attempt a brief explanation along the way to yet another data quality analogy.

As you have probably heard by now, big scientific news was recently reported about the discovery of the Higgs Boson, which, since the 1960s, the Standard Model of particle physics has theorized to be the fundamental particle associated with a ubiquitous quantum field (referred to as the Higgs Field) that gives all matter its mass by interacting with the particles that make up atoms and weighing them down.  This is foundational to our understanding of the universe because without something to give mass to the basic building blocks of matter, everything would behave the same way as the intrinsically mass-less photons of light behave, floating freely and not combining with other particles.  Therefore, without mass, ordinary matter, as we know it, would not exist.

 

Ping-Pong Balls and Maple Syrup

I like the Higgs Field explanation provided by Brian Cox and Jeff Forshaw.  “Imagine you are blindfolded, holding a ping-pong ball by a thread.  Jerk the string and you will conclude that something with not much mass is on the end of it.  Now suppose that instead of bobbing freely, the ping-pong ball is immersed in thick maple syrup.  This time if you jerk the thread you will encounter more resistance, and you might reasonably presume that the thing on the end of the thread is much heavier than a ping-pong ball.  It is as if the ball is heavier because it gets dragged back by the syrup.”

“Now imagine a sort of cosmic maple syrup that pervades the whole of space.  Every nook and cranny is filled with it, and it is so pervasive that we do not even know it is there.  In a sense, it provides the backdrop to everything that happens.”

Mass is therefore generated as a result of an interaction between the ping-pong balls (i.e., atomic particles) and the maple syrup (i.e, the Higgs Field).  However, although the Higgs Field is pervasive, it is also variable and selective, since some particles are affected by the Higgs Field more than others, and photons pass through it unimpeded, thereby remaining mass-less particles.

 

Quality — Data Gets Higgy with It

Now that I have vastly oversimplified the Higgs Field, let me Get Higgy with It by attempting an analogy for data quality based on the Higgs Field.  As I do, please remember the wise words of Karen Lopez: “All analogies are perfectly imperfect.”

Quality provides the backdrop to everything that happens when we use data.  Data in the wild, independent from use, is as carefree as the mass-less photon whizzing around at the speed of light, like a ping-pong ball bouncing along without a trace of maple syrup on it.  But once we interact with data using our sticky-maple-syrup-covered fingers, data begins to slow down, begins to feel the effects of our use.  We give data mass so that it can become the basic building blocks of what matters to us.

Some data is affected more by our use than others.  The more subjective our use, the more we weigh data down.  The more objective our use, the less we weigh data down.  Sometimes, we drag data down deep into the maple syrup, covering data up with an application layer, or bottling data into silos.  Other times, we keep data in the shallow end of the molasses swimming pool.

Quality is the Higgs Field of Data.  As users of data, we are the Higgs Bosons — we are the fundamental particles associated with a ubiquitous data quality field.  By using data, we give data its quality.  The quality of data can not be separated from its use any more than the particles of the universe can be separated from the Higgs Field.

The closest data equivalent of a photon, a ping-pong ball particle that doesn’t get stuck in the maple syrup of the Higgs Field, is Open Data, which doesn’t get stuck within silos, but is instead data freely shared without the sticky quality residue of our use.

 

Related Posts

Our Increasingly Data-Constructed World

What is Weighing Down your Data?

Data Myopia and Business Relativity

Redefining Data Quality

Are Applications the La Brea Tar Pits for Data?

Swimming in Big Data

Sometimes it’s Okay to be Shallow

Data Quality and Big Data

Data Quality and the Q Test

My Own Private Data

No Datum is an Island of Serendip

Sharing Data

Tuesday
Jul102012

Shining a Social Light on Data Quality

Last week, when I published my blog post Lightning Strikes the Cloud, I unintentionally demonstrated three important things about data quality.

The first thing I demonstrated was even an obsessive-compulsive data quality geek is capable of data defects, since I initially published the post with the title Lightening Strikes the Cloud, which is an excellent example of the difference between validity and accuracy caused by the Cupertino Effect, since although lightening is valid (i.e., a correctly spelled word), it isn’t contextually accurate.

The second thing I demonstrated was the value of shining a social light on data quality — the value of using collaborative tools like social media to crowd-source data quality improvements.  Thankfully, Julian Schwarzenbach quickly noticed my error on Twitter.  “Did you mean lightning?  The concept of lightening clouds could be worth exploring further,” Julian humorously tweeted.  “Might be interesting to consider what happens if the cloud gets so light that it floats away.”  To which I replied that if the cloud gets so light that it floats away, it could become Interstellar Computing or, as Julian suggested, the start of the Intergalactic Net, which I suppose is where we will eventually have to store all of that big data we keep hearing so much about these days.

The third thing I demonstrated was the potential dark side of data cleansing, since the only remaining trace of my data defect is a broken URL.  This is an example of not providing a well-documented audit trail, which is necessary within an organization to communicate data quality issues and resolutions.

Communication and collaboration are essential to finding our way with data quality.  And social media can help us by providing more immediate and expanded access to our collective knowledge, experience, and wisdom, and by shining a social light that illuminates the shadows cast upon data quality issues when a perception filter or bystander effect gets the better of our individual attention or undermines our collective best intentions — which, as I recently demonstrated, occasionally happens to all of us.

 

Related Posts

Data Quality and the Cupertino Effect

Are you turning Ugly Data into Cute Information?

The Importance of Envelopes

The Algebra of Collaboration

Finding Data Quality

The Wisdom of the Social Media Crowd

Perception Filters and Data Quality

Data Quality and the Bystander Effect

The Family Circus and Data Quality

Data Quality and the Q Test

Metadata, Data Quality, and the Stroop Test

The Three Most Important Letters in Data Governance

Thursday
Jul052012

Lightning Strikes the Cloud

This blog post is sponsored by the Enterprise CIO Forum and HP.

Recent bad storms in the United States caused power outages as well as outages of a different sort for some of the companies relying on cloud computing and cloud-based services.  As the poster child for cloud providers, Amazon Web Services always makes headlines when it suffers a major outage, as it did last Friday when its Virginia cloud computing facility was struck by lightning, an incident which John Dodge examined in his recent blog post: Has Amazon's cloud grown too big, too fast?

Another thing that commonly coincides with a cloud outage is ponderances about the nebulous definition of “the cloud.”

In his recent article for The Washington Post, How a storm revealed the myth of the ‘cloud’, Dominic Basulto pondered “of all the metaphors and analogies used to describe the Internet, perhaps none is less understood than the cloud.  A term that started nearly a decade ago to describe pay-as-you-go computing power and IT infrastructure-for-rent has crossed over to the consumer realm.  It’s now to the point where many of the Internet’s most prolific companies make it a key selling point to describe their embrace of the cloud.  The only problem, as we found out this weekend, is that there really isn’t a ‘cloud’ – there’s a bunch of rooms with servers hooked up with wires and tubes.”

One of the biggest benefits of cloud computing, especially for many small businesses and start-up companies, is that it provides an organization with the ability to focus on its core competencies, allowing non-IT companies to be more business-focused.

As Basulto explained, “instead of having to devote resources and time to figuring out the computing back-end, young Internet companies like Instagram and Pinterest could concentrate on hiring the right people and developing business models worth billions.  Hooking up to the Internet became as easy as plugging into the local electricity provider, even as users uploaded millions of photos or streamed millions of videos at a time.”

But these benefits are not just for Internet companies.  In his book The Big Switch: Rewiring the World, from Edison to Google, Nicholas Carr used the history of electric grid power utilities as a backdrop and analogy for examining the potential benefits that all organizations can gain from adopting Internet-based utility (i.e., cloud) computing.

The benefits of a utility however, whether it’s electricity or cloud computing, can only be realized if the utility operates reliably.

“A temporary glitch while watching a Netflix movie is annoying,” Basulto noted, but “imagine what happens when there’s a cloud outage that affects airports, hospitals, or yes, the real-world utility grid.”  And so, whenever any utility suffers an outage, it draws attention to something we’ve become dependent on — but, in fairness, it’s also something we take for granted when it’s working.

“Maybe the late Alaska Senator Ted Stevens was right,” Basulto concluded, “maybe the Internet really is a series of tubes rather than a cloud.  If so, the company with the best plumbing wins.”  A few years ago, I published a satirical post about the cloud, which facetiously recommended that instead of beaming your data up into the cloud, bury your data down underground.

However, if plumbing, not electricity, is the better metaphor for cloud computing infrastructure, then perhaps cloud providers should start striking ground on subterranean data centers built deep enough to prevent lightning from striking the cloud again.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

The Partly Cloudy CIO

Are Cloud Providers the Bounty Hunters of IT?

The Cloud Security Paradox

The Good, the Bad, and the Secure

The Return of the Dumb Terminal

A Swift Kick in the AAS

The UX Factor

Sometimes all you Need is a Hammer

Shadow IT and the New Prometheus

The Diffusion of the Consumerization of IT

Monday
Jul022012

Saving Private Data

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

This episode is an edited rebroadcast of a segment from the OCDQ Radio 2011 Year in Review, during which Daragh O Brien and I discuss the data privacy and data protection implications of social media, cloud computing, and big data.

Daragh O Brien is one of Ireland’s leading Information Quality and Governance practitioners.  After being born at a young age, Daragh has amassed a wealth of experience in quality information driven business change, from CRM Single View of Customer to Regulatory Compliance, to Governance and the taming of information assets to benefit the bottom line, manage risk, and ensure customer satisfaction.  Daragh O Brien is the Managing Director of Castlebridge Associates, one of Ireland’s leading consulting and training companies in the information quality and information governance space.

Daragh O Brien is a founding member and former Director of Publicity for the IAIDQ, which he is still actively involved with.  He was a member of the team that helped develop the Information Quality Certified Professional (IQCP) certification and he recently became the first person in Ireland to achieve this prestigious certification.

In 2008, Daragh O Brien was awarded a Fellowship of the Irish Computer Society for his work in developing and promoting standards of professionalism in Information Management and Governance.

Daragh O Brien is a regular conference presenter, trainer, blogger, and author with two industry reports published by Ark Group, the most recent of which is The Data Strategy and Governance Toolkit.

You can also follow Daragh O Brien on Twitter and connect with Daragh O Brien on LinkedIn.

 

Saving Private Data

Additional listening options:

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • Social Media Strategy — Guest Crysta Anderson of IBM Initiate explains social media strategy and content marketing, including three recommended practices: (1) Listen intently, (2) Communicate succinctly, and (3) Have fun.
  • The Fall Back Recap Show — A look back at the Best of OCDQ Radio, including discussions about Data, Information, Business-IT Collaboration, Change Management, Big Analytics, Data Governance, and the Data Revolution.

Thursday
Jun282012

Big Data Lessons from Orbitz

One of the week’s interesting technology stories was On Orbitz, Mac Users Steered to Pricier Hotels, an article by Dana Mattioli in The Wall Street Journal, about how online travel company Orbitz used data mining to discover significant spending differences between their Mac and PC customers (who were identified by the operating system of the computer used to book reservations).

Orbitz discovered that Mac users are 40% more likely to book a four- or five-star hotel, and tend to stay in more expensive rooms, spending on average $20 to $30 more a night on hotels.  Based on this discovery, Orbitz has been experimenting with showing different hotel offers to Mac and PC visitors, ranking the more expensive hotels on the first page of search results for Mac users.

This Orbitz story is interesting because I think it provides two important lessons about big data for businesses of all sizes.

The first lesson is, as Mattioli reported, “the sort of targeting undertaken by Orbitz is likely to become more commonplace as online retailers scramble to identify new ways in which people’s browsing data can be used to boost online sales.  Orbitz lost $37 million in 2011 and its stock has fallen by more than 74% since its 2007 IPO.  The effort underscores how retailers are becoming bigger users of so-called predictive analytics, crunching reams of data to guess the future shopping habits of customers.  The goal is to tailor offerings to people believed to have the highest lifetime value to the retailer.”

The second lesson is a good example of how word of mouth has become word of data.  Shortly after the article was published, Orbitz became a trending topic on Twitter — but not in a way that the company would have hoped.  A lot of negative sentiment was expressed by Mac users claiming that they would no longer use Orbitz since they charged Mac users more than PC users.

However, this commonly expressed misunderstanding was clarified by an Orbitz spokesperson in the article, who explained that Orbitz is not charging Mac users more money for the same hotels, but instead they are simply setting the default search rank to show Mac users the more expensive hotels first.  Mac users can always re-sort the results ascending by price in order to see the same less expensive hotels that would be displayed in the default search rank used for PC users.  Orbitz is attempting to offer a customized (albeit a generalized, not personalized) user experience, but some users see it as gaming the system against them.

This Orbitz story provides two lessons about the brave new business world brought to us by big data and data science, where more companies are using predictive analytics to discover business insights, and more customers are empowering themselves with data.

Business has always resembled a battlefield.  But nowadays, data is the weapon of choice for companies and customers alike, since, in our increasing data-constructed world, big data is no longer just for big companies, and everyone is a data geek now.

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

Monday
Jun252012

Metadata, Data Quality, and the Stroop Test

In psychology, the Stroop Effect is a demonstration of the reaction time of a task.  The most commonly used example is what is known as the Stroop Test, which compares the time needed to name colors when they are printed in an ink color that matches their name (e.g., greenyellowredbluebrownpurple) with the time needed to name the same colors when they are printed in an ink color that does not match their name (e.g., bluered, purple, green, brownyellow).  Naming the color of the word takes longer, and is more prone to errors, when the ink color does not match the name of the color.

The Stroop Test, where colors do not match their names, reminds me of the relationship between metadata and data quality if I view the ink color as the metadata and the name of the color as the data, given that understanding data takes longer, and is more prone to errors, when the metadata does not match the data, or when the metadata is ambiguous.

Unlike the Stroop Test, where poor metadata (ink color) obfuscates good data (name of the color), data quality issues can also be caused when good metadata is undermined by poor data (e.g., data entry errors like an email address being entered into a postal address field).  And, of course, even when the entered data matches the metadata (or automatic data-to-metadata matching is enabled by drop-down boxes), more insidious data quality issues can be caused by the complex challenge of data accuracy.

Additionally, the point of view paradox can turn data quality debates about fitness for the purpose of use even more colorful than the Stroop Test, such as when data that one user sees as red and green, another user sees as crimson and chartreuse.

But hopefully we can all agree that good data quality begins with good metadata, because better metadata makes data better.

 

Related Posts

You Say Potato and I Say Tater Tot

The Metadata Continuum

The Metadata Crisis

Let’s Meta a Data

What’s the Meta with your Data?

DQ-View: MetaData makes BettahMusic

Who Framed Data Entry?

Data Quality and the Cupertino Effect

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-BE: Data Quality Airlines

Data Quality and the Q Test

Thursday
Jun212012

The Return of the Dumb Terminal

This blog post is sponsored by the Enterprise CIO Forum and HP.

In his book What Technology Wants, Kevin Kelly observed “computers are becoming ever more general-purpose machines as they swallow more and more functions.  Entire occupations and their workers’ tools have been subsumed by the contraptions of computation and networks.  You can no longer tell what a person does by looking at their workplace, because 90 percent of employees are using the same tool — a personal computer.  Is that the desk of the CEO, the accountant, the designer, or the receptionist?  This is amplified by cloud computing, where the actual work is done on the net as a whole and the tool at hand merely becomes a portal to the work.  All portals have become the simplest possible window — a flat screen of some size.”

Although I am an advocate for cloud computing and cloud-based services, sometimes I can’t help but wonder if cloud computing is turning our personal computers back into that simplest of all possible windows that we called the dumb terminal.

Twenty years ago, at the beginning of my IT career, when I was a mainframe production support specialist, my employer gave me a dumb terminal to take home for connecting to the mainframe via my dial-up modem.  Since I used it late at night when dealing with nightly production issues, the aptly nicknamed green machine (its entirely text-based display used bright green characters) would make my small apartment eerily glow green, which convinced my roommate and my neighbors that I was some kind of mad scientist performing unsanctioned midnight experiments with radioactive materials.

The dumb terminal was so-called because, when not connected to the mainframe, it was essentially a giant paperweight since it provided no offline functionality.  Nowadays, our terminals (smartphones, tablets, and laptops) are smarter, but in some sense, with more functionality moving to the cloud, even though they provide varying degrees of offline functionality, our terminals get dumbed back down when they’re not connected to the web or a mobile network, because most of what we really need is online.

It can even be argued that smartphones and tablets were actually designed to be dumb terminals because they intentionally offer limited offline data storage and computing power, and are mostly based on a mobile-app-portal-to-the-cloud computing model, which is well-supported by the widespread availability of high-speed network connectivity options (broadband, mobile, Wi-Fi).

Laptops (and the dwindling number of desktops) are the last bastions of offline data storage and computing power.  Moving more of those applications and data to the cloud would help eliminate redundant applications and duplicated data, and make it easier to use the right technology for a specific business problem.  And if most of our personal computers were dumb terminals, then our smart people could concentrate more on the user experience aspects of business-enabling information technology.

Perhaps the return of the dumb terminal is a smart idea after all.

This blog post is sponsored by the Enterprise CIO Forum and HP.

 

Related Posts

A Swift Kick in the AAS

The UX Factor

The Partly Cloudy CIO

Are Cloud Providers the Bounty Hunters of IT?

The Cloud Security Paradox

Sometimes all you Need is a Hammer

Why does the sun never set on legacy applications?

Are Applications the La Brea Tar Pits for Data?

The Diffusion of the Consumerization of IT

More Tethered by the Untethered Enterprise?

Sunday
Jun172012

The Family Circus and Data Quality

Like many young intellectuals, the only part of the Sunday newspaper I read growing up was the color comics section, and one of my favorite comic strips was The Family Circus created by cartoonist Bil Keane.  One of the recurring themes of the comic strip was a set of invisible gremlins that the children used to shift blame for any misdeeds, including Ida Know, Not Me, and Nobody.

Although I no longer read any section of the newspaper on any day of the week, this Sunday morning I have been contemplating how this same set of invisible gremlins is also used by many people throughout most organizations to shift blame for any incidents when poor data quality negatively impacted business activities, especially since, when investigating the root cause, you often find that Ida Know owns the dataNot Me is accountable for data governance, and Nobody takes responsibility for data quality.

 

Related Posts

The Third Law of Data Quality

The Data Governance Oratorio

Data Quality and the Bystander Effect

Shared Responsibility

The Algebra of Collaboration

Collaboration isn’t Brain Surgery

The Three Most Important Letters in Data Governance

Video: Declaration of Data Governance

The Collaborative Culture of Data Governance

The Role of Data Quality Monitoring in Data Governance

Thursday
Jun142012

The Graystone Effects of Big Data

As a big data geek and a big fan of science fiction, I was intrigued by Zoe Graystone, the central character of the science fiction television show Caprica, which was a spin-off prequel of the re-imagined Battlestar Galactica television show.

Zoe Graystone was a teenage computer programming genius who created a virtual reality avatar of herself based on all of the available data about her own life, leveraging roughly 100 terabytes of personal data from numerous databases.  This allowed her avatar to access data from her medical files, DNA profiles, genetic typing, CAT scans, synaptic records, psychological evaluations, school records, emails, text messages, phone calls, audio and video recordings, security camera footage, talent shows, sports, restaurant bills, shopping receipts, online search history, music lists, movie tickets, and television shows.  The avatar transformed that big data into personality and memory, and believably mimicked the real Zoe Graystone within a virtual reality environment.

The best science fiction reveals just how thin the line is that separates imagination from reality.  Over thirty years ago, around the time of the original Battlestar Galactica television show, virtual reality avatars based on massive amounts of personal data would likely have been dismissed as pure fantasy.  But nowadays, during the era of big data and data science, the idea of Zoe Graystone creating a virtual reality avatar of herself doesn’t sound so far-fetched, nor is it pure data science fiction.

“On Facebook,” Ellis Hamburger recently blogged, “you’re the sum of all your interactions and photos with others.  Foursquare began its life as a way to see what your friends are up to, but it has quickly evolved into a life-logging tool / artificial intelligence that knows you like an old friend does.”

Facebook and Foursquare are just two social media examples of our increasingly data-constructed world, which is creating a virtual reality environment where our data has become our avatar and our digital mouths are speaking volumes about us.

Big data and real data science are enabling people and businesses of all sizes to put this virtual reality environment to good use, such as customers empowering themselves with data and companies using predictive analytics to discover business insights.

I refer to the positive aspects of Big Data as the Zoe Graystone Effect.

But there are also negative aspects to the virtual reality created by our big data avatars.  For example, in his recent blog post Rethinking Privacy in an Era of Big Data, Quentin Hardy explained “by triangulating different sets of data (you are suddenly asking lots of people on LinkedIn for endorsements on you as a worker, and on Foursquare you seem to be checking in at midday near a competitor’s location), people can now conclude things about you (you’re probably interviewing for a job there).”

On the Caprica television show, Daniel Graystone (her father) used Zoe’s avatar as the basis for an operating system for a race of sentient machines known as Cylons, which ultimately lead to the Cylon Wars and the destruction of most of humanity.  A far less dramatic example from the real world, which I explained in my blog post The Data Cold War, is how companies like Google use the virtual reality created by our big data avatars against us by selling our personal data (albeit indirectly) to advertisers.

I refer to the negative aspects of Big Data as the Daniel Graystone Effect.

How have your personal life and your business activities been affected by the Graystone Effects of Big Data?

 

This post was written as part of the IBM for Midsize Business program, which provides midsize businesses with the tools, expertise and solutions they need to become engines of a smarter planet.

 

Monday
Jun112012

Data Quality and Chicken Little Syndrome

“The sky is falling!” exclaimed Chicken Little after an acorn fell on his head, causing him to undertake a journey to tell the King that the world is coming to an end.  So says the folk tale that became an allegory for people accused of being unreasonably afraid, or people trying to incite an unreasonable fear in those around them, sometimes referred to as Chicken Little Syndrome.

The sales pitches for data quality solutions often suffer from Chicken Little Syndrome, when vendors and consultants, instead of trying to sell the business benefits of data quality, focus too much on the negative aspects of not investing in data quality, and try scaring people into prioritizing data quality initiatives by exclaiming “your company is failing because your data quality is bad!”

The Chicken Littles of Data Quality use sound bites like “data quality problems cost businesses more than $600 billion a year!” or “poor data quality costs organizations 35% of their revenue!”  However, the most common characteristic of these fear mongering estimates about the costs of poor data quality is that, upon closer examination, most of them either rely on anecdotal evidence, or hide behind the curtain of an allegedly proprietary case study, the details of which conveniently can’t be publicly disclosed.

Lacking a tangible estimate for the cost of poor data quality often complicates building the business case for data quality.  Even though a data quality initiative has the long-term potential of reducing the costs, and mitigating the risks, associated with poor data quality, its initial costs are very tangible.  For example, the short-term increased costs of a data quality initiative can include the purchase of data quality software, and the professional services needed for training and consulting to support installation, configuration, application development, testing, and production implementation.  When considering these short-term costs, and especially when lacking a tangible estimate for the cost of poor data quality, many organizations understandably conclude that it’s less risky to gamble on not investing in a data quality initiative and hope things are just not as bad as Chicken Little claims.

 

“The sky isn’t falling on us.”

Furthermore, the reason that citing specific examples of poor data quality (e.g., IQTrainwrecks.com) also doesn’t work very well is not just because of the lack of a verifiable estimate for the associated business costs.  Another significant contributing factor is that people naturally dismiss the possibility that something bad that happened to someone else could also happen to them.

So, when Chicken Little undertakes a journey to tell the CEO that the organization is coming to an end due to poor data quality, exclaiming that “the sky is falling!” while citing one of those data quality disaster stories that befell another organization, should we really be surprised when the CEO looks up, scratches their head, and declares that “the sky isn’t falling on us.”

Sometimes, denying the existence of data quality issues is a natural self-defense mechanism for the people responsible for the business processes and technology surrounding data since nobody wants to be blamed for causing, or failing to fix, data quality issues.  Other times, people suffer from the illusion-of-quality effect caused by the dark side of data cleansing.  In other words, they don’t believe that data quality issues occur very often because the data made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

 

Can we stop Playing Chicken with Data Quality?

Most of the time, advocating for data quality feels like we are playing chicken with executive sponsors and business stakeholders, as if we were driving toward them at full speed on a collision course, armed with fear mongering and disaster stories, hoping that they swerve in the direction of approving a data quality initiative.  But there has to be a better way to advocate for data quality other than constantly exclaiming that “the sky is falling!”  (Don’t cry fowl — I realize that I just mixed my chicken metaphors.)

I welcome your suggestions (chicken-metaphor-based or otherwise) by inviting you to post a comment below.

 

Related Posts

Selling the Business Benefits of Data Quality

DQ-View: The Cassandra Effect

In Search of an Anecdotal Antidote

The Data Quality Wager

“Some is not a number and soon is not a time”

The Illusion-of-Quality Effect

Are you turning Ugly Data into Cute Information?

Perception Filters and Data Quality

The Data Quality of Dorian Gray

Data Quality: Quo Vadimus?

Data Quality and the Bystander Effect

Data Quality and the Q Test

 

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Organizing for Data Quality — Guest Tom Redman (aka the “Data Doc”) discusses how your organization should approach data quality, including his call to action for your role in the data revolution.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Redefining Data Quality — Guest Peter Perera discusses his proposed redefinition of data quality, as well as his perspective on the relationship of data quality to master data management and data governance.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.