Commendable Comments (Part 8)

This Thursday is Thanksgiving Day, which is a United States holiday with a long and varied history.  The most consistent themes remain family and friends gathering together to share a large meal and express their gratitude.

This is the eighth entry in my ongoing series in which I express my gratitude to my readers for their truly commendable comments on my blog posts.  Receiving comments is the most rewarding aspect of my blogging experience.  Although I am truly grateful to all of my readers, I am most grateful to my commenting readers.

 

Commendable Comments

On The Data-Decision Symphony, James Standen commented:

“Being a lover of both music and data, it struck all the right notes!

I think the analogy is a very good one—when I think about data as music, I think about a company’s business intelligence architecture as being a bit like a very good concert hall, stage, and instruments. All very lovely to listen to music—but without the score itself (the data), there is nothing to play.

And while certainly a real live concert hall is fantastic for enjoying Bach, I’m enjoying some Bach right now on my laptop—and the MUSIC is really the key.

Companies very often focus on building fantastic concert halls (made with all the best and biggest data warehouse appliances, ETL servers, web servers, visualization tools, portals, etc.) but forget that the point was to make that decision—and base it on data from the real world. Focusing on the quality of your data, and on the decision at hand, can often let you make wonderful music—and if your budget or schedule doesn't allow for a concert hall, you might be able to get there regardless.”

On “Some is not a number and soon is not a time”, Dylan Jones commented:

“I used to get incredibly frustrated with the data denial aspect of our profession.  Having delivered countless data quality assessments, I’ve never found an organization that did not have pockets of extremely poor data quality, but as you say, at the outset, no-one wants to believe this.

Like you, I’ve seen the natural defense mechanisms.  Some managers do fear the fallout, and I’ve even had quite senior directors bury our research and quickly cut any further activity when issues were discovered; fortunately, that was an isolated case.

In the majority of cases, though, I think many senior figures are genuinely shocked when they see their data quality assessments for the first time.  I think the big problem is that because the many scrap and rework processes and people common to every organization have been institutionalized, the majority of issues are actually hidden.

This is one of the issues I have with the big shock announcements we often see in conference presentations (I’m as guilty as hell of these, so call me a hypocrite), where one single error wipes millions off a share price or sends a spacecraft hurtling into Mars.

Most managers don’t experience this cataclysm, so it’s hard for them to relate to.  It also implies their data needs to be perfect, which they believe is unattainable, so they lose interest.

Far better to use anecdotes like the one cited in this blog to demonstrate how simple improvements can change lives and the bottom line in a limited time span.”

On The Real Data Value is Business Insight, Winston Chen commented:

“Yes, quality is in the eye of the beholder.  Data quality metrics must be calculated within the context of a data consumer.  This context is missing in most software tools on the market.

Another important metric is what I call the Materiality Metric.

In your example, 50% of customer data is inaccurate.  It’d be helpful if we knew which 50%.  Are they the customers that generate the most revenue and profits, or are they dormant customers?  Are they test records that were never purged from the system?  We can calculate the materiality metric by aggregating a relevant business metric for those bad records.

For example, 85% of the year-to-date revenue is associated with those 50% bad customer records.

Now we know this is serious!”
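Chen’s materiality metric can be sketched in a few lines of Python.  This is only an illustration: the customer records, revenue figures, and field names below are invented, assuming each record carries a year-to-date revenue figure and an accuracy flag.

```python
# Hypothetical sketch of the "Materiality Metric": instead of reporting
# only what fraction of records are bad, weight the bad records by a
# business metric (here, year-to-date revenue).  All data is invented.

customers = [
    {"id": 1, "ytd_revenue": 50000, "accurate": False},
    {"id": 2, "ytd_revenue": 1200,  "accurate": True},
    {"id": 3, "ytd_revenue": 35000, "accurate": False},
    {"id": 4, "ytd_revenue": 0,     "accurate": True},   # dormant customer
]

def materiality(records, metric="ytd_revenue"):
    """Share of a business metric carried by inaccurate records."""
    total = sum(r[metric] for r in records)
    bad = sum(r[metric] for r in records if not r["accurate"])
    return bad / total if total else 0.0

bad_pct = sum(not r["accurate"] for r in customers) / len(customers)
print(f"{bad_pct:.0%} of records are inaccurate")
print(f"{materiality(customers):.0%} of YTD revenue is at risk")
```

Here half the records are inaccurate, but because they carry nearly all of the year-to-date revenue, the materiality metric (roughly 99% in this toy data) tells a far more urgent story than the raw 50% error rate.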

On The Real Data Value is Business Insight, James Taylor commented:

“I am constantly amazed at the number of folks I meet who are paralyzed about advanced analytics, saying that ‘we have to fix/clean/integrate all our data before we can do that.’

They don’t know if the data would even be relevant, haven’t considered getting the data from an external source, and haven’t checked to see if the analytic techniques being considered could handle the bad or incomplete data automatically!  Lots of techniques used in data mining were invented when data was hard to come by and very ‘dirty,’ so they are actually pretty good at coping.  Unless someone thinks about the decision they want to improve, and the analytics they will need to do so, I don’t see how they can say their data is too dirty, or too inconsistent, to be used.”
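Taylor’s observation that many analytic techniques cope with dirty data can be seen even in a toy example: a robust statistic such as the median shrugs off a corrupted record that would wreck a naive mean.  The order values below are invented purely for illustration.

```python
# Illustrative only: all figures are invented.  One order value was
# mis-keyed as 9999999, the kind of "dirty" record that supposedly
# makes the data unusable for analytics.
order_values = [120, 135, 128, 9999999, 131, 124, 119]

# A naive aggregate (the mean) is wrecked by the single bad record...
naive_mean = sum(order_values) / len(order_values)

# ...while a robust statistic (the median) barely notices it.
median = sorted(order_values)[len(order_values) // 2]

print(f"mean:   {naive_mean:,.0f}")   # wildly inflated by the bad record
print(f"median: {median}")            # unaffected
```

This is the spirit of many data mining techniques: built for scarce, dirty data, they degrade gracefully rather than demanding perfection up front.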

On The Business versus IT—Tear down this wall!, Scott Andrews commented:

“Early in my career, I answered a typical job interview question ‘What are your strengths?’ with:

‘I can bring Business and IT together to deliver results.’

My interviewer wryly pooh-poohed my answer with ‘Business and IT work together well already,’ insinuating that such barriers may have existed in the past but were now long gone.  I didn’t get that particular job, but in the years since, I have seen this barrier in action (I can attest that my interviewer was wrong).

What is required for Business Intelligence success is to have smart business people and smart IT people working together collaboratively.  Too many times one side or the other says ‘that’s not my job’ and enormous potential is left unrealized.”

On The Business versus IT—Tear down this wall!, Jill Wanless commented:

“It amazes me (ok, not really...it makes me cynical and want to rant...) how often Business and IT SAY they are collaborating, but it’s obvious they have varying views and perspectives on what collaboration is and what the expected outcomes should be.  Business may think collaboration means working together for a solution; IT may think it means IT does the dirty work so Business doesn’t have to.

Either way, why don’t they just start the whole process by having an (honest and open) chat about expectations, one that INCLUDES what collaboration means and how they will work together?

And hopefully, (here’s where I start to rant because OMG it’s Collaboration 101) that includes agreement not to use language such as BUSINESS and IT, but rather start to use language like WE.”

On Delivering Data Happiness, Teresa Cottam commented:

“Just a couple of days ago I had this conversation about the curse of IT in general:

When it works no-one notices or gives credit; it’s only when it’s broken we hear about it.

A typical example is government IT over here in the UK.  Some projects have worked well; others have been spectacular failures.  Guess which we hear about?  We review failure mercilessly but sometimes forget to do the same with success so we can document and repeat the good stuff too!

I find the best case studies are the balanced ones that say: this is what we wanted to do, this is how we did it, these are the benefits.  Plus this is what I’d do differently next time (lessons learned).

Maybe in those lessons learned we should also make a big effort to document the positive learnings and not just take these for granted.  Yes these do come out in ‘best practices’ but again, best practices never get the profile of disaster stories...

I wonder if much of the gloom is self-fulfilling almost, and therefore quite unhealthy.  So we say it’s difficult, the failure rate is high, etc. – commonly known as covering your butt.  Then when something goes wrong you can point back to the low expectations you created in the first place.

But maybe, the fact we have low expectations means we don’t go in with the right attitude?

The self-defeating outcome is that many large organizations are fearful of getting to grips with their data problems.  So lots of projects we should be doing to improve things are put on hold because of the perceived risk, disruption, cost – things then just get worse making the problem harder to resolve.

Data quality professionals surely don’t want to be seen as, effectively, undertakers to the doomed project: necessary, yes, but not surrounded by the unmistakable smell of death that makes others uncomfortable.

Sure the nature of your work is often to focus on the broken, but quite apart from anything else, isn’t it always better to be cheerful?”

On Why isn’t our data quality worse?, Gordon Hamilton commented:

“They say that sports coaches never teach the negative, or, to put it as a double negative, they never say ‘don’t do that.’  I read somewhere, maybe in Daniel Siegel’s work, that when the human brain processes the statement ‘don’t do that,’ it drops the ‘don’t,’ which leaves it thinking ‘do that.’

Data quality is a complex and multi-splendiforous area with many variables intermingled, but our task as Data Quality Evangelists would be more pleasant if we were helping people rise to the level of the positive expectations, rather than our being codependent in their sinking to the level of the negative expectation.”

DQ-Tip: “There is no such thing as data accuracy...” sparked an excellent debate between Graham Rhind and Peter Benson, who is the Project Leader of ISO 8000, the international standard for data quality.  Their debate included the differences and interdependencies that exist between data and information, as well as between data quality and information quality.

 

Thanks for giving your comments

Thank you very much for giving your comments and sharing your perspectives with our collablogaunity.

This entry in the series highlighted commendable comments on OCDQ Blog posts published in August and September of 2010.

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

 

Related Posts

Commendable Comments (Part 7)

Commendable Comments (Part 6)

Commendable Comments (Part 5)

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

The Data Outhouse

This is a screen capture of the results of last week’s unscientific data quality poll, which noted that in many organizations a data warehouse is the only system where data from numerous and disparate operational sources has been integrated into a single system of record containing fully integrated and historical data.  Although the rallying cry and promise of the data warehouse has long been that it will serve as the source for most of the enterprise’s reporting and decision support needs, many data warehouses simply get ignored by the organization, which continues to rely on its data silos and spreadsheets for reporting and decision making.

Based on my personal experience, the most common reason is that these big boxes of data are often built with little focus on the quality of the data being delivered.  However, since that’s just my opinion, I launched the poll and invited your comments.

 

Commendable Comments

Stephen Putman commented that data warehousing “projects are usually so large that if you approach them in a big-bang, OLTP management fashion, the foundational requirements of the thing change between inception and delivery.”

“I’ve seen very few data warehouses live up to the dream,” Dylan Jones commented.  “I’ve always found that silos still persisted after a warehouse introduction, because the turnaround on adding new dimensions and reports to the warehouse/mart meant that the business users simply had no option but to stick with them.  I think data quality obviously plays a part.  The business side only need to be burnt once or twice before they lose faith.  That said, a data warehouse is one of the best enablers of data quality motivation, so without them a lot of projects simply wouldn’t get off the ground.”

“I just voted Outhouse too,” commented Paul Drenth, “because I agree with Dylan that the business side keeps using other systems out of disappointment in the trustworthiness of the data warehouse.  I agree that bad data quality plays a role in that, but more often it’s also a lack of discipline in the organization that causes a downward spiral of missing information, and thus the decision to keep other information in a separate or local system.  So I think the usability of data warehouse systems still needs to be improved significantly; by adding invisible or automatic data quality assurance, the business might gain more trust.”

“Great point Paul, useful addition,” Dylan responded.  “I think discipline is a really important aspect; this ties in with change management.  A lot of business people simply don’t see the sense of urgency for moving their reports to a warehouse, so they lack the discipline to follow the procedures.  Or we make the procedures too inflexible.  On one site I noticed that whenever the business wanted to add a new dimension or category, it would take a 2-3 week turnaround to sign off.  For a financial services company this was a killer, because they had simply been used to dragging another column into their Excel spreadsheets and instantly getting the data they needed.  If we’re getting into information quality for a second, then the dimensions of presentation quality and accessibility become far more important than things like accuracy and completeness.  Sure, a warehouse may be able to show you data going back 15 years and cross-validate results with surrogate sources to confirm accuracy, but if the business can’t get it in the format they need, then it’s all irrelevant.”

“I voted Data Warehouse,” commented Jarrett Goldfedder, “but this is marked with an asterisk.  I would say that 99% of the time, a data warehouse becomes an outhouse, crammed with data that serves no purpose.  I think terminology is important here, though.  In my previous organization, we called the Data Warehouse the graveyard and the people who did the analytics were the morticians.  And actually, that’s not too much of a stretch considering our job was to do CSI-type investigations and autopsies on records that didn’t fit with the upstream information.  This did not happen often, but when it did, we were quite grateful for having historical records maintained.  IMHO, if the records can trace back to the existing data and will save the organization money in the long-run, then the warehouse has served its purpose.”

“I’m having a difficult time deciding,” Corinna Martinez commented, “since most of the ones I have seen contain high quality data, but not enough of it, and therefore are considered Data Outhouses.  You may want to include some variation in your survey that covers good data but not enough; and bad data, but lots to sift through in order to find something.”

“I too have voted Outhouse,” Simon Daniels commented, “and have also seen beautifully designed, PhD-worthy data warehouse implementations that are fundamentally of no practical use.  Part of the reason for this, I think (particularly from a marketing point of view, which is my angle), is that how the data will be used is not sufficiently thought through.  In seeking to create marketing selections, segmentation, and analytics, how will the insight locked up in the warehouse be accessed within the context of campaign execution and subsequent response analysis?  Often sitting in splendid isolation, the data warehouse doesn’t offer the accessibility needed in day-to-day activities.”

Thanks to everyone who voted and special thanks to everyone who commented.  As always, your feedback is greatly appreciated.

 

Can MDM and Data Governance save the Data Warehouse?

During last week’s Informatica MDM Tweet Jam, Dan Power explained that master data management (MDM) can deliver to the business “a golden copy of the data that they can trust” and I remarked how companies expected that from their data warehouse.

“Most companies had unrealistic expectations from data warehouses,” Power responded, “which ended up being expensive, read-only, and updated infrequently.  MDM gives them the capability to modify the data, publish to a data warehouse, and manage complex hierarchies.  I think MDM offers more flexibility than the typical data warehouse.  That’s why business intelligence (BI) on top of MDM (or more likely, BI on top of a data warehouse that draws data from MDM) is so popular.”

As a follow-up question, I asked if MDM should be viewed as a complement or a replacement for the data warehouse.  “Definitely a complement,” Power responded. “MDM fills a void in the middle between transactional systems and the data warehouse, and does things that neither can do to data.”

In his recent blog post How to Keep the Enterprise Data Warehouse Relevant, Winston Chen explains that the data quality deficiencies of most data warehouses could be aided by MDM and data governance, which “can define and enforce data policies for quality across the data landscape.”  Chen believes that the data warehouse “is in a great position to be the poster child for data governance, and in doing so, it can keep its status as the center of gravity for all things data in an enterprise.”

I agree with Power that MDM can complement the data warehouse, and I agree with Chen that data governance can make the data warehouse (as well as many other things) better.  So perhaps MDM and data governance can save the data warehouse.

However, I must admit that I remain somewhat skeptical.  The same challenges that have caused most data warehouses to become data outhouses are also fundamental threats to the success of MDM and data governance.

 

Thinking outside the house

Just as real outhouses were eventually made obsolete by indoor plumbing, I wonder if data outhouses will eventually be made obsolete as well, perhaps ironically, by the emerging trends of outdoor plumbing, i.e., open source, cloud computing, and software as a service (SaaS).

Many industry analysts are also advocating the evolution of data as a service (DaaS), where data is taken out of all of its houses, meaning that the answer to my poll question might be neither data warehouse nor data outhouse.

Although none of these trends obviates the need for data quality or alleviates the other significant challenges mentioned above, perhaps when it comes to data, we need to start thinking outside the house.

 

Related Posts

DQ-Poll: Data Warehouse or Data Outhouse?

Podcast: Data Governance is Mission Possible

Once Upon a Time in the Data

The Idea of Order in Data

Fantasy League Data Quality

Which came first, the Data Quality Tool or the Business Need?

Finding Data Quality

The Circle of Quality

TDWI World Conference Orlando 2010

Last week I attended the TDWI World Conference held November 7-12 in Orlando, Florida at the Loews Royal Pacific Resort.

As always, TDWI conferences offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner, designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

In this blog post, I summarize a few key points from two of the courses I attended.  I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

 

A Practical Guide to Analytics

Wayne Eckerson, author of the book Performance Dashboards: Measuring, Monitoring, and Managing Your Business, described the four waves of business intelligence:

  1. Reporting – What happened?
  2. Analysis – Why did it happen?
  3. Monitoring – What’s happening?
  4. Prediction – What will happen?

“Reporting is the jumping off point for analytics,” explained Eckerson, “but many executives don’t realize this.  The most powerful aspect of analytics is testing our assumptions.”  He went on to differentiate the two strains of analytics:

  1. Exploration and Analysis – Top-down and deductive, primarily uses query tools
  2. Prediction and Optimization – Bottom-up and inductive, primarily uses data mining tools

“A huge issue for predictive analytics is getting people to trust the predictions,” remarked Eckerson.  “Technology is the easy part, the hard part is selling the business benefits and overcoming cultural resistance within the organization.”

“The key is not getting the right answers, but asking the right questions,” he explained, quoting Ken Rudin of Zynga.

“Deriving insight from its unique information will always be a competitive advantage for every organization.”  He recommended the book Competing on Analytics: The New Science of Winning as a great resource for selling the business benefits of analytics.

 

Data Governance for BI Professionals

Jill Dyché, a partner and co-founder of Baseline Consulting, explained that data governance transcends business intelligence and other enterprise information initiatives such as data warehousing, master data management, and data quality.

“Data governance is the organizing framework,” explained Dyché, “for establishing strategy, objectives, and policies for corporate data.  Data governance is the business-driven policy making and oversight of corporate information.”

“Data governance is necessary,” remarked Dyché, “whenever multiple business units are sharing common, reusable data.”

“Data governance aligns data quality with business measures and acceptance, positions enterprise data issues as cross-functional, and ensures data is managed separately from its applications, thereby evolving data as a service (DaaS).”

In her excellent 2007 article Serving the Greater Good: Why Data Hoarding Impedes Corporate Growth, Dyché explained the need for “systemizing the notion that data – corporate asset that it is – belongs to everyone.”

“Data governance provides the decision rights around the corporate data asset.”

 

Related Posts

DQ-View: From Data to Decision

Podcast: Data Governance is Mission Possible

The Business versus IT—Tear down this wall!

MacGyver: Data Governance and Duct Tape

Live-Tweeting: Data Governance

Enterprise Data World 2010

Enterprise Data World 2009

TDWI World Conference Chicago 2009

Light Bulb Moments at DataFlux IDEAS 2010

DataFlux IDEAS 2009

DQ-Poll: Data Warehouse or Data Outhouse?

In many organizations, a data warehouse is the only system where data from numerous and disparate operational sources has been integrated into a single repository of enterprise data.

The rapid delivery of a single system of record containing fully integrated and historical data to be used as the source for most of the enterprise’s reporting and decision support needs has long been the rallying cry and promise of the data warehouse.

However, I have witnessed beautifully architected, elegantly implemented, and diligently maintained data warehouses simply get ignored by the organization, which continues to rely on its data silos and spreadsheets for reporting and decision making.

The most common reason is that these big boxes of data are often built with little focus on the quality of the data being delivered.

But that’s just my opinion based on my personal experience.  So let’s conduct an unscientific poll.

 

Additionally, please feel free to post a comment below and explain your vote or simply share your opinions and experiences.

DQ-View: From Data to Decision

Data Quality (DQ) View is an OCDQ regular segment.  Each DQ-View is a brief video discussion of a data quality key concept.

As I posited in The Circle of Quality, an organization’s success is measured by its business results, which are dependent on the quality of its business decisions, which rely on the quality of its data.  In this new DQ-View segment, I want to briefly discuss the relationship between data quality and decision quality and examine a few crucial aspects of the journey from data to decision.

 


 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

The Business versus IT—Tear down this wall!

The Data-Decision Symphony

The Real Data Value is Business Insight

Scrum Screwed Up

Is your data complete and accurate, but useless to your business?

Finding Data Quality

Fantasy League Data Quality

TDWI World Conference Chicago 2009

 

Additional OCDQ Video Posts

DQ View: Achieving Data Quality Happiness

Video: Oh, the Data You’ll Show!

Data Quality is not a Magic Trick

DQ-View: The Cassandra Effect

DQ-View: Is Data Quality the Sun?

DQ-View: Designated Asker of Stupid Questions

Social Karma (Part 8)

Will people still read in the future?

Podcast: Data Governance is Mission Possible

The recent Information Management article Data – Who Cares! by Martin ABC Hansen of Platon has the provocative subtitle:

“If the need to care for data and manage it as an asset is so obvious, then why isn’t it happening?”

Hansen goes on to explain some of the possible reasons under an equally provocative section titled “Mission Impossible.”  It is a really good article that I recommend reading, and it also prompted me to record my thoughts on the subject in a new podcast:

You can also download this podcast (MP3 file) by clicking on this link: Data Governance is Mission Possible

Some of the key points covered in this approximately 15-minute OCDQ Podcast include:

  • Data is a strategic corporate asset because high quality data serves as a solid foundation for an organization’s success, empowering people, enabled by technology, to make better business decisions and optimize business performance
  • Data is an asset owned by the entire enterprise, and not owned by individual business units nor individual people
  • Data governance is the strategic alignment of people throughout the organization through the definition and enforcement of the declared policies that govern the complex ways in which people, business processes, data, and technology interact
  • Five steps for enforcing data governance policies:
    1. Documentation – Use straightforward, natural language to document your policies in a way everyone can understand
    2. Communication – Effective communication requires that you encourage open discussion and debate of all viewpoints
    3. Metrics – Meaningful metrics can be effectively measured, and represent the business impact of data governance
    4. Remediation – Correct any combination of business process, technology, data, and people—and sometimes all four
    5. Refinement – Dynamically evolve and adapt your data governance policies—as well as their associated metrics
  • Data governance requires everyone within the organization to accept a shared responsibility for both failure and success
  • This blog post will self-destruct in 10 seconds . . . Just kidding, I didn’t have the budget for special effects

 

Related Posts

Shared Responsibility

Quality and Governance are Beyond the Data

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

Delivering Data Happiness

The Circle of Quality

The Diffusion of Data Governance

Jack Bauer and Enforcing Data Governance Policies

The Prince of Data Governance

MacGyver: Data Governance and Duct Tape

 

Quality and Governance are Beyond the Data

Last week’s episode of DM Radio on Information Management, co-hosted as always by Eric Kavanagh and Jim Ericson, was a panel discussion about how and why data governance can improve the quality of an organization’s data, and the featured guests were Dan Soceanu of DataFlux, Jim Orr of Trillium Software, Steve Sarsfield of Talend, and Brian Parish of iData.

The relationship between data quality and data governance is a common question, and perhaps mostly because data governance is still an evolving discipline.  However, another contributing factor is the prevalence of the word “data” in the names given to most industry disciplines and enterprise information initiatives.

“Data governance goes well beyond just the data,” explained Orr.  “Administration, business process, and technology are also important aspects, and therefore the term data governance can be misleading.”

“So perhaps a best practice of data governance is not calling it data governance,” remarked Ericson.

From my perspective, data governance involves policies, people, business processes, data, and technology.  However, the last four of those concepts (people, business processes, data, and technology) are critical to every enterprise initiative.

So I agree with Orr because I think that the key concept differentiating data governance is its definition and enforcement of the policies that govern the complex ways that people, business processes, data, and technology interact.

As it relates to data quality, I believe that data governance provides the framework for evolving data quality from a project to an enterprise-wide program by facilitating the collaboration of business and technical stakeholders.  Data governance aligns data usage with business processes through business relevant metrics, and enables people to be responsible for, among other things, data ownership and data quality.

“A basic form of data governance is tying the data quality metrics to their associated business processes and business impacts,” explained Sarsfield, the author of the great book The Data Governance Imperative, which explains that “the mantra of data governance is that technologists and business users must work together to define what good data is by constantly leveraging both business users, who know the value of the data, and technologists, who can apply what the business users know to the data.”

Data is used as the basis to make critical business decisions, and therefore “the key for data quality metrics is the confidence level that the organization has in the data,” explained Soceanu.  Data-driven decisions are better than intuition-driven decisions, but lacking confidence about the quality of their data can lead organizations to rely more on intuition for their business decisions.

The Data Asset: How Smart Companies Govern Their Data for Business Success, written by Tony Fisher, the CEO of DataFlux, is another great book about data governance, which explains that “data quality is about more than just improving your data.  Ultimately, the goal is improving your organization.  Better data leads to better decisions, which leads to better business.  Therefore, the very success of your organization is highly dependent on the quality of your data.”

Data is a strategic corporate asset and, by extension, data quality and data governance are both strategic corporate disciplines, because high quality data serves as a solid foundation for an organization’s success, empowering people, enabled by technology, to make better business decisions and optimize business performance.

Therefore, data quality and data governance both go well beyond just improving the quality of an organization’s data, because Quality and Governance are Beyond the Data.

 

Related Posts

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Finding Data Quality

The Diffusion of Data Governance

MacGyver: Data Governance and Duct Tape

The Prince of Data Governance

Jack Bauer and Enforcing Data Governance Policies

Data Governance and Data Quality

The Data Quality of Dorian Gray

The Picture of Dorian Gray is a 19th-century novel by Oscar Wilde that tells the story of a young man who sold his soul to remain forever young and beautiful by having his recently painted portrait age rather than himself.  One of the allegories that can be drawn from the novel is our desire to cling, like Dorian Gray, to an idealized image of ourselves and of our lives.

I have previously blogged that when an organization’s data quality is discussed, it is very common to encounter data denial.

This is an understandable self-defense mechanism from the people responsible for business processes, technology, and data because of the simple fact that nobody likes to be blamed (or feel blamed) for causing or failing to fix data quality problems.

But data denial can also doom a data quality improvement initiative from the very beginning.

Of course, everyone will agree that ensuring high quality data is being used to make critical daily business decisions is vitally important to corporate success.  However, for an organization to improve its data quality, it has to admit that some of its business decisions are mistakes made based on poor quality data.

But the organization has a desire to cling to an idealized image of its data and its data-driven business decisions, to treat its poor data quality the same way as Dorian Gray treated his portrait—by refusing to look at it.

However, The Data Quality of Dorian Gray is also a story that can only end in tragedy.

 

Related Posts

Once Upon a Time in the Data

The Data-Decision Symphony

The Idea of Order in Data

Hell is other people’s data

The Circle of Quality

Data Quality Industry: Problem Solvers or Enablers?

This morning I had the following Twitter conversation with Andy Bitterer of Gartner Research and ANALYSTerical, sparked by my previous post about Data Quality Magic, in which I posited that its one and only source is the people involved:

 

What Say You?

Although Andy and I were just joking around, there is some truth beneath these tweets.  After all, according to Gartner research, “the market for data quality tools was worth approximately $727 million in software-related revenue as of the end of 2009, and is forecast to experience a compound annual growth rate (CAGR) of 12% during the next five years.” 

So I thought I would open this up to a good-natured debate. 

Do you think the data quality industry (software vendors, consultants, analysts, and conferences) is working harder to solve the problem of poor data quality or to perpetuate the profitability of its continued existence?

All perspectives on this debate are welcome without bias.  Therefore, please post a comment below.

(Please Note: Comments advertising your products and services (or bashing your competitors) will NOT be approved.)

 

Related Posts

Which came first, the Data Quality Tool or the Business Need?

Do you believe in Magic (Quadrants)?

Can Enterprise-Class Solutions Ever Deliver ROI?

Promoting Poor Data Quality

The Once and Future Data Quality Expert

Imagining the Future of Data Quality

Data Quality Magic

In previous posts I explained that, at least in regard to data quality, there are no magic beans, tooth fairies, or magic tricks.

However, and before I am branded a Muggle, I want to assure you that magic does indeed exist in the world of data quality.

The common mistake is looking for data quality magic in the wrong places.  Historically, the quest begins with technology, perhaps because of Clarke’s Third Law: “Any sufficiently advanced technology is indistinguishable from magic.”

Data quality tools are often believed to be magic, especially by their salespeople.

But data quality tools are not magic.

The quest continues with methodology, and perhaps because of the Hedgehogian dream of a single, all-encompassing theory, which provides the certainty and control that comes from “just following the framework.”

Data quality methodologies are also often believed to be magic, especially by our data perfectionists.

But data quality methodologies are not magic.

This is where the quest typically ends, after belief in magic technology and/or magic methodology fails, but usually not from a lack of repeatedly trying—and repeatedly failing.

So if data quality magic doesn’t come from either technology or methodology, where does it come from?

In the 1988 movie Willow, the title character fails the test to become an apprentice of the village wizard.  The test was to choose which of the wizard’s fingers was the source of his magic—the correct answer was for Willow to choose his own finger.

Data quality magic comes from data quality magicians—from the People working on data quality initiatives, people who are united by trust and collaboration, guided by an adaptive methodology, and of course, enabled by advanced technology.

Without question, the one and only source of Data Quality Magic is Data Quality People.

 

Related Posts

DQ-Tip: “Data quality tools do not solve data quality problems...”

There are no Magic Beans for Data Quality

The Tooth Fairy of Data Quality

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

Video: Oh, the Data You’ll Show!

The Tell-Tale Data

Data Quality is People!

Data Quality is not an Act, it is a Habit

The Second Law of Data Quality states that it is not a one-time project, but a sustained program.  Or to paraphrase Aristotle:

“Data Quality is not an Act, it is a Habit.”

Habits are learned behaviors, which can become automatic after enough repetition.  Habits can also be either good or bad.

Sometimes we can become so focused on developing new good habits that we forget about our current good habits.  Other times we can become so focused on eliminating all of our bad habits that we lose ourselves in the quest for perfection.

This is why Aristotle was also an advocate of the Golden Mean, which is usually simplified into the sage advice:

“Moderation in all things.”

While helping our organization develop good habits for ensuring high quality data, we often use the term Best Practice.

Although data quality is a practice, it’s one we get better at as long as we continue practicing.  Quite often I have observed the bad habit of establishing, but never revisiting, best practices.

However, as our organization, and the business uses for our data, continues to evolve, so must our data quality practice.

Therefore, data quality is not an act, but it’s also not a best practice.  It’s a habit of continuous practice, continuous improvement, continuous learning, and continuous adaptation to continuous change—which is truly the best possible habit we can develop.

Data Quality is a Best Habit.

Create a Slippery Slope

Enterprise information initiatives, such as data governance, master data management, data quality, and business intelligence all face a common challenge—they require your organization to take on a significant and sustained change management effort.

Organizational change requires behavioral change.

Behavioral change requires more than just an executive management decree and a rational argument.  You need to unite the organization around a shared purpose, encourage collaboration, and elevate the change to a cause. 

Although some people within the organization will answer this call to action and become champions for the cause, many others will need more convincing.  As Guy Kawasaki advises, overcome this challenge by intentionally creating a slippery slope:

“Provide a safe first step.  Don’t put up any big hurdles in the beginning of the process.
The path to adopting a cause needs a slippery slope.”

Therefore, to get your enterprise information initiative off to a good start, make it easy for people to adopt the cause.

Create a slippery slope.

 

Related Posts

Common Change

“Some is not a number and soon is not a time”

Video: Declaration of Data Governance

Don’t Do Less Bad; Do Better Good

Delivering Data Happiness

The Business versus IT—Tear down this wall!

The Road of Collaboration

Darth Data

Darth Tater

While I was grocery shopping today, I couldn’t resist taking this picture of Darth Tater.

As the Amazon product review explains: “Be it a long time ago, in a galaxy far, far away or right here at home in the 21st century, Mr. Potato Head never fails to reinvent himself.”

I couldn’t help but think of how, although data’s quality is determined by evaluating its fitness for the purpose of business use, most data has multiple business uses, and data of sufficient quality for one use may not be sufficient for other, perhaps unintended, business uses.

It is this “Reinventing data for mix and match business fun!” that often provides the context for what, in hindsight, appear to be obvious data quality issues.

It makes me wonder if it’s possible to turn high quality data to the dark side of the Force by misusing it for a business purpose for which it has no applicability, resulting in bad, albeit data-driven, business decisions.

Please post a comment and let me know if you think it is possible to turn Data-kin Quality-walker into Darth Data.

May the Data Quality be with you, always.

Commendable Comments (Part 7)

Blogging has made the digital version of my world much smaller and allowed my writing to reach a much larger audience than would otherwise be possible.  Although I am truly grateful to all of my readers, I am most grateful to my commenting readers. 

Since its inception over a year ago, this has been an ongoing series for expressing my gratitude to my readers for their truly commendable comments, which greatly improve the quality of my blog posts.

 

Commendable Comments

On Do you enjoy writing?, Corinna Martinez commented:

“To be literate, a person of letters, means one must occasionally write letters by hand.

The connection between brain and hand cannot be overlooked as a key component to learning.  It is by the very fact that it is labor intensive and requires thought that we are able to learn concepts and carry thought into action.

One key feels the same as another and if the keyboard is changed then even the positioning of fingers while typing will have no significance.  My bread and butter is computers but all in the name of communications, understanding and resolution of problems plaguing people/organizations.

And yet, I will never be too far into a computer to neglect to write a note or letter to a loved one.  While I don’t journal, and some say that writing a blog is like journaling online, I love mixing and matching even searching for the perfect word or turn of phrase.

Although a certain number of simians may recreate something legible on machines, Shakespeare or literature of the level to inspire and move it will not be.

The pen is mightier than the sword—from as earthshaking as the downfall of nations to as simple as my having gotten jobs after handwriting simple thank you notes.

Unfortunately, it may go the way of the sword and be kept in glass cases instead of employed in its noblest and most dangerous task—wielded by masters of mind and purpose.”

On The Prince of Data Governance, Jarrett Goldfedder commented:

“Politics and self-interest are rarely addressed factors in principles of data governance, yet are such a strong component of some high-profile implementations that data governance truly does need to be treated as an art rather than a science.

Data teams should have principles and policies to follow, but these can be easily overshadowed by decisions made from a few executives promoting their own agendas.  Somehow, built into the existing theories of data governance, we should consider how to handle these political influences using some measure of accountability that all team members—stakeholders included—need to have.”

On Jack Bauer and Enforcing Data Governance Policies, Jill Wanless commented:

“Data Governance enforcement is a combination of straightforward and logical activities that when implemented correctly will help you achieve compliance, and ensure the success of your program.  I would emphasize that they ALL (Documentation, Communication, Metrics, Remediation, Refinement) need to be part of your overall program, as doing one or a few without the others will lead to increased risk of failure.

My favorite?  Tough to choose.  The metrics are key, as are the documentation, remediation and refinement.  But to me they all depend upon good communications.  If you don’t communicate your policies, metrics, risks, issues, challenges, work underway, etc., you will fail!  I have seen instances where policies have been established, yet they weren’t followed for the simple fact that people were unaware they existed.”

On Is your data complete and accurate, but useless to your business?, Dylan Jones commented:

“This sparks an episode I had a few years ago with an engineering services company in the UK.

I ran a management workshop showing a lot of the issues we had uncovered.  As we were walking through a dashboard of all the findings, one of the directors shouted out that the 20% completeness stat for a piece of engineering installation data was wrong; she had received no reports of missing data.

I drilled into the raw data and sure enough we found that 80% of the data was incomplete.

She was furious and demanded that site visits be carried out and engineers should be incentivized (i.e., punished!) in order to maintain this information.

What was interesting is that the data went back many years so I posed the question:

‘Has your decision-making ability been impeded by this lack of information?’

What followed was a lengthy debate, but the outcome was NO, it had little effect on operations or strategic decision making.

The company could have invested considerable amounts of time and money in maintaining this information but the benefits would have been marginal.

One of the most important dimensions to add to any data quality assessment is USEFULNESS, I use that as a weight to reduce the impact of other dimensions.  To extend your debate further, data may be hopelessly inaccurate and incomplete, but if it’s of no use, then let’s take it out of the equation.”
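Dylan’s idea of using usefulness as a weight that reduces the impact of the other dimensions can be sketched in a short, hypothetical calculation (the dimension names, scores, and weighting formula below are illustrative assumptions, not anything from his assessment):

```python
# Hypothetical per-dimension quality scores (0.0 to 1.0) for one data set,
# with a "usefulness" weight that scales down the impact of the other
# dimensions, along the lines Dylan describes.

def weighted_quality_score(dimensions, usefulness):
    """Average the dimension scores, then scale the result by usefulness."""
    if not dimensions:
        return 0.0
    raw_score = sum(dimensions.values()) / len(dimensions)
    return raw_score * usefulness

# The engineering installation data from Dylan's story: badly incomplete,
# but barely used, so its poor completeness barely moves the overall score.
engineering_data = {"completeness": 0.2, "accuracy": 0.9, "consistency": 0.8}

print(weighted_quality_score(engineering_data, usefulness=0.1))
```

With a usefulness weight near zero, even an 80% incomplete data set contributes almost nothing to the overall assessment, which is exactly the “take it out of the equation” effect Dylan argues for.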

On Is your data complete and accurate, but useless to your business?, Gordon Hamilton commented:

“Data Quality dimensions that track a data set’s significance to the Business such as Relevance or Impact could help keep the care and feeding efforts for each data set in ratio to their importance to the Business.

I think you are suggesting that the Business’s strategic/tactical objectives should be used to self-assess and even prune data quality management efforts, in order to keep them aligned with the Business rather than letting them have an independent life of their own.

I wonder if all business activities could use a self-assessment metric built into their processing so that they can realign to reality.  In the low levels of biology this is sometimes referred to as a ‘suicide gene’ that lets a cell decide when it is no longer needed.  Suicide is such a strong term though, so maybe it could be called an ‘annual review to realign efforts to organizational goals’ gene.”

On Is your data complete and accurate, but useless to your business?, Winston Chen commented:

“A particularly nasty problem in data management is that data created for one purpose gets used for another.  Often, the people who use the data don't have a choice.  It’s the only data available!

And when the same piece of data is used for multiple purposes, it gets even tougher.  As you said, completeness and accuracy have a context: the same piece of data could be good for one purpose and useless for another.

A major goal of data governance is to define and enforce policies that align how data is created with how data is used.  And if conflicts arise—they surely will—there’s a mechanism for resolving them.”

On Data Quality and the Cupertino Effect, Marty Moseley commented:

“I usually separate those out by saying that validity is a binary measurement of whether or not a value is correct or incorrect within a certain context, whereas accuracy is a measurement of the valid value’s ‘correctness’ within the context of the other data surrounding it and/or the processes operating upon it.

So, validity answers the question: ‘Is ZW a valid country code?’ and the answer would (currently) be ‘Yes, on the African continent, or perhaps on planet Earth.’

Accuracy answers the question: ‘Is it 2.5 degrees Celsius today in Redding, California?’

To which the answer would measure several things: is 2.5 degrees Celsius a valid temperature for Redding, CA? (yes it is), is it probable this time of year? (no, it has never been nearly that cold on this date), and are there any weather anomalies noted that might recommend that 2.5C is valid for Redding today? (no, there are not). So even though 2.5C is a valid air temperature, Redding, CA is a valid city and state combination, and 2.5C is valid for Redding in some parts of the year, that temperature has never been seen in Redding on July 15th and therefore it is probably not accurate.

Another ‘accuracy’ use case is one I’ve run into before: Is it accurate that Customer A purchased $15,049.00 in <product> on order 123 on <this date>?

To answer this, you may look at the average order size for this product (in quantity and overall price), the average order sizes from Customer A (in quantity ordered and monetary value), any promotions that offer such pricing deals, etc.

Given that the normal credit card charges for this customer are in the $50.00 to $150.00 range, and that the products ordered are on average $10.00 to $30.00, and that even the best customers normally do not order more than $200, and that there has never been a single order from this type of customer for this amount, then it is highly unlikely that a purchase of this size is accurate.”
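Marty’s distinction between validity and accuracy can be sketched in a couple of hypothetical checks (the reference set and the temperature range below are illustrative assumptions, not real reference data):

```python
# A minimal sketch of the distinction above: validity is a binary membership
# test against a reference set, while accuracy weighs a valid value against
# the context surrounding it (here, approximated as a plausibility range).

VALID_COUNTRY_CODES = {"US", "GB", "ZW", "DE"}  # illustrative subset

# Hypothetical historical July temperature range (Celsius) for Redding, CA.
JULY_TEMP_RANGE_REDDING = (15.0, 48.0)

def is_valid_country_code(code):
    """Validity: is the value a member of the reference set?"""
    return code in VALID_COUNTRY_CODES

def is_plausible_temperature(celsius, expected_range):
    """Accuracy (approximated as plausibility): does a valid value fall
    within what the surrounding context says is possible?"""
    low, high = expected_range
    return low <= celsius <= high

print(is_valid_country_code("ZW"))                             # validity
print(is_plausible_temperature(2.5, JULY_TEMP_RANGE_REDDING))  # accuracy
```

So ‘ZW’ passes the validity check, while 2.5 degrees Celsius, although a perfectly valid air temperature, fails the contextual accuracy check for Redding in July, which mirrors Marty’s examples.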

On Do you believe in Magic (Quadrants)?, Len Dubois commented:

“I believe Magic Quadrants (MQ) are a tool that clients of Gartner, and anyone else that can get their hands on them, use as one data point in their decision-making process.

Analytic reports, like any other data point, are as useful or as dangerous as the user wants or needs them to be.  From a buyer’s perspective, an MQ can be used for lots of things:

1. To validate a market
2. To identify vendors in the marketplace
3. To identify minimum qualifications in terms of features and functionality
4. To identify trends
5. To determine a company’s viability
6. To justify one’s choice of a vendor
7. To justify value of a purchase
8. Worst case scenario: to defend one’s choice of a failed selection
9. To demonstrate business value of a technology

I also believe they use the analysts, Ted and Andy in this instance, as a sounding board to validate what they believe or have learned from other data points, i.e., references, white papers, demos, friends, colleagues, etc.

In the final analysis though, I know that clients usually make their selection based on many things, the MQ included.  One of the most important decision points is the relationship they have with a vendor or the one they believe they are going to be able to develop with a new vendor—and no MQ is going to tell you that.”

Thank You

Thank you all for your comments.  Your feedback is greatly appreciated—and truly is the best part of my blogging experience.

This entry in the series highlighted commendable comments on OCDQ Blog posts published in May, June, and July of 2010. 

Since there have been so many commendable comments, please don’t be offended if one of your comments wasn’t featured.

Please keep on commenting and stay tuned for future entries in the series.

 

Related Posts

Commendable Comments (Part 6)

Commendable Comments (Part 5)

Commendable Comments (Part 4)

Commendable Comments (Part 3)

Commendable Comments (Part 2)

Commendable Comments (Part 1)

DQ-Tip: “Data quality tools do not solve data quality problems...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“Data quality tools do not solve data quality problems—People solve data quality problems.”

This DQ-Tip came from the DataFlux IDEAS 2010 Assessing Data Quality Maturity workshop conducted by David Loshin, whose new book The Practitioner's Guide to Data Quality Improvement will be released next month.

Just like all technology, data quality tools are enablers.  Data quality tools provide people with the capability for solving data quality problems, for which there are no fast and easy solutions.  Although incredible advancements in technology continue, there are no Magic Beans for data quality.

And there never will be.

An organization’s data quality initiative can only be successful when people take on the challenge united by collaboration, guided by an effective methodology, and of course, enabled by powerful technology.

By far the most important variable in implementing successful and sustainable data quality improvements is acknowledging David’s sage advice:  people—not tools—solve data quality problems.

 

Related Posts

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “There is no point in monitoring data quality...”

DQ-Tip: “Don't pass bad data on to the next person...”

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is about more than just improving your data...”

DQ-Tip: “Start where you are...”