Jim Harris

My name is Jim Harris.  I am the Blogger-in-Chief of OCDQ Blog and an independent consultant, speaker, and freelance writer for hire.


Entries in Business Intelligence (43)

Tuesday, Nov 16, 2010

TDWI World Conference Orlando 2010

Last week I attended the TDWI World Conference held November 7-12 in Orlando, Florida at the Loews Royal Pacific Resort.

As always, TDWI conferences offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner, designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

In this blog post, I summarize a few key points from two of the courses I attended.  I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

 

A Practical Guide to Analytics

Wayne Eckerson, author of the book Performance Dashboards: Measuring, Monitoring, and Managing Your Business, described the four waves of business intelligence:

  1. Reporting – What happened?
  2. Analysis – Why did it happen?
  3. Monitoring – What’s happening?
  4. Prediction – What will happen?

“Reporting is the jumping off point for analytics,” explained Eckerson, “but many executives don’t realize this.  The most powerful aspect of analytics is testing our assumptions.”  He went on to differentiate the two strains of analytics:

  1. Exploration and Analysis – Top-down and deductive, primarily uses query tools
  2. Prediction and Optimization – Bottom-up and inductive, primarily uses data mining tools

“A huge issue for predictive analytics is getting people to trust the predictions,” remarked Eckerson.  “Technology is the easy part, the hard part is selling the business benefits and overcoming cultural resistance within the organization.”

“The key is not getting the right answers, but asking the right questions,” he explained, quoting Ken Rudin of Zynga.

“Deriving insight from its unique information will always be a competitive advantage for every organization.”  He recommended the book Competing on Analytics: The New Science of Winning as a great resource for selling the business benefits of analytics.

 

Data Governance for BI Professionals

Jill Dyché, a partner and co-founder of Baseline Consulting, explained that data governance transcends business intelligence and other enterprise information initiatives such as data warehousing, master data management, and data quality.

“Data governance is the organizing framework,” explained Dyché, “for establishing strategy, objectives, and policies for corporate data.  Data governance is the business-driven policy making and oversight of corporate information.”

“Data governance is necessary,” remarked Dyché, “whenever multiple business units are sharing common, reusable data.”

“Data governance aligns data quality with business measures and acceptance, positions enterprise data issues as cross-functional, and ensures data is managed separately from its applications, thereby evolving data as a service (DaaS).”

In her excellent 2007 article Serving the Greater Good: Why Data Hoarding Impedes Corporate Growth, Dyché explained the need for “systemizing the notion that data – corporate asset that it is – belongs to everyone.”

“Data governance provides the decision rights around the corporate data asset.”

 

Related Posts

DQ-View: From Data to Decision

Podcast: Data Governance is Mission Possible

The Business versus IT—Tear down this wall!

MacGyver: Data Governance and Duct Tape

Live-Tweeting: Data Governance

Enterprise Data World 2010

Enterprise Data World 2009

TDWI World Conference Chicago 2009

Light Bulb Moments at DataFlux IDEAS 2010

DataFlux IDEAS 2009

Tuesday, Nov 9, 2010

DQ-View: From Data to Decision

Data Quality (DQ) View is a regular OCDQ segment.  Each DQ-View is a brief video discussion of a key data quality concept.

As I posited in The Circle of Quality, an organization’s success is measured by its business results, which are dependent on the quality of its business decisions, which rely on the quality of its data.  In this new DQ-View segment, I want to briefly discuss the relationship between data quality and decision quality and examine a few crucial aspects of the journey from data to decision.

 

[Video: DQ-View: From Data to Decision]

 

If you are having trouble viewing this video, then you can watch it on Vimeo by clicking on this link: DQ-View on Vimeo

 

Related Posts

The Business versus IT—Tear down this wall!

The Data-Decision Symphony

The Real Data Value is Business Insight

Scrum Screwed Up

Is your data complete and accurate, but useless to your business?

Finding Data Quality

Fantasy League Data Quality

TDWI World Conference Chicago 2009

 

Additional OCDQ Video Posts

DQ View: Achieving Data Quality Happiness

Video: Oh, the Data You’ll Show!

Data Quality is not a Magic Trick

DQ-View: The Cassandra Effect

DQ-View: Is Data Quality the Sun?

DQ-View: Designated Asker of Stupid Questions

Social Karma (Part 8)

Will people still read in the future?

Thursday, Sep 16, 2010

The Business versus IT—Tear down this wall!

[Diagram: Business Information Technology]

This diagram was published in the July 2009 blog post Business Information Technology by Steve Tuck of Datanomic, and was based on a conference conversation with Gwen Thomas of the Data Governance Institute.  Their conversation was about the figurative wall, prevalent in most organizations, which literally separates the Business, who usually own the data and understand its use in making critical daily business decisions, from Information Technology (IT), who usually own and maintain the hardware and software infrastructure of the enterprise data architecture.

The success of all enterprise information initiatives requires that this wall be torn down, ending the conflict between the Business and IT, and forging a new collaborative union that Steve and Gwen called Business Information Technology.

 

Isn’t IT a part of the Business?

In his recent blog post Isn’t IT a Part of “the Business”?, Winston Chen of Kalido examined this common challenge, remarking how “IT is often a cost center playing a supporting role for the frontline functions.  But Finance is a cost center, too.  Is Finance really the Business?  How about Human Resources?  We don’t hear HR people talk about the Business versus HR, do we?”

“Key words are important in setting the tone for communication,” Winston explained.  “When our language suggests IT is not a part of the Business, it cements a damaging us-versus-them mentality.”

“It leads to isolation.  What we need today, more than ever, is close collaboration.”

 

Purple People

Earlier this year in his blog post “Purple People”: The Key to BI Success, Wayne Eckerson of TDWI used a colorful analogy to discuss this common challenge within the context of business intelligence (BI) programs.

Wayne explained that the color purple is formed by mixing two primary colors: red and blue.  These colors symbolize strong, distinct, and independent perspectives.  Wayne used red to represent IT and blue to represent the Business.

Purple People, according to Wayne, “are key intermediaries who can reconcile the Business and IT and forge a strong and lasting partnership that delivers real value to the organization.”

“Pure technologists or pure business people can’t harness BI successfully.  BI needs Purple People to forge tight partnerships between business people and technologists and harness information for business gain.”

I agree with Wayne, but I believe all enterprise information initiatives, and not just BI, need Purple People for success.

 

Tearing down the Business-IT Wall

My overly dramatic blog post title is obviously a reference to the famous speech by United States President Ronald Reagan at the Berlin Wall on June 12, 1987.  For more than 25 years, the Berlin Wall had stood as a symbol of not only a divided Germany and divided political ideologies, but more importantly, it was both a figurative and literal symbol of a deeper human divide.

Although Reagan’s speech was merely symbolic of the numerous and complex factors that eventually led to the dismantling of the Berlin Wall and the end of the Cold War, symbolism is a powerful aspect of human culture, including corporate culture.

The Business-IT Wall is only a figurative wall, but it literally separates the Business and IT in most organizations today.

So much has been written about the need for Business-IT Collaboration on successful enterprise information initiatives that the message is often ignored because people are sick and tired of hearing about it.

However, although there are other barriers to success, and people, process, and technology are all important, by far the most important factor for true and lasting success to be possible is people collaborating.

Organizations must remove all symbolic obstacles, both figurative and literal, which contribute to the human divide preventing enterprise-wide collaboration within their unique corporate culture.

As for the Business-IT Wall, and all other similar barriers to our collaboration and success, the time is long overdue for us to:

Tear down this wall!

Related Posts

The Road of Collaboration

Finding Data Quality

Data Transcendentalism

Declaration of Data Governance

Podcast: Business Technology and Human-Speak

Not So Strange Case of Dr. Technology and Mr. Business

Data Quality is People!

You're So Vain, You Probably Think Data Quality Is About You

Tuesday, Aug 31, 2010

The Data-Decision Symphony

As I have explained in previous blog posts, I am almost as obsessive-compulsive about literature and philosophy as I am about data and data quality, because I believe that there is much that the arts and the sciences can learn from each other.

Therefore, I really enjoyed recently reading the book Proust Was a Neuroscientist by Jonah Lehrer, which shows that science is not the only path to knowledge.  In fact, when it comes to understanding the brain, art got there first.

Without doubt, I will eventually write several blog posts that use references from this book to help me explain some of my perspectives about data quality and its many related disciplines.

In this blog post, with help from Jonah Lehrer and the composer Igor Stravinsky, I will explain The Data-Decision Symphony.

 

Data, data everywhere

Data is now everywhere.  Data is no longer just in the structured rows of our relational databases and spreadsheets.  Data is also in the unstructured streams of our Facebook and Twitter status updates, as well as our blog posts, our photos, and our videos.

The challenge is whether we can somehow manage to listen for business insights among the endless cacophony of chaotic data volumes, and use those insights to enable better business decisions and deliver optimal business performance.

Whether you choose to measure it in terabytes, petabytes, or how much reality bites, the data deluge has commenced—and you had better bring your A-Game to D-Town.  In other words, you need to find innovative ways to derive business insight from your constantly increasing data volumes by overcoming the signal-to-noise ratio encountered during your data analysis.

 

The Music of the Data

This complex challenge of filtering out the noise of the data until you can detect the music of the data, which is just another way of saying the data that you need to make a critical business decision, is very similar to how we actually experience music.

As Jonah Lehrer explains, “music is nothing but a sliver of sound that we have learned how to hear.  Our sense of sound is a work in progress.  Neurons in the auditory cortex are constantly being altered by the songs and symphonies we listen to.”

“Instead of representing the full spectrum of sound waves vibrating inside the ear, the auditory cortex focuses on finding the note amid the noise.  We tune out the cacophony we can’t understand.”

“This is why we can recognize a single musical pitch played by different instruments.  Although a trumpet and violin produce very different sound waves, we are designed to ignore these differences.  All we care about is pitch.”

Instead of attempting to analyze all of the available data before making a business decision, we need to focus on finding the right data signals amid the data noise.  We need to tune out the cacophony of all the data we don’t need.

Of course, this is easier in theory than it is in practice.

But this is why we need to always begin our data analysis with the business decision in mind.  Many organizations begin with only the data in mind, which results in performing analysis that provides little, if any, business insight and decision support.

“But a work of music,” Lehrer continues, “is not simply a set of individual notes arranged in time.”

“Music really begins when the separate pitches are melted into a pattern.  This is a consequence of the brain’s own limitations.  Music is the pleasurable overflow of information.  Whenever a noise exceeds our processing abilities . . . [we stop] . . . trying to understand the individual notes and seek instead to understand the relationship between the notes.”

“It is this psychological instinct—this desperate neuronal search for a pattern, any pattern—that is the source of music.”

Although few would describe analyzing large volumes of data as a “pleasurable overflow of information,” it is our search for a pattern, any pattern in the data relevant to the decision, which allows us to discover a potential source of business insight.

 

The Data-Decision Symphony

“When we listen to a symphony,” explains Lehrer, “we hear a noise in motion, each note blurring into the next.”

“The sound seems continuous.  Of course, the physical reality is that each sound wave is really a separate thing, as discrete as the notes written in the score.  But this isn’t the way we experience the music.”

“We continually abstract on our own inputs, inventing patterns in order to keep pace with the onrush of noise.  And once the brain finds a pattern, it immediately starts to make predictions, imagining what notes will come next.  It projects imaginary order into the future, transposing the melody we have just heard into the melody we expect.  By listening for patterns, by interpreting every note in terms of expectations, we turn the scraps of sound into the ebb and flow of a symphony.”

This is also how we arrive at making a critical business decision based on data analysis. 

We discover a pattern of business context, relevant to the decision, and start making predictions, imagining what will come next, projecting imaginary order into the data stream, turning bits and bytes into the ebb and flow of The Data-Decision Symphony.

However, our search for the consonance of business context among the dissonance of data could cause us to draw comforting, but false, conclusions (especially if we are unaware of any confirmation bias), resulting in bad, albeit data-driven, business decisions.

The musicologist Leonard Meyer, in his 1956 book Emotion and Meaning in Music, explained how “music is defined by its flirtation with—but not submission to—expectations of order.  Although music begins with our predilection for patterns, the feeling of music begins when the pattern we imagine starts to break down.”

Lehrer explains how Igor Stravinsky, in The Rite of Spring, “forces us to generate patterns from the music itself, and not from our preconceived notions of what the music should be like.”

Therefore, we must be vigilant when we perform data analysis, making sure to generate patterns from the data itself, and not from our preconceived notions of what the data should be like—especially when we encounter less than perfect data quality.

As Jonah Lehrer explains, “the brain is designed to learn by association: if this, then that.  Music works by subtly toying with our expected associations, enticing us to make predictions and then confronting us with our prediction errors.”

“Music is the sound of art changing the brain.”

The Data-Decision Symphony is the sound of the art and science of data analysis enabling better business decisions.

 

Related Posts

Data, data everywhere, but where is data quality?

The Real Data Value is Business Insight

The Road of Collaboration

The Idea of Order in Data

Hell is other people’s data

The Circle of Quality

 

Data Quality Music (DQ-Songs)

A Record Named Duplicate

New Time Human Business

People

You Can’t Always Get the Data You Want

A spoonful of sugar helps the number of data defects go down

Data Quality is such a Rush

I’m Bringing DQ Sexy Back

Imagining the Future of Data Quality

The Very Model of a Modern DQ General

Monday, Aug 23, 2010

The Real Data Value is Business Insight

[Figure: Data Values for COUNTRY]

Understanding your data usage is essential to improving its quality, and therefore, you must perform data analysis on a regular basis.

A data profiling tool can help you by automating some of the grunt work needed to begin your data analysis, such as generating levels of statistical summaries supported by drill-down details, including data value frequency distributions (like the ones shown to the left).
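
To make the idea of a data value frequency distribution concrete, here is a minimal Python sketch of the kind of summary a profiling tool automates.  It is only an illustration: the function name `value_frequencies`, the sample records, and the `<missing>` label are hypothetical and are not taken from any particular tool.

```python
from collections import Counter


def value_frequencies(records, field):
    """Return (value, count, percent) rows for one field, most common first."""
    # Treat None and whitespace-only values as missing (an assumption of this sketch).
    values = [(rec.get(field) or "").strip() for rec in records]
    counts = Counter(values)
    total = len(values)
    return [
        (value or "<missing>", count, round(100.0 * count / total, 1))
        for value, count in counts.most_common()
    ]


# Hypothetical sample records, for illustration only.
customers = [
    {"name": "A", "COUNTRY": "US"},
    {"name": "B", "COUNTRY": "US"},
    {"name": "C", "COUNTRY": "USA"},
    {"name": "D", "COUNTRY": "CA"},
    {"name": "E", "COUNTRY": None},
]

for value, count, pct in value_frequencies(customers, "COUNTRY"):
    print(f"{value:<10} {count:>3} {pct:>6}%")
```

A table like this tells you which values occur and how often (for example, that both US and USA appear), but it says nothing about how the field is used to make business decisions.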

However, a common mistake is to hyper-focus on the data values.

Narrowing your focus to the values of individual fields is a mistake when it causes you to lose sight of the wider context of the data, which can cause other errors like mistaking validity for accuracy.

Understanding data usage is about analyzing its most important context—how your data is being used to make business decisions.

 

“Begin with the decision in mind”

In his excellent recent blog post It’s time to industrialize analytics, James Taylor wrote that “organizations need to be much more focused on directing analysts towards business problems.”  Although Taylor was writing about how, in advanced analytics (e.g., data mining, predictive analytics), “there is a tendency to let analysts explore the data, see what can be discovered,” I think this tendency is applicable to all data analysis, including less advanced analytics like data profiling and data quality assessments.

Please don’t misunderstand—Taylor and I are not saying that there is no value in data exploration, because, without question, it can definitely lead to meaningful discoveries.  And I continue to advocate that the goal of data profiling is not to find answers, but instead, to discover the right questions.

However, as Taylor explained, it is because “the only results that matter are business results” that data analysis should always “begin with the decision in mind.  Find the decisions that are going to make a difference to business results—to the metrics that drive the organization.  Then ask the analysts to look into those decisions and see what they might be able to predict that would help make better decisions.”

Once again, although Taylor is discussing predictive analytics, this cogent advice should guide all of your data analysis.

 

The Real Data Value is Business Insight


Returning to data quality assessments, which create and monitor metrics based on summary statistics provided by data profiling tools (like the ones shown in the mockup to the left), elevating what are low-level technical metrics up to the level of business relevance will often establish their correlation with business performance, but will not establish metrics that drive—or should drive—the organization.

Although built from the bottom-up by using, for the most part, the data value frequency distributions, these metrics lose sight of the top-down fact that business insight is where the real data value lies.

However, data quality metrics such as completeness, validity, accuracy, and uniqueness, which are just a few common examples, should definitely be created and monitored—unfortunately, a single straightforward metric called Business Insight doesn’t exist.

But let’s pretend that my other mockup metrics were real—50% of the data is inaccurate and there is an 11% duplicate rate.

Oh, no!  The organization must be teetering on the edge of oblivion, right?  Well, 50% accuracy does sound really bad, basically like your data’s accuracy is no better than flipping a coin.  However, which data is inaccurate, and far more important, is the inaccurate data actually being used to make a business decision?

As for the duplicate rate, I am often surprised by the visceral reaction it can trigger, such as: “how can we possibly claim to truly understand who our most valuable customers are if we have an 11% duplicate rate?”

So, would reducing your duplicate rate to only 1% automatically result in better customer insight?  Or would it simply mean that the data matching criteria were too conservative (e.g., requiring an exact match on all “critical” data fields), preventing you from discovering how many duplicate customers you have?  (Or maybe the 11% indicates the matching criteria were too aggressive.)
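
As a rough illustration of how much a duplicate rate depends on the matching criteria, the following Python sketch computes the rate for the same records under two different match keys.  Everything here is an assumption made for the example: the records, the field names, and the two key functions (`conservative_key` and `aggressive_key`) are hypothetical, and real data matching is far more sophisticated than comparing keys.

```python
def duplicate_rate(records, match_key):
    """Share of records whose match key already appeared on an earlier record."""
    seen = set()
    duplicates = 0
    for rec in records:
        key = match_key(rec)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records) if records else 0.0


def conservative_key(rec):
    # Exact match required on every "critical" field.
    return (rec["name"], rec["postal_code"], rec["phone"])


def aggressive_key(rec):
    # Match on last name (case-insensitive) and postal code only.
    return (rec["name"].lower().split()[-1], rec["postal_code"])


# Hypothetical customer records, for illustration only.
customers = [
    {"name": "Pat Smith", "postal_code": "02134", "phone": "555-0100"},
    {"name": "Patricia Smith", "postal_code": "02134", "phone": "555-0100"},
    {"name": "Lee Smith", "postal_code": "02134", "phone": "555-0199"},
    {"name": "Pat Smith", "postal_code": "02134", "phone": "555-0100"},
]

print(duplicate_rate(customers, conservative_key))  # 0.25
print(duplicate_rate(customers, aggressive_key))    # 0.75
```

The same four records yield a 25% or a 75% duplicate rate depending on the key.  The conservative key probably misses that Pat and Patricia are the same customer, while the aggressive key probably merges Lee with someone else, so neither number means much until you know which business decision it is supposed to support.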

My point is that accuracy and duplicate rates are just numbers—what determines if they are a good number or a bad number?

The fundamental question that every data quality metric you create must answer is: How does this provide business insight?

If a data quality (or any other data) metric cannot answer this question, then it is meaningless.  Meaningful metrics always represent business insight because they were created by beginning with the business decisions in mind.  Otherwise, your metrics could provide the comforting, but false, impression that all is well, or you could raise red flags that are really red herrings.

Instead of beginning data analysis with the business decisions in mind, many organizations begin with only the data in mind, which results in creating and monitoring data quality metrics that provide little, if any, business insight and decision support.

Although analyzing your data values is important, you must always remember that the real data value is business insight.

 

Related Posts

The First Law of Data Quality

Adventures in Data Profiling

Data Quality and the Cupertino Effect

Is your data complete and accurate, but useless to your business?

The Idea of Order in Data

You Can’t Always Get the Data You Want

Red Flag or Red Herring? 

DQ-Tip: “There is no point in monitoring data quality…”

Which came first, the Data Quality Tool or the Business Need?

Selling the Business Benefits of Data Quality

Saturday, Aug 14, 2010

Scrum Screwed Up

This was the inaugural cartoon on Implementing Scrum by Michael Vizdos and Tony Clark.  It does a great job of illustrating the fable of The Chicken and the Pig, which is used to describe the two types of roles involved in Scrum.  Quite rare for our industry, Scrum is not an acronym, but one common approach among many iterative, incremental frameworks for agile software development.

Scrum is also sometimes used as a generic synonym for any agile framework.  Although I’m not an expert, I’ve worked on more than a few agile programs.  And since I am fond of metaphors, I will use the Chicken and the Pig to describe two common ways that scrums of all kinds can easily get screwed up:

  1. All Chicken and No Pig
  2. All Pig and No Chicken

However, let’s first establish a more specific context for agile development using one provided by a recent blog post on the topic.

 

A Contrarian’s View of Agile BI

In her excellent blog post A Contrarian’s View of Agile BI, Jill Dyché took a somewhat unpopular view of a popular view, which is something that Jill excels at—not simply for the sake of doing it—because she’s always been well-known for telling it like it is.

In preparation for the upcoming TDWI World Conference in San Diego, Jill was pondering the utilization of agile methodologies in business intelligence (aka BI—ah, there’s one of those oh so common industry acronyms straight out of The Acronymicon).

The provocative TDWI conference theme is: “Creating an Agile BI Environment—Delivering Data at the Speed of Thought.”

Now, please don’t misunderstand.  Jill is an advocate for doing agile BI the right way.  And it’s certainly understandable why so many organizations love the idea of agile BI.  Especially when you consider the slower time to value of most other approaches when compared with, following Jill’s rule of thumb, how agile BI would have “either new BI functionality or new data deployed (at least) every 60-90 days.  This approach establishes BI as a program, greater than the sum of its parts.”

“But in my experience,” Jill explained, “if the organization embracing agile BI never had established BI development processes in the first place, agile BI can be a road to nowhere.  In fact, the dirty little secret of agile BI is this: It’s companies that don’t have the discipline to enforce BI development rigor in the first place that hurl themselves toward agile BI.”

“Peek under the covers of an agile BI shop,” Jill continued, “and you’ll often find dozens or even hundreds of repeatable canned BI reports, but nary an advanced analytics capability. You’ll probably discover an IT organization that failed to cultivate solid relationships with business users and is now hiding behind an agile vocabulary to justify its own organizational ADD. It’s lack of accountability, failure to manage a deliberate pipeline, and shifting work priorities packaged up as so much scrum.”

I really love the term Organizational Attention Deficit Disorder, and in spite of myself, I can’t help but render it acronymically as OADD—which should be pronounced as “odd” because the “a” is silent, as in: “Our organization is really quite OADD, isn’t it?”

 

Scrum Screwed Up: All Chicken and No Pig

Returning to the metaphor of the Scrum roles, the pigs are the people with their bacon in the game performing the actual work, and the chickens are the people to whom the results are being delivered.  Most commonly, the pigs are IT or the technical team, and the chickens are the users or the business team.  But these scrum lines are drawn in the sand, and therefore easily crossed.

Many organizations love the idea of agile BI because they are thinking like chickens and not like pigs.  And the agile life is always easier for the chicken because they are only involved, whereas the pig is committed.

OADD organizations often “hurl themselves toward agile BI” because they’re enamored with the theory, but unrealistic about what the practice truly requires.  They’re all-in when it comes to the planning, but bacon-less when it comes to the execution.

This is one common way that OADD organizations can get Scrum Screwed Up—they are All Chicken and No Pig.

 

Scrum Screwed Up: All Pig and No Chicken

Closer to the point being made in Jill’s blog post, IT can pretend to be pigs making seemingly impressive progress, but although they’re bringing home the bacon, it lacks any real sizzle because it’s not delivering any real advanced analytics to business users. 

Although they appear to be scrumming, IT is really just screwing around with technology, albeit in an agile manner.  However, what good is “delivering data at the speed of thought” when that data is neither what the business is thinking, nor truly needs?

This is another common way that OADD organizations can get Scrum Screwed Up—they are All Pig and No Chicken.

 

Scrum is NOT a Silver Bullet

Scrum—and any other agile framework—is not a silver bullet.  However, agile methodologies can work—and not just for BI.

But whether you want to call it Chicken-Pig Collaboration, or Business-IT Collaboration, or Shiny Happy People Holding Hands, a true enterprise-wide collaboration facilitated by a cross-disciplinary team is necessary for any success—agile or otherwise.

Agile frameworks, when implemented properly, help organizations realistically embrace complexity and avoid oversimplification, by leveraging recurring iterations of relatively short duration that always deliver data-driven solutions to business problems. 

Agile frameworks are successful when people take on the challenge united by collaboration, guided by effective methodology, and supported by enabling technology.  Agile frameworks allow the enterprise to follow what works, for as long as it works, and without being afraid to adjust as necessary when circumstances inevitably change.

For more information about Agile BI, follow Jill Dyché and TDWI World Conference in San Diego, August 15-20 via Twitter.

Tuesday, Jul 27, 2010

Is your data complete and accurate, but useless to your business?

Ensuring that complete and accurate data is being used to make critical daily business decisions is perhaps the primary reason why data quality is so vitally important to the success of your organization. 

However, this effort can sometimes take on a life of its own, where achieving complete and accurate data is allowed to become the raison d'être of your data management strategy—in other words, you start managing data for the sake of managing data.

When this phantom menace clouds your judgment, your data might be complete and accurate—but useless to your business.

 

Completeness and Accuracy

How much data is necessary to make an effective business decision?  Having complete (i.e., all available) data seems obviously preferable to incomplete data.  However, with data volumes always burgeoning, the unavoidable fact is that sometimes having more data only adds confusion instead of clarity, thereby becoming a distraction instead of helping you make a better decision.

Returning to my original question, how much data is really necessary to make an effective business decision? 

Accuracy, thanks to substantial assistance from my readers, was defined in a previous post as the combination of two things: the correctness of a data value within a limited context such as verification by an authoritative reference (i.e., validity), and the correctness of a valid data value within an extensive context including other data as well as business processes (i.e., accuracy).
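
The difference between validity and accuracy can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the reference list `VALID_COUNTRY_CODES`, the postal code patterns, and the field names are hypothetical, and a cross-field consistency check is merely a stand-in for the wider context (other data and business processes) that real accuracy requires.

```python
import re

# Hypothetical authoritative reference for the limited context (validity).
VALID_COUNTRY_CODES = {"US", "CA", "GB"}

# Hypothetical cross-field rules for the wider context (a proxy for accuracy).
POSTAL_PATTERNS = {
    "US": re.compile(r"\d{5}(-\d{4})?"),
    "CA": re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d"),
}


def is_valid(rec):
    """The country value appears in the authoritative reference."""
    return rec["country"] in VALID_COUNTRY_CODES


def is_plausibly_accurate(rec):
    """The country is valid AND agrees with the postal code on the same record."""
    if not is_valid(rec):
        return False
    pattern = POSTAL_PATTERNS.get(rec["country"])
    # Countries without a rule in this sketch cannot be contradicted.
    return pattern is None or pattern.fullmatch(rec["postal_code"]) is not None


# Hypothetical records, for illustration only.
records = [
    {"country": "CA", "postal_code": "K1A 0B1"},  # valid, and consistent
    {"country": "US", "postal_code": "K1A 0B1"},  # valid, but contradicted
    {"country": "XX", "postal_code": "12345"},    # not even valid
]

for rec in records:
    print(rec["country"], is_valid(rec), is_plausibly_accurate(rec))
```

The second record is the interesting one: US passes the validity check, yet the postal code on the same record suggests the value is wrong.  A check that stops at validity would report that record as correct.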

Although accurate data is obviously preferable to inaccurate data, less than perfect data quality cannot be used as an excuse to delay making a critical business decision.  When it comes to the quality of the data being used to make these business decisions, you can’t always get the data you want, but if you try sometimes, you just might find, you get the business insight you need.

 

Data-driven Solutions for Business Problems

Obviously, there are even more dimensions of data quality beyond completeness and accuracy. 

However, although it’s about more than just improving your data, data quality can be misperceived to be an activity performed just for the sake of the data.  In fact, data quality is an enterprise-wide initiative performed for the sake of implementing data-driven solutions for business problems, enabling better business decisions, and delivering optimal business performance.

In order to accomplish these objectives, data has to be not only complete and accurate, as well as whatever other dimensions you wish to add to your complete and accurate definition of data quality, but most important, data has to be useful to the business.

Perhaps the most common definition for data quality is “fitness for the purpose of use.” 

The missing word, which makes this definition both incomplete and inaccurate, puns intended, is “business.”  In other words, data quality is “fitness for the purpose of business use.”  How complete and how accurate (and however else) the data needs to be is determined by its business use—or uses since, in the vast majority of cases, data has multiple business uses.

 

Data, data everywhere

With silos replicating data as well as new data being created daily, managing all of the data is not only becoming impractical, but because we are too busy with the activity of trying to manage all of it, no one is stopping to evaluate usage or business relevance.

The fifth of the Five New Ideas From 2010 MIT Information Quality Industry Symposium, which is a recent blog post written by Mark Goloboy, was that “60-90% of operational data is valueless.”

“I won’t say worthless,” Goloboy clarified, “since there is some operational necessity to the transactional systems that created it, but valueless from an analytic perspective.  Data only has value, and is only worth passing through to the Data Warehouse if it can be directly used for analysis and reporting.  No news on that front, but it’s been more of the focus since the proliferation of data has started an increasing trend in storage spend.”

In his recent blog post Are You Afraid to Say Goodbye to Your Data?, Dylan Jones discussed the critical importance of designing an archive strategy for data, as opposed to the default position many organizations take, where burgeoning data volumes are allowed to proliferate because, in large part, no one wants to delete (or, at the very least, archive) any of the existing data. 

This often results in the data that the organization truly needs for continued success getting stuck in the long line of data waiting to be managed, and in many cases, behind data for which the organization no longer has any business use (and perhaps never even had the chance to use when the data was actually needed to make critical business decisions).

“When identifying data in scope for a migration,” Dylan advised, “I typically start from the premise that ALL data is out of scope unless someone can justify its existence.  This forces the emphasis back on the business to justify their use of the data.”

 

Data Memorioso

Funes el memorioso is a short story by Jorge Luis Borges, which describes a young man named Ireneo Funes who, as a result of a horseback riding accident, has lost his ability to forget.  Although Funes has a tremendous memory, he is so lost in the details of everything he knows that he is unable to convert the information into knowledge and unable, as a result, to grow in wisdom.

In Spanish, the word memorioso means “having a vast memory.”  When Data Memorioso is your data management strategy, your organization becomes so lost in all of the data it manages that it is unable to convert data into business insight and unable, as a result, to survive and thrive in today’s highly competitive and rapidly evolving marketplace.

In their great book Made to Stick: Why Some Ideas Survive and Others Die, Chip Heath and Dan Heath explained that “an accurate but useless idea is still useless.  If a message can’t be used to make predictions or decisions, it is without value, no matter how accurate or comprehensive it is.”  I believe that this is also true for your data and your organization’s business uses for it.

Is your data complete and accurate, but useless to your business?

 

Related Posts

Data Quality and the Cupertino Effect

Data Rock Stars: The Rolling Forecasts

Data!

Data, data everywhere, but where is data quality?

DQ-Tip: “There is no point in monitoring data quality…”

DQ-Tip: “Data quality is about more than just improving your data...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

The First Law of Data Quality

Tuesday
Jul132010

The 2010 Data Quality Blogging All-Stars

The 2010 Major League Baseball (MLB) All-Star Game is being held tonight (July 13) at Angel Stadium in Anaheim, California.

For those readers who are not baseball fans, the All-Star Game is an annual exhibition held in mid-July that showcases the players with (for the most part) the best statistical performances during the first half of the MLB season.

Last summer, I began my own annual exhibition of showcasing the bloggers whose posts I have personally most enjoyed reading during the first half of the data quality blogging season. 

Therefore, this post provides links to stellar data quality blog posts that were published between January 1 and June 30 of 2010.  My definition of a “data quality blog post” also includes Data Governance, Master Data Management, and Business Intelligence. 

Please Note: There is no implied ranking in the order that bloggers or blogs are listed, other than that Individual Blog All-Stars are listed first, followed by Vendor Blog All-Stars, and the blog posts are listed in reverse chronological order by publication date.

 

Henrik Liliendahl Sørensen

From Liliendahl on Data Quality:

 

Dylan Jones

From Data Quality Pro:

 

Julian Schwarzenbach

From Data and Process Advantage Blog:

 

Rich Murnane

From Rich Murnane's Blog:

 

Phil Wright

From Data Factotum:

 

Initiate – an IBM Company

From Mastering Data Management:

 

Baseline Consulting

From their three blogs: Inside the Biz with Jill Dyché, Inside IT with Evan Levy, and In the Field with our Experts:

 

DataFlux – a SAS Company

From Community of Experts:

 

Related Posts

Recently Read: May 15, 2010

Recently Read: March 22, 2010

Recently Read: March 6, 2010

Recently Read: January 23, 2010

The 2009 Data Quality Blogging All-Stars

 

Additional Resources

From the IAIDQ, read the 2010 issues of the Blog Carnival for Information/Data Quality:

Thursday
Jul082010

Finding Data Quality

Have you ever experienced that sinking feeling, where you sense if you don’t find data quality, then data quality will find you?

In the spring of 2003, Pixar Animation Studios produced one of my all-time favorite Walt Disney Pictures: Finding Nemo.

This blog post is an homage not only to the film, but also to the critically important role in which data quality is cast within all of your enterprise information initiatives, including business intelligence, master data management, and data governance.

I hope that you enjoy reading this blog post, but most important, I hope you always remember: “Data are friends, not food.”

 

Data Silos

“Mine!  Mine!  Mine!  Mine!  Mine!”

That’s the Data Silo Mantra, and it is also the bane of successful enterprise information management.  Many organizations persist in their reliance on vertical data silos, where each and every business unit acts as the custodian of its own private data, thereby maintaining its own version of the truth.

Impressive business growth can cause an organization to become a victim of its own success.  Significant collateral damage can be caused by this success, and most notably to the organization’s burgeoning information architecture.

Earlier in an organization’s history, it usually has fewer systems and easily manageable volumes of data, which makes managing data quality and effectively delivering the critical information required to make informed business decisions every day a relatively easy task.  Technology can serve business needs well, especially when the business and its needs are small.

However, as the organization grows, it trades effectiveness for efficiency, prioritizing short-term tactics over long-term strategy, and by seeing power in the hoarding of data, not in the sharing of information, the organization chooses business unit autonomy over enterprise-wide collaboration—and without this collaboration, successful enterprise information management is impossible.

A data silo often merely represents a microcosm of an enterprise-wide problem—and this truth is neither convenient nor kind.

 

Data Profiling

“I see a light—I’m feeling good about my data . . . Good feeling’s gone—AHH!”

Although it’s not exactly a riddle wrapped in a mystery inside an enigma,  understanding your data is essential to using it effectively and improving its quality—to achieve these goals, there is simply no substitute for data analysis.

Data profiling can provide a reality check for the perceptions and assumptions you may have about the quality of your data.  A data profiling tool can help you by automating some of the grunt work needed to begin your analysis.

However, it is important to remember that the analysis itself cannot be automated.  You need to translate your analysis into the meaningful reports and questions that will facilitate more effective communication and help establish tangible business context.

Ultimately, I believe the goal of data profiling is not to find answers, but instead, to discover the right questions. 

Discovering the right questions requires talking with data’s best friends: its stewards, analysts, and subject matter experts.  These discussions are a critical prerequisite for determining data usage, standards, and the business-relevant metrics for measuring and improving data quality.  Always remember that well-performed data profiling is a highly interactive and very iterative process.
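
As a rough illustration of the grunt work a profiling tool automates, here is a minimal sketch in Python.  The `profile_column` helper and the sample postal codes are hypothetical and not taken from any specific tool.  The sketch only summarizes a column; deciding what the summary means still requires the conversations described above.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, top patterns."""
    non_null = [v for v in values if v is not None and str(v).strip() != ""]

    def pattern(value):
        # Reduce each value to a shape: digits -> 9, letters -> A, rest kept.
        return "".join(
            "9" if ch.isdigit() else "A" if ch.isalpha() else ch
            for ch in str(value)
        )

    patterns = Counter(pattern(v) for v in non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(3),
    }


# Hypothetical sample data: postal codes captured by a free-text field.
postal_codes = ["02134", "02134", "2134", None, "02134-1234", "N/A", ""]
summary = profile_column(postal_codes)
print(summary)
```

A summary like this does not say whether "2134" is a defect or a legitimate value.  It only surfaces the question to bring to the people who know the data.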

 

Defect Prevention

“You, Data-Dude, takin’ on the defects.
You’ve got serious data quality issues, dude.
Awesome.”

Even though it is impossible to truly prevent every problem before it happens, proactive defect prevention is a highly recommended data quality best practice because the more control enforced where data originates, the better the overall quality will be for enterprise information.

Although defect prevention is most commonly associated with business and technical process improvements, after identifying the burning root cause of your data defects, you may predictably need to apply some of the principles of behavioral data quality.

In other words, understanding the complex human dynamics often underlying data defects is necessary for developing far more effective tactics and strategies for implementing successful and sustainable data quality improvements.

 

Data Cleansing

“Just keep cleansing.  Just keep cleansing.
Just keep cleansing, cleansing, cleansing.
What do we do?  We cleanse, cleanse.”

That’s not the Data Cleansing Theme Song—but it can sometimes feel like it.  Especially whenever poor data quality negatively impacts decision-critical information, the organization may legitimately prioritize a reactive short-term response, where the only remediation will be fixing the immediate problems.

Balancing the demands of this data triage mentality with the best practice of implementing defect prevention wherever possible, will often create a very challenging situation for you to contend with on an almost daily basis.

Therefore, although comprehensive data remediation will require combining reactive and proactive approaches to data quality, you need to be willing and able to put data cleansing tools to good use whenever necessary.

 

Communication

“It’s like he’s trying to speak to me, I know it.
Look, you’re really cute, but I can’t understand what
you’re saying.
Say that data quality thing again.”

I hear this kind of thing all the time (well, not the “you’re really cute” part).

Effective communication improves everyone’s understanding of data quality, establishes a tangible business context, and helps prioritize critical data issues. 

Keep in mind that communication is mostly about listening.  Also, be prepared to face “data denial” when data quality problems are discussed.  Most often, this is a natural self-defense mechanism for the people responsible for business processes, technology, and data, for the simple reason that nobody likes to feel blamed for causing or failing to fix data quality problems.

The key to effective communication is clarity.  You should always make sure that all data quality concepts are clearly defined and in a language that everyone can understand.  I am not just talking about translating the techno-mumbojumbo, because even business-speak can sound more like business-babbling—and not just to the technical folks.

Additionally, don’t be afraid to ask questions or admit when you don’t know the answers.  Many costly mistakes can be made when people assume that others know (or pretend to know themselves) what key concepts and other terminology actually mean.

Never underestimate the potential negative impacts that the point of view paradox can have on communication.  For example, the perspectives of the business and technical stakeholders can often appear to be diametrically opposed.

Practicing effective communication requires shutting our mouths, opening our ears, and empathically listening to each other, instead of continuing to practice ineffective communication, where we merely take turns throwing word-darts at each other.

 

Collaboration

“Oh and one more thing:
When facing the daunting challenge of collaboration,
Work through it together, don't avoid it.
Come on, trust each other on this one.
Yes—trust—it’s what successful teams do.”

Most organizations suffer from a lack of collaboration, and as noted earlier, without true enterprise-wide collaboration, true success is impossible.

Beyond the data silo problem, the most common challenge for collaboration is the divide perceived to exist between the Business and IT, where the Business usually owns the data and understands its meaning and use in the day-to-day operation of the enterprise, and IT usually owns the hardware and software infrastructure of the enterprise’s technical architecture.

However, neither the Business nor IT alone has all of the necessary knowledge and resources required to truly be successful.  Data quality requires that the Business and IT forge an ongoing and iterative collaboration.

You must rally the team that will work together to improve the quality of your data.  A cross-disciplinary team will truly be necessary because data quality is neither a business issue nor a technical issue—it is both, truly making it an enterprise issue.

Executive sponsors, business and technical stakeholders, business analysts, data stewards, technology experts, and yes, even consultants and contractors—only when all of you are truly working together as a collaborative team, can the enterprise truly achieve great things, both tactically and strategically.

Successful enterprise information management is spelled E—A—C.

Of course, that stands for Enterprises—Always—Collaborate.  The EAC can be one seriously challenging place, dude.

You don’t know if you know what they know, or if they know what you know, but when you know, then they know, you know?

It’s like first you are all like “Whoa!” and they are all like “Whoaaa!” then you are like “Sweet!” and then they are like “Totally!”

This critical need for collaboration might seem rather obvious.  However, as all of the great philosophers have taught us, sometimes the hardest thing to learn is the least complicated.

Okay.  Squirt will now give you a rundown of the proper collaboration technique:

“Good afternoon. We’re gonna have a great collaboration today.
Okay, first crank a hard cutback as you hit the wall.
There’s a screaming bottom curve, so watch out.
Remember: rip it, roll it, and punch it.”

 

Finding Data Quality

As more and more organizations realize the critical importance of viewing data as a strategic corporate asset, data quality is becoming an increasingly prevalent topic of discussion.

However, and somewhat understandably, data quality is sometimes viewed as a small fish—albeit with a “lucky fin”—in a much larger pond.

In other words, data quality is often discussed only in its relation to enterprise information initiatives such as data integration, master data management, data warehousing, business intelligence, and data governance.

There is nothing wrong with this perspective, and as a data quality expert, I admit to my general tendency to see data quality in everything.  However, regardless of the perspective from which you begin your journey, I believe that eventually you will be Finding Data Quality wherever you look as well.

 



Monday
Mar012010

The Circle of Quality

Explaining why data quality is so vitally important to an organization's success that it needs to be viewed as a corporate asset is unfortunately not an easy task to accomplish. 

A common mistake made during such attempts is failing to frame data quality issues in a business context, which leads the organization's business stakeholders to understandably mistake data quality for a purely technical issue apparently lacking any tangible impact on their daily business decisions.

An organization's success is measured by the quality of the results it produces.  The results are dependent on the quality of its business decisions.  Those decisions rely on the quality of its information.  That information is based on the quality of its data. 

Therefore, data must be viewed as a corporate asset because high quality data serves as a solid foundation for business success.

As the above diagram illustrates, quality is a fundamental requirement and success criterion all throughout the interconnected Data -> Information -> Decision -> Result business context continuum, which I refer to as The Circle of Quality.

 

The Circle of Quality

Peter Benson of the ECCMA explains that data is intrinsically simple and can be divided into one of two categories:

  1. Master Data – data that identifies and describes things
  2. Transaction Data – data that describes events

In other words, master data is an abstract description of the real-world entities with which the organization conducts business (e.g., customers and vendors).  Transaction data is an abstract description of the real-world interactions that the organization has with those entities (e.g., sales and purchases).
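
To make the distinction concrete, here is a minimal sketch in Python.  The `Customer` and `Sale` classes and their fields are hypothetical names chosen for illustration.  The point is only that transaction records refer back to master records by an identifier.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    """Master data: identifies and describes a real-world entity."""
    customer_id: int
    name: str


@dataclass(frozen=True)
class Sale:
    """Transaction data: describes an event involving a master data entity."""
    sale_id: int
    customer_id: int  # reference back to the master record
    sold_on: date
    amount: float


customers = {1: Customer(1, "Acme Corp")}
sales = [
    Sale(100, 1, date(2010, 3, 1), 250.0),
    Sale(101, 1, date(2010, 3, 2), 125.5),
]

# Every transaction should resolve to exactly one master record.
orphans = [s for s in sales if s.customer_id not in customers]
total_by_customer = {
    cid: sum(s.amount for s in sales if s.customer_id == cid)
    for cid in customers
}
print(orphans, total_by_customer)
```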

Although a common definition for data quality is fitness for the purpose of use, the common challenge is that all data has multiple uses—and each specific use has its own specific fitness requirements. 

Viewing each specific use as the information that is derived from data, I define information as data in use or data in action.

Although data's quality can be objectively measured separate from its many uses (i.e., data can be fit to serve as at least the basis for each and every purpose), information's quality can only be subjectively measured according to its specific use.

Therefore, information is being customized to meet the subjective needs of a particular business unit and/or a particular tactical or strategic initiative.  In other words, the information is being used as the basis for making a critical business decision.

The quality of the decision is measured by the business result that it produces.  Of course, the reality is that the result is often not immediate and is also contingent upon a complex interplay of multiple business decisions.

The result can also produce more data, which could come in the form of new transaction data associated with either existing master data (e.g., sales to existing customers) or new master data (e.g., purchases from new vendors). 

Either way, with the arrival of this new data, yet another spin around The Circle of Quality begins all over again . . .

 

Conclusion

The Circle of Quality illustrates the interconnected business context continuum formed by data, information, decisions, and results.  Additionally, it demonstrates the need for a sustained enterprise-wide program of data governance and data quality, which is necessary for managing data as a corporate asset.

The Circle of Quality also helps illustrate the true challenge of root cause analysis, where poor quality could be occurring in one or more places within the business context continuum. 

And of course, even total quality management is no guarantee of success since it is certainly possible to have high quality data, derive high quality information from it, and then make high quality business decisions based upon it—but still get poor results.

However, it's also easy to imagine the highly questionable results produced when data quality is not considered vital to an organization's success.  Therefore, not managing data as a corporate asset is nothing less than extremely risky business.

 

Related Posts

Beyond a “Single Version of the Truth”

Poor Data Quality is a Virus

DQ-Tip: “Don't pass bad data on to the next person...”

The Only Thing Necessary for Poor Data Quality

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

Data Governance and Data Quality

The Data-Information Continuum

Tuesday
Sep082009

Fantasy League Data Quality

For over 25 years, I have been playing fantasy league baseball and football.  For those readers who are not familiar with fantasy sports, they simulate ownership of a professional sports team.  Participants “draft” individual real-world professional athletes to “play” for their fantasy team, which competes with other teams using a scoring system based on real-world game statistics.

What does any of this have to do with data quality?

 

Master Data Management

In Worthy Data Quality Whitepapers (Part 1), Peter Benson of the ECCMA explained that “data is intrinsically simple and can be divided into data that identifies and describes things, master data, and data that describes events, transaction data.”

In fantasy sports, this distinction is very easy to make:

  • Master Data – data describing the real-world players on the roster of each fantasy team.

  • Transaction Data – data describing the statistical events of the real-world games played.

In his magnificent book Master Data Management, David Loshin explained that “master data objects are those core business objects used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections and taxonomies.”

In fantasy sports, Players and Teams are the master data objects with many characteristics including the following:

  • Attributes – Player attributes include first name, last name, birth date, professional experience in years, and their uniform number.  Team attributes include name, owner, home city, and the name and seating capacity of their stadium.

  • Definitions – Player and Team have both Professional and Fantasy definitions.  Professional teams and players are real-world objects managed independent of fantasy sports.  Fundamentally, Professional Team and Professional Player are reference data objects from external content providers (Major League Baseball and the National Football League).  Therefore, Fantasy Team and Fantasy Player are the true master data objects.  The distinction between professional and fantasy teams is simpler than between professional and fantasy players.  Not every professional player will be used in fantasy sports (e.g. offensive linemen in football) and the same professional player can simultaneously play for multiple fantasy teams in different fantasy leagues (or sometimes even within the same league – e.g. fantasy tournament formats).

  • Roles – In baseball, the player roles are Batter, Pitcher, and Fielder.  In football, the player roles are Offense, Defense and Special Teams.  In both sports, the same player can have multiple or changing roles (e.g. in National League baseball, a pitcher is also a batter as well as a fielder).

  • Connections – Fantasy Players are connected to Fantasy Teams via a roster.  On the fantasy team roster, fantasy players are connected to real-world statistical events via a lineup, which indicates the players active for a given scoring period (typically a week in fantasy football and either a week or a day in fantasy baseball).  These connections change throughout the season.  Lineups change as players can go from active to inactive (i.e. on the bench) and rosters change as players can be traded, released, and signed (i.e. free agents added to the roster after the draft).

  • Taxonomies – Positions played are defined individually and organized into taxonomies.  In baseball, first base and third base are individual positions, but both are infield positions and more specifically corner infield.  Second base and shortstop are also infield positions, and more specifically middle infield.  And not all baseball positions are associated with fielding (e.g. a pinch runner can accrue statistics such as stolen bases and runs scored without either fielding or batting).
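
The connections and taxonomies described above can be sketched in a few lines of Python.  This is only an illustration: the `POSITION_PARENT` mapping, the `ancestors` and `benched` helpers, and the team and player names are all hypothetical.

```python
# Hypothetical position taxonomy: each position maps to its parent group.
POSITION_PARENT = {
    "first base": "corner infield",
    "third base": "corner infield",
    "second base": "middle infield",
    "shortstop": "middle infield",
    "corner infield": "infield",
    "middle infield": "infield",
}


def ancestors(position):
    """Walk up the taxonomy from a position to its broadest group."""
    chain = []
    while position in POSITION_PARENT:
        position = POSITION_PARENT[position]
        chain.append(position)
    return chain


# Roster: fantasy team -> players.  Lineup: scoring period -> active players.
roster = {"Fantasy Team A": {"Player 1", "Player 2", "Player 3"}}
lineup = {("Fantasy Team A", "week 1"): {"Player 1", "Player 3"}}


def benched(team, period):
    """Players on the roster but inactive for the given scoring period."""
    return roster[team] - lineup[(team, period)]


print(ancestors("shortstop"))
print(benched("Fantasy Team A", "week 1"))
```

A position outside the fielding taxonomy, such as pinch runner, simply has no ancestors in this sketch.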

 

Data Warehousing

Combining a personal hobby with professional development, I built a fantasy baseball data warehouse.  I downloaded master, reference, and transaction data from my fantasy league's website.  I prepared these sources in a flat file staging area, from which I applied inserts and updates to the relational database tables in my data warehouse, where I used dimensional modeling.

My dimension tables were Date, Professional Team, Player, Position, Fantasy League, and Fantasy Team.  All of these tables (except for Date) were Type 2 slowly changing dimensions to support full history and rollbacks.
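
For readers unfamiliar with Type 2 slowly changing dimensions, the following Python sketch shows the general idea: a change never overwrites a row, it closes the current version and adds a new one.  The column names (`surrogate_key`, `natural_key`, `valid_from`, `valid_to`), the open-ended date convention, and the sample player are assumptions made for illustration, not the actual table design described here.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed convention for "still current"


def apply_type2_change(rows, natural_key, new_attrs, effective):
    """Expire the current row for natural_key and append a new version."""
    next_key = max((r["surrogate_key"] for r in rows), default=0) + 1
    for row in rows:
        if row["natural_key"] == natural_key and row["valid_to"] == OPEN_END:
            if all(row.get(k) == v for k, v in new_attrs.items()):
                return rows  # nothing changed, keep the current version
            row["valid_to"] = effective  # close out the old version
    rows.append({
        "surrogate_key": next_key,
        "natural_key": natural_key,
        **new_attrs,
        "valid_from": effective,
        "valid_to": OPEN_END,
    })
    return rows


def as_of(rows, natural_key, when):
    """Return the version of a dimension member that was valid on a date."""
    for row in rows:
        if (row["natural_key"] == natural_key
                and row["valid_from"] <= when < row["valid_to"]):
            return row
    return None


player_dim = []
apply_type2_change(player_dim, "P001", {"uniform_number": 12}, date(2009, 4, 1))
apply_type2_change(player_dim, "P001", {"uniform_number": 7}, date(2009, 7, 31))
print(as_of(player_dim, "P001", date(2009, 6, 15))["uniform_number"])
```

Because old versions are kept with their validity dates, a query can reconstruct the dimension as it looked on any past date, which is what makes full history and rollbacks possible.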

For simplicity, the Date dimension was calendar days with supporting attributes for all aggregate levels (e.g. monthly aggregate fact tables used the last day of the month as opposed to a separate Month dimension).

Professional and fantasy team rosters, as well as fantasy team lineups and fantasy league team membership, were all tracked using factless fact tables.  For example, the Professional Team Roster factless fact table used the Date, Professional Team, and Player dimensions, and the Fantasy Team Lineup factless fact table used the Date, Fantasy League, Fantasy Team, Player, and Position dimensions. 
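
A factless fact table holds only dimension keys, so the existence of a row is the fact.  The sketch below uses Python's built-in `sqlite3` module with an in-memory database.  The table and column names and the sample rows are hypothetical, and the dimensions are reduced to the bare minimum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE team_dim (team_key INTEGER PRIMARY KEY, team_name TEXT);
    CREATE TABLE player_dim (player_key INTEGER PRIMARY KEY, player_name TEXT);

    -- Factless fact table: only dimension keys, no numeric measures.
    CREATE TABLE team_roster_fact (
        date_key INTEGER REFERENCES date_dim(date_key),
        team_key INTEGER REFERENCES team_dim(team_key),
        player_key INTEGER REFERENCES player_dim(player_key),
        PRIMARY KEY (date_key, team_key, player_key)
    );

    INSERT INTO date_dim VALUES (20090401, '2009-04-01'), (20090402, '2009-04-02');
    INSERT INTO team_dim VALUES (1, 'Team X');
    INSERT INTO player_dim VALUES (10, 'Player 1'), (11, 'Player 2');
    INSERT INTO team_roster_fact VALUES
        (20090401, 1, 10), (20090401, 1, 11), (20090402, 1, 10);
""")

# The "fact" is the existence of a row, so questions are answered by counting.
roster_sizes = conn.execute("""
    SELECT d.calendar_date, COUNT(*) AS players_on_roster
    FROM team_roster_fact f
    JOIN date_dim d ON d.date_key = f.date_key
    GROUP BY d.calendar_date
    ORDER BY d.calendar_date
""").fetchall()
print(roster_sizes)
```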

The factless fact tables also allowed Player to be used as a conformed dimension for both professional and fantasy players, since a separate Fantasy Player dimension would redundantly store multiple instances of the same professional player for each fantasy team he played for.  They also allowed Fantasy League and Fantasy Team to be used as snowflaked dimensions.

My base fact tables were daily transactions for Batting Statistics and Pitching Statistics.  These base fact tables used only the Date, Professional Team, Player, and Position dimensions to provide the lowest level of granularity for daily real-world statistical performances independent of fantasy baseball. 

The Fantasy League and Fantasy Team dimensions replaced the Professional Team dimension in a separate family of base fact tables for daily fantasy transactions for Batting Statistics and Pitching Statistics.  This was necessary to accommodate the same professional player simultaneously playing for multiple fantasy teams in different fantasy leagues.  Alternatively, I could have stored each fantasy league in a separate data mart.

Aggregate fact tables accumulated month-to-date and year-to-date batting and pitching statistical totals for fantasy players and teams.  Additional aggregate fact tables incremented current rolling snapshots of batting and pitching statistical totals for the previous 7, 14 and 21 days for players only.  Since the aggregate fact tables were created to optimize fantasy league query performance, only the base tables with daily fantasy transactions were aggregated.
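
The rolling snapshots can be illustrated with a short Python sketch.  It recomputes each window from the daily base facts rather than incrementing a stored snapshot, and it assumes each window includes the as-of date itself.  Both choices, along with the `rolling_totals` helper and the sample rows, are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical daily base facts: (game date, player, hits, at bats).
daily_batting = [
    (date(2009, 7, 1), "Player 1", 2, 4),
    (date(2009, 7, 8), "Player 1", 1, 3),
    (date(2009, 7, 14), "Player 1", 0, 4),
    (date(2009, 7, 15), "Player 1", 3, 5),
]


def rolling_totals(facts, player, as_of, windows=(7, 14, 21)):
    """Sum hits and at bats over the N days ending on as_of, inclusive."""
    totals = {}
    for days in windows:
        start = as_of - timedelta(days=days - 1)
        rows = [f for f in facts if f[1] == player and start <= f[0] <= as_of]
        totals[days] = {
            "hits": sum(f[2] for f in rows),
            "at_bats": sum(f[3] for f in rows),
        }
    return totals


snapshot = rolling_totals(daily_batting, "Player 1", date(2009, 7, 15))
print(snapshot)
```

Storing the totals rather than a ratio keeps the aggregates additive, so a rate such as batting average can be derived from any window without rounding drift.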

Conformed facts were used in both the base and aggregate fact tables.  In baseball, this is relatively easy to achieve since most statistics have been consistently defined and used for decades (and some for more than a century). 

For example, batting average is defined as the ratio of hits to at bats and has been used consistently since the late 19th century.  However, there are still statistics with multiple meanings.  For example, walks and strikeouts are recorded for both batters and pitchers, with very different connotations for each.

Additionally, in the late 20th century, new baseball statistics such as secondary average and runs created have been defined with widely varying formulas.  Metadata tables with definitions (including formulas where applicable) were included in the baseball data warehouse to avoid confusion.
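
A metadata table of this kind might look something like the following Python sketch.  The structure, the `compute` helper, and the wording of the definitions are hypothetical.  Only the batting average formula (hits divided by at bats) comes from the text above.

```python
# Hypothetical metadata table: one entry per (statistic, role) pair.
STAT_METADATA = {
    ("batting_average", "batter"): {
        "definition": "Ratio of hits to at bats.",
        "formula": lambda s: s["hits"] / s["at_bats"] if s["at_bats"] else None,
    },
    ("walks", "batter"): {
        "definition": "Walks drawn by the batter.",
        "formula": None,  # a raw count, no formula needed
    },
    ("walks", "pitcher"): {
        "definition": "Walks allowed by the pitcher.",
        "formula": None,
    },
}


def compute(stat, role, raw):
    """Look up a statistic's formula in the metadata and apply it."""
    formula = STAT_METADATA[(stat, role)]["formula"]
    return raw[stat] if formula is None else formula(raw)


avg = compute("batting_average", "batter", {"hits": 30, "at_bats": 100})
print(f"{avg:.3f}")
```

Keying the metadata by both statistic and role is one way to keep the two meanings of a shared name such as walks from being confused.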

For remarkable reference material containing clear-cut guidelines and real-world case studies for both dimensional modeling and data warehousing, I highly recommend all three books in the collection: Ralph Kimball's Data Warehouse Toolkit Classics.

 

Business Intelligence

In his Information Management special report BI: Only as Good as its Data Quality, William Giovinazzo explained that “the chief promise of business intelligence is the delivery to decision-makers the information necessary to make informed choices.”

As a reminder for the uninitiated, fantasy sports simulate the ownership of a professional sports team.  Business intelligence techniques are used for pre-draft preparation and for tracking your fantasy team's statistical performance during the season in order to make management decisions regarding your roster and lineup.

The aggregate fact tables that I created in my baseball data warehouse delivered the same information available as standard reports from my fantasy league's website.  This allowed me to use the website as an external data source to validate my results, which is commonly referred to as using a “surrogate source of the truth.”  However, since I also used the website as the original source of my master, reference, and transaction data, I double-checked my results using other websites. 
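
Validation against an external source boils down to a keyed comparison.  Here is a minimal Python sketch; the `reconcile` helper, the key layout, and the sample totals are hypothetical.

```python
def reconcile(warehouse, reference, tolerance=0):
    """Compare totals keyed by (player, statistic) against a reference source."""
    mismatches = {}
    for key in warehouse.keys() | reference.keys():
        ours, theirs = warehouse.get(key), reference.get(key)
        if ours is None or theirs is None or abs(ours - theirs) > tolerance:
            mismatches[key] = (ours, theirs)
    return mismatches


# Hypothetical year-to-date totals from the warehouse and from a second website.
warehouse_totals = {("Player 1", "home_runs"): 12, ("Player 2", "home_runs"): 7}
website_totals = {("Player 1", "home_runs"): 12, ("Player 2", "home_runs"): 8}

print(reconcile(warehouse_totals, website_totals))
```

A mismatch does not say which side is wrong.  It only flags where to look, which is why checking against a source other than the one the data was loaded from matters.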

This is a significant advantage for fantasy sports – there are numerous external data sources that can be used for validation freely available online.  Of course, this wasn't always the case. 

Over 25 years ago when I first started playing fantasy sports, my friends and I had to manually tabulate statistics from newspapers.  We migrated to customized computer spreadsheet programs (this was in the days before everyone had PCs with Microsoft Excel – which we eventually used) before the Internet revolution and cloud computing brought the wonderful world of fantasy sports websites that we enjoy today.

Now with just a few mouse clicks, I can run regression analysis to determine whether my next draft pick should be a first baseman predicted to hit 30 home runs or a second baseman predicted to have a .300 batting average and score 100 runs. 

I can check my roster for weaknesses in statistics difficult to predict, such as stolen bases and saves.  I can track the performances of players I didn't draft to decide if I want to make a trade, as well as accurately evaluate a potential trade from another owner who claims to be offering players who are having a great year and could help my team be competitive.

 

Data Quality

In her fantastic book Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Danette McGilvray comprehensively defines all of the data quality dimensions, which include the following most applicable to fantasy sports:

  • Accuracy – A measure of the correctness of the content of the data, which requires an authoritative source of reference to be identified and accessible.

  • Timeliness and Availability – A measure of the degree to which data are current and available for use as specified and in the time frame in which they are expected.

  • Data Coverage – A measure of the availability and comprehensiveness of data compared to the total data universe or population of interest.

  • Presentation Quality – A measure of how information is presented to and collected from those who utilize it.  Format and appearance support appropriate use of the information.

  • Perception, Relevance, and Trust – A measure of the perception of and confidence in the data quality; the importance, value, and relevance of the data to business needs.

 

Conclusion

I highly doubt that you will see Fantasy League Data Quality coming soon to a fantasy sports website near you.  It is just as unlikely that my future blog posts will conclude with “The Mountain Dew Post Game Show” or that I will rename my blog to “OCDQ – The Worldwide Leader in Data Quality” (duh-nuh-nuh, duh-nuh-nuh).

However, fantasy sports are more than just a hobby.  They're a thriving real-world business providing many excellent examples of best practices in action for master data management, data warehousing, and business intelligence – all implemented upon a solid data quality foundation.

So who knows, maybe some Monday night this winter we'll hear Hank Williams Jr. sing:

“Are you ready for some data quality?”


Sunday
Jul262009

The Wisdom of Failure

Earlier this month, I had the honor of being interviewed by Ajay Ohri on his blog Decision Stats, which is an excellent source of insights on business intelligence and data mining as well as interviews with industry thought leaders and chief evangelists.

One of the questions Ajay asked me during my interview was what methods and habits I would recommend to young analysts just starting in the business intelligence field.  Part of my response was:

“Don't be afraid to ask questions or admit when you don't know the answers.  The only difference between a young analyst just starting out and an expert is that the expert has already made and learned from all the mistakes caused by being afraid to ask questions or admitting when you don't know the answers.”

It is perhaps one of life’s cruelest paradoxes that some lessons simply cannot be taught, but instead have to be learned through the pain of making mistakes.  To err is human, but not all humans learn from their errors.  In fact, some of us find it extremely difficult to even simply acknowledge when we have made a mistake.  This was certainly true for me earlier in my career.

 

The Wisdom of Crowds

One of my favorite books is The Wisdom of Crowds by James Surowiecki.  Before reading it, I admit that I believed crowds were incapable of wisdom and that the best decisions are based on the expert advice of carefully selected individuals.  However, Surowiecki wonderfully elucidates the folly of “chasing the expert” and explains the four conditions that characterize wise crowds: diversity of opinion, independent thinking, decentralization, and aggregation.  The book is also balanced by examining the conditions (e.g., confirmation bias and groupthink) that can commonly undermine the wisdom of crowds.  All in all, it is a wonderful discourse on both collective intelligence and collective ignorance with practical advice on how to achieve the former and avoid the latter.

 

Chasing the Data Quality Expert

Without question, a data quality expert can be an invaluable member of your team.  Often an external consultant, a data quality expert can provide extensive experience and best practices from successful implementations.  However, regardless of their experience, even with other companies in your industry, every organization and its data is unique.  An expert's perspective definitely has merit, but their opinions and advice should not be allowed to dominate the decision-making process.

“The more power you give a single individual in the face of complexity,” explains Surowiecki, “the more likely it is that bad decisions will get made.”  No one person, regardless of their experience and expertise, can succeed on their own.  According to Surowiecki, the best experts “recognize the limits of their own knowledge and of individual decision making.”

 

“Success is on the far side of failure”

One of the most common obstacles organizations face with data quality initiatives is that many initial attempts end in failure.  Some fail because of lofty expectations, unmanaged scope creep, and the unrealistic perspective that data quality problems can be permanently “fixed” by a one-time project as opposed to needing a sustained program.  However, regardless of the reason for the failure, it can negatively affect morale and cause employees to resist participating in the next data quality effort.

Although a common best practice is to perform a post-mortem in order to document the lessons learned, sometimes the stigma of failure persuades an organization to either skip the post-mortem or ignore its findings. 

However, in the famous words of IBM founder Thomas J. Watson: “Success is on the far side of failure.” 

A failed data quality initiative may have been closer to success than you realize.  At the very least, there are important lessons to be learned from the mistakes that were made.  The sooner you can recognize your mistakes, the sooner you can mitigate their effects and hopefully prevent them from happening again.

 

The Wisdom of Failure

In one of my other favorite books, How We Decide, Jonah Lehrer explains:

“The brain always learns the same way, accumulating wisdom through error...there are no shortcuts to this painstaking process...becoming an expert just takes time and practice...once you have developed expertise in a particular area...you have made the requisite mistakes.”

Therefore, although it may be true that experience is the path that separates knowledge from wisdom, I have come to realize that the true wisdom of my experience is the wisdom of failure.

 

Related Posts

A Portrait of the Data Quality Expert as a Young Idiot

All I Really Need To Know About Data Quality I Learned In Kindergarten

The Nine Circles of Data Quality Hell

Saturday
May092009

TDWI World Conference Chicago 2009

Founded in 1995, TDWI (The Data Warehousing Institute™) is the premier educational institute for business intelligence and data warehousing that provides education, training, certification, news, and research for executives and information technology professionals worldwide.  TDWI conferences always offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner.  The courses taught are designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

 

TDWI World Conference Chicago 2009 was held May 3-8 in Chicago, Illinois at the Hyatt Regency Hotel and was a tremendous success.  I attended as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the conference.  Here are my notes from the courses I attended: 

 

BI from Both Sides: Aligning Business and IT

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services.  Jill is responsible for delivering industry and client advisory services, is a frequent lecturer and writer on the business value of IT, and writes the excellent Inside the Biz blog.  She is the author of acclaimed books on the business value of information: e-Data: Turning Data Into Information With Data Warehousing and The CRM Handbook: A Business Guide to Customer Relationship Management.  Her latest book, written with Evan Levy, is Customer Data Integration: Reaching a Single Version of the Truth.

Course Quotes from Jill Dyché:

  • Five Critical Success Factors for Business Intelligence (BI):
    1. Organization - Build organizational structures and skills to foster a sustainable program
    2. Processes - Align both business and IT development processes that facilitate delivery of ongoing business value
    3. Technology - Select and build technologies that deploy information cost-effectively
    4. Strategy - Align information solutions to the company's strategic goals and objectives
    5. Information - Treat data as an asset by separating data management from technology implementation
  • Three Different Requirement Categories:
    1. What is the business need, pain, or problem?  What business questions do we need to answer?
    2. What data is necessary to answer those business questions?
    3. How do we need to use the resulting information to answer those business questions?
  • “Data warehouses are used to make business decisions based on data – so data quality is critical”
  • “Even companies with mature enterprise data warehouses still have data silos - each business area has its own data mart”
  • “Instead of pushing a business intelligence tool, just try to get people to start using data”
  • “Deliver a usable system that is valuable to the business and not just a big box full of data”

 

TDWI Data Governance Summit

Philip Russom is the Senior Manager of Research and Services at TDWI, where he oversees many of TDWI’s research-oriented publications, services, and events.  Prior to joining TDWI in 2005, he was an industry analyst covering BI at Forrester Research, as well as a contributing editor with Intelligent Enterprise and Information Management (formerly DM Review) magazines.

Summit Quotes from Philip Russom:

  • “Data Governance usually boils down to some form of control for data and its usage”
  • “Four Ps of Data Governance: People, Policies, Procedures, Process”
  • “Three Pillars of Data Governance: Compliance, Business Transformation, Business Integration”
  • “Two Foundations of Data Governance: Business Initiatives and Data Management Practices”
  • “Cross-functional collaboration is a requirement for successful Data Governance”

 

Becky Briggs, CBIP, CMQ/OE, is a Senior Manager and Data Steward for Airlines Reporting Corporation (ARC) and has 25 years of experience in data processing and IT - the last 9 in data warehousing and BI.  She leads the program team responsible for product, project, and quality management, business line performance management, and data governance/stewardship.

Summit Quotes from Becky Briggs:

  • “Data Governance is the act of managing the organization's data assets in a way that promotes business value, integrity, usability, security and consistency across the company”
  • Five Steps of Data Governance:
    1. Determine what data is required
    2. Evaluate potential data sources (internal and external)
    3. Perform data profiling and analysis on data sources
    4. Data Services - Definition, modeling, mapping, quality, integration, monitoring
    5. Data Stewardship - Classification, access requirements, archiving guidelines
  • “You must realize and accept that Data Governance is a program and not just a project”

 

Barbara Shelby is a Senior Software Engineer for IBM with over 25 years of experience holding positions of technical specialist, consultant, and line management.  Her global management and leadership positions encompassed network authentication, authorization application development, corporate business systems data architecture, and database development.

Summit Quotes from Barbara Shelby:

  • Four Common Barriers to Data Governance:
    1. Information - Existence of information silos and inconsistent data meanings
    2. Organization - Lack of end-to-end data ownership and organization cultural challenges
    3. Skill - Difficulty shifting resources from operational to transformational initiatives
    4. Technology - Business data locked in large applications and slow deployment of new technology
  • Four Key Decision Making Bodies for Data Governance:
    1. Enterprise Integration Team - Oversees the execution of CIO funded cross enterprise initiatives
    2. Integrated Enterprise Assessment - Responsible for the success of transformational initiatives
    3. Integrated Portfolio Management Team - Responsible for making ongoing business investment decisions
    4. Unit Architecture Review - Responsible for the IT architecture compliance of business unit solutions

 

Lee Doss is a Senior IT Architect for IBM with over 25 years of information technology experience.  He holds a patent for a process of aligning strategic capability for business transformation, and he has held various positions including strategy, design, development, and customer support for IBM networking software products.

Summit Quotes from Lee Doss:

  • Five Data Governance Best Practices:
    1. Create a sense of urgency that the organization can rally around
    2. Start small, grow fast...pick a few visible areas to set an example
    3. Sunset legacy systems (application, data, tools) as new ones are deployed
    4. Recognize the importance of organization culture…this will make or break you
    5. Always, always, always – Listen to your customers

 

Kevin Kramer is a Senior Vice President and Director of Enterprise Sales for UMB Bank and is responsible for development of sales strategy, sales tool development, and implementation of enterprise-wide sales initiatives.

Summit Quotes from Kevin Kramer:

  • “Without Data Governance, multiple sources of customer information can produce multiple versions of the truth”
  • “Data Governance helps break down organizational silos and shares customer data as an enterprise asset”
  • “Data Governance provides a roadmap that translates into best practices throughout the entire enterprise”

 

Kanon Cozad is a Senior Vice President and Director of Application Development for UMB Bank and is responsible for overall technical architecture strategy and oversees information integration activities.

Summit Quotes from Kanon Cozad:

  • “Data Governance identifies business process priorities and then translates them into enabling technology”
  • “Data Governance provides direction and Data Stewardship puts direction into action”
  • “Data Stewardship identifies and prioritizes applications and data for consolidation and improvement”

 

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services.  (For Jill's complete bio, please see above.)

Summit Quotes from Jill Dyché:

  • “The hard part of Data Governance is the data”
  • “No data will be formally sanctioned unless it meets a business need”
  • “Data Governance focuses on policies and strategic alignment”
  • “Data Management focuses on translating defined policies into executable actions”
  • “Entrench Data Governance in the development environment”
  • “Everything is customer data – even product and financial data”

 

Data Quality Assessment - Practical Skills

Arkady Maydanchik is a co-founder of Data Quality Group, a recognized practitioner, author, and educator in the field of data quality and information integration.  Arkady's data quality methodology and breakthrough ARKISTRA technology were used to provide services to numerous organizations.  Arkady is the author of the excellent book Data Quality Assessment, a frequent speaker at various conferences and seminars, and a contributor to many journals and online publications.  Data quality curriculum by Arkady Maydanchik can be found at eLearningCurve.

Course Quotes from Arkady Maydanchik:

  • “Nothing is worse for data quality than desperately trying to fix it during the last few weeks of an ETL project”
  • “Quality of data after conversion is in direct correlation with the amount of knowledge about actual data”
  • “Data profiling tools do not do data profiling - it is done by data analysts using data profiling tools”
  • “Data Profiling does not answer any questions - it helps us ask meaningful questions”
  • “Data quality is measured by its fitness to the purpose of use – it's essential to understand how data is used”
  • “When data has multiple uses, there must be data quality rules for each specific use”
  • “Effective root cause analysis requires not stopping after the answer to your first question - Keep asking: Why?”
  • “The central product of a Data Quality Assessment is the Data Quality Scorecard”
  • “Data quality scores must be both meaningful to a specific data use and be actionable”
  • “Data quality scores must estimate both the cost of bad data and the ROI of data quality initiatives”
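
To picture what use-specific data quality rules and a simple scorecard might look like, here is a minimal Python sketch.  It is only an illustration under stated assumptions and not Arkady's methodology: the employee records, the three data uses, the rules themselves, and the `scorecard` function are all hypothetical, and the score is simply the share of records passing every rule for a given use.  It does not attempt to estimate the cost of bad data or ROI.

```python
# Hypothetical employee records, invented for illustration.
records = [
    {"id": 1, "email": "a@example.com", "hire_date": "2004-03-01", "salary": 52000},
    {"id": 2, "email": "", "hire_date": "2006-11-15", "salary": 61000},
    {"id": 3, "email": "c@example.com", "hire_date": "2008-01-10", "salary": -1},
]

# One set of rules per data use, since the same data can be fit for one
# purpose and unfit for another.  Each rule returns True when a record passes.
rules_by_use = {
    "payroll": [lambda r: r["salary"] is not None and r["salary"] > 0],
    "communications": [lambda r: bool(r["email"]) and "@" in r["email"]],
    "tenure_reporting": [lambda r: r["hire_date"] is not None],
}


def scorecard(rows, rules):
    """Return, for each data use, the share of records passing all its rules."""
    result = {}
    for use, checks in rules.items():
        passed = sum(1 for r in rows if all(check(r) for check in checks))
        result[use] = passed / len(rows)
    return result


scores = scorecard(records, rules_by_use)
for use, score in scores.items():
    print(f"{use}: {score:.0%} of records pass")
```

A score like this only becomes actionable once someone asks why the failing records fail, which is where talking with the people who use the data comes in.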

 

Modern Data Quality Techniques in Action - A Demonstration Using Human Resources Data

Gian Di Loreto formed Loreto Services and Technologies in 2004 from the client services division of Arkidata Corporation.  Loreto Services provides data cleansing and integration consulting services to Fortune 500 companies.  Gian is a classically trained scientist - he received his PhD in elementary particle physics from Michigan State University.

Course Quotes from Gian Di Loreto:

  • “Data Quality is rich with theory and concepts – however it is not an academic exercise, it has real business impact”
  • “To do data quality well, you must walk away from the computer and go talk with the people using the data”
  • “Undertaking a data quality initiative demands developing a deeper knowledge of the data and the business”
  • “Some essential data quality rules are ‘hidden’ and can only be discovered by ‘clicking around’ in the data”
  • “Data quality projects are not about systems working together - they are about people working together”
  • “Sometimes, data quality can be ‘good enough’ for source systems but not when integrated with other systems”
  • “Unfortunately, no one seems to care about bad data until they have it”
  • “Data quality projects are only successful when you understand the problem before trying to solve it”

 

Mark Your Calendar

TDWI World Conference San Diego 2009 - August 2-7, 2009.

TDWI World Conference Orlando 2009 - November 1-6, 2009.

TDWI World Conference Las Vegas 2010 - February 21-26, 2010.
