July 01, 2016

Data Governance Frameworks are like Jigsaw Puzzles

July 01, 2016/ Jim Harris

In a recent interview, Jill Dyché addressed a common misconception, explaining that a data governance framework is not a strategy. “Unlike other strategic initiatives that involve IT,” Jill explained, “data governance needs to be designed. The cultural factors, the workflow factors, the organizational structure, the ownership, the political factors, all need to be accounted for when you are designing a data governance roadmap.”

“People need a mental model, that is why everybody loves frameworks,” Jill continued. “But they are not enough and I think the mistake that people make is that once they see a framework, rather than understanding its relevance to their organization, they will just adapt it and plaster it up on the whiteboard and show executives without any kind of context. So they are already defeating the purpose of data governance, which is to make it work within the context of your business problems, not just have some kind of mental model that everybody can agree on, but is not really the basis for execution.”

“So it’s a really, really dangerous trend,” Jill cautioned, “that we see where people equate strategy with framework because strategy is really a series of collected actions that result in some execution — and that is exactly what data governance is.”

And in her excellent article Data Governance Next Practices: The 5 + 2 Model, Jill explained that data governance requires a deliberate design so that the entire organization can buy into a realistic execution plan, not just a sound bite. As usual, I agree with Jill, since, in my experience, many people expect a data governance framework to provide eureka-like moments of insight.

In The Myths of Innovation, Scott Berkun debunked the myth of the eureka moment using the metaphor of a jigsaw puzzle.

“When you put the last piece into place, is there anything special about that last piece or what you were wearing when you put it in?” Berkun asked. “The only reason that last piece is significant is because of the other pieces you’d already put into place. If you jumbled up the pieces a second time, any one of them could turn out to be the last, magical piece.”

“The magic feeling at the moment of insight, when the last piece falls into place,” Berkun explained, “is the reward for many hours (or years) of investment coming together. In comparison to the simple action of fitting the puzzle piece into place, we feel the larger collective payoff of hundreds of pieces’ worth of work.”

Perhaps the myth of the data governance framework could also be debunked using the metaphor of a jigsaw puzzle.

Data governance requires the coordination of a complex combination of a myriad of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, change management — and many other puzzle pieces.

How could a data governance framework possibly predict how you will assemble the puzzle pieces? Or how the puzzle pieces will fit together within your unique corporate culture? Or which of the many aspects of data governance will turn out to be the last (or even the first) piece of the puzzle to fall into place in your organization? And, of course, there is truly no last piece of the puzzle, because data governance is an ongoing program and the business world constantly gets jumbled up by change.

So, data governance frameworks are useful, but only if you realize that data governance frameworks are like jigsaw puzzles.

August 04, 2011

Big Data and Big Analytics

August 04, 2011/ Jim Harris

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Jill Dyché is the Vice President of Thought Leadership and Education at DataFlux. Jill’s role at DataFlux is a combination of best-practice expert, key client advisor and all-around thought leader. She is responsible for industry education, key client strategies and market analysis in the areas of data governance, business intelligence, master data management and customer relationship management. Jill is a regularly featured speaker and the author of several books.

Jill’s latest book, Customer Data Integration: Reaching a Single Version of the Truth (Wiley & Sons, 2006), was co-authored with Evan Levy and shows the business breakthroughs achieved with integrated customer data.

Dan Soceanu is the Director of Product Marketing and Sales Enablement at DataFlux. Dan manages global field sales enablement and product marketing, including product messaging and marketing analysis. Prior to joining DataFlux in 2008, Dan held marketing, partnership and market research positions with Teradata, General Electric and FormScape, as well as data management positions in the Financial Services sector.

Dan received his Bachelor of Science in Business Administration from Kutztown University of Pennsylvania and his Master of Business Administration from Bloomsburg University of Pennsylvania.

On this episode of OCDQ Radio, Jill Dyché, Dan Soceanu, and I discuss the recent Pacific Northwest BI Summit, where the three core conference topics were Cloud, Collaboration, and Big Data, the last of which led to a discussion about Big Analytics.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.

Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

Gaining a Competitive Advantage with Data — Guest William McKnight discusses some of the practical, hands-on guidance provided by his book Information Management: Strategies for Gaining a Competitive Advantage with Data.

Doing Data Governance — Guest John Ladley discusses his book How to Design, Deploy and Sustain Data Governance and how to understand the difference and relationship between data governance and enterprise information management.

Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).

Measuring Data Quality for Ongoing Improvement — Guest Laura Sebastian-Coleman discusses bringing together a better understanding of what is represented in data with the expectations for use in order to improve the overall quality of data.

The Blue Box of Information Quality — Guest Daragh O Brien on why Information Quality is bigger on the inside, using stories as an analytical tool and change management technique, and why we must never forget that “people are cool.”

Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.

Good-Enough Data for Fast-Enough Decisions — Guest Julie Hunt discusses Data Quality and Business Intelligence, including the speed versus quality debate of near-real-time decision making, and the future of predictive analytics.

The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

The Art of Data Matching — Guest Henrik Liliendahl Sørensen discusses data matching concepts and practices, including different match techniques, candidate selection, presentation of match results, and business applications of data matching.

Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

July 23, 2011

Seventeen Syllables about the Seven Letter Tsunami

July 23, 2011/ Jim Harris

“Business gets smarter
As the Data gets bigger
As the World gets small”

~ A Haiku about Big Data
Inspired by Jill Dyché

A Brave New Data World

Thaler’s Apples and Data Quality Oranges

Data Confabulation in Business Intelligence

Data In, Decision Out

The Data-Decision Symphony

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Data, data everywhere, but where is data quality?

Finding Data Quality

Data!

You Can’t Always Get the Data You Want

To Our Data Perfectionists

November 16, 2010

TDWI World Conference Orlando 2010

November 16, 2010/ Jim Harris

Last week I attended the TDWI World Conference held November 7-12 in Orlando, Florida at the Loews Royal Pacific Resort.

As always, TDWI conferences offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner, designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

In this blog post, I summarize a few key points from two of the courses I attended. I used Twitter to help me collect my notes, and you can access the complete archive of my conference tweets on Twapper Keeper.

A Practical Guide to Analytics

Wayne Eckerson, author of the book Performance Dashboards: Measuring, Monitoring, and Managing Your Business, described the four waves of business intelligence:

Reporting – What happened?
Analysis – Why did it happen?
Monitoring – What’s happening?
Prediction – What will happen?

“Reporting is the jumping off point for analytics,” explained Eckerson, “but many executives don’t realize this. The most powerful aspect of analytics is testing our assumptions.” He went on to differentiate the two strains of analytics:

Exploration and Analysis – Top-down and deductive, primarily uses query tools
Prediction and Optimization – Bottom-up and inductive, primarily uses data mining tools

“A huge issue for predictive analytics is getting people to trust the predictions,” remarked Eckerson. “Technology is the easy part, the hard part is selling the business benefits and overcoming cultural resistance within the organization.”

“The key is not getting the right answers, but asking the right questions,” he explained, quoting Ken Rudin of Zynga.

“Deriving insight from its unique information will always be a competitive advantage for every organization.” He recommended the book Competing on Analytics: The New Science of Winning as a great resource for selling the business benefits of analytics.

Data Governance for BI Professionals

Jill Dyché, a partner and co-founder of Baseline Consulting, explained that data governance transcends business intelligence and other enterprise information initiatives such as data warehousing, master data management, and data quality.

“Data governance is the organizing framework,” explained Dyché, “for establishing strategy, objectives, and policies for corporate data. Data governance is the business-driven policy making and oversight of corporate information.”

“Data governance is necessary,” remarked Dyché, “whenever multiple business units are sharing common, reusable data.”

“Data governance aligns data quality with business measures and acceptance, positions enterprise data issues as cross-functional, and ensures data is managed separately from its applications, thereby evolving data as a service (DaaS).”

In her excellent 2007 article Serving the Greater Good: Why Data Hoarding Impedes Corporate Growth, Dyché explained the need for “systemizing the notion that data – corporate asset that it is – belongs to everyone.”

“Data governance provides the decision rights around the corporate data asset.”

DQ-View: From Data to Decision

Podcast: Data Governance is Mission Possible

The Business versus IT—Tear down this wall!

MacGyver: Data Governance and Duct Tape

Live-Tweeting: Data Governance

Enterprise Data World 2010

Enterprise Data World 2009

TDWI World Conference Chicago 2009

Light Bulb Moments at DataFlux IDEAS 2010

DataFlux IDEAS 2009

August 14, 2010

Scrum Screwed Up

August 14, 2010/ Jim Harris

This was the inaugural cartoon on Implementing Scrum by Michael Vizdos and Tony Clark. It does a great job of illustrating the fable of The Chicken and the Pig, which is used to describe the two types of roles involved in Scrum. Quite rare for our industry, Scrum is not an acronym; it is one common approach among many iterative, incremental frameworks for agile software development.

Scrum is also sometimes used as a generic synonym for any agile framework. Although I’m not an expert, I’ve worked on more than a few agile programs. And since I am fond of metaphors, I will use the Chicken and the Pig to describe two common ways that scrums of all kinds can easily get screwed up:

All Chicken and No Pig
All Pig and No Chicken

However, let’s first establish a more specific context for agile development using one provided by a recent blog post on the topic.

A Contrarian’s View of Agile BI

In her excellent blog post A Contrarian’s View of Agile BI, Jill Dyché took a somewhat unpopular view of a popular view, which is something that Jill excels at—not simply for the sake of doing it—because she’s always been well-known for telling it like it is.

In preparation for the upcoming TDWI World Conference in San Diego, Jill was pondering the utilization of agile methodologies in business intelligence (aka BI—ah, there’s one of those oh so common industry acronyms straight out of The Acronymicon).

The provocative TDWI conference theme is: “Creating an Agile BI Environment—Delivering Data at the Speed of Thought.”

Now, please don’t misunderstand. Jill is an advocate for doing agile BI the right way. And it’s certainly understandable why so many organizations love the idea of agile BI. Especially when you consider the slower time to value of most other approaches when compared with, following Jill’s rule of thumb, how agile BI would have “either new BI functionality or new data deployed (at least) every 60-90 days. This approach establishes BI as a program, greater than the sum of its parts.”

“But in my experience,” Jill explained, “if the organization embracing agile BI never had established BI development processes in the first place, agile BI can be a road to nowhere. In fact, the dirty little secret of agile BI is this: It’s companies that don’t have the discipline to enforce BI development rigor in the first place that hurl themselves toward agile BI.”

“Peek under the covers of an agile BI shop,” Jill continued, “and you’ll often find dozens or even hundreds of repeatable canned BI reports, but nary an advanced analytics capability. You’ll probably discover an IT organization that failed to cultivate solid relationships with business users and is now hiding behind an agile vocabulary to justify its own organizational ADD. It’s lack of accountability, failure to manage a deliberate pipeline, and shifting work priorities packaged up as so much scrum.”

I really love the term Organizational Attention Deficit Disorder, and in spite of myself, I can’t help but render it acronymically as OADD—which should be pronounced as “odd” because the “a” is silent, as in: “Our organization is really quite OADD, isn’t it?”

Scrum Screwed Up: All Chicken and No Pig

Returning to the metaphor of the Scrum roles, the pigs are the people with their bacon in the game performing the actual work, and the chickens are the people to whom the results are being delivered. Most commonly, the pigs are IT or the technical team, and the chickens are the users or the business team. But these scrum lines are drawn in the sand, and therefore easily crossed.

Many organizations love the idea of agile BI because they are thinking like chickens and not like pigs. And the agile life is always easier for the chicken because they are only involved, whereas the pig is committed.

OADD organizations often “hurl themselves toward agile BI” because they’re enamored with the theory, but unrealistic about what the practice truly requires. They’re all-in when it comes to the planning, but bacon-less when it comes to the execution.

This is one common way that OADD organizations can get Scrum Screwed Up—they are All Chicken and No Pig.

Scrum Screwed Up: All Pig and No Chicken

Closer to the point being made in Jill’s blog post, IT can pretend to be pigs making seemingly impressive progress, but although they’re bringing home the bacon, it lacks any real sizzle because it’s not delivering any real advanced analytics to business users.

Although they appear to be scrumming, IT is really just screwing around with technology, albeit in an agile manner. However, what good is “delivering data at the speed of thought” when that data is neither what the business is thinking about nor what it truly needs?

This is another common way that OADD organizations can get Scrum Screwed Up—they are All Pig and No Chicken.

Scrum is NOT a Silver Bullet

Scrum—and any other agile framework—is not a silver bullet. However, agile methodologies can work—and not just for BI.

But whether you want to call it Chicken-Pig Collaboration, or Business-IT Collaboration, or Shiny Happy People Holding Hands, a true enterprise-wide collaboration facilitated by a cross-disciplinary team is necessary for any success—agile or otherwise.

Agile frameworks, when implemented properly, help organizations realistically embrace complexity and avoid oversimplification, by leveraging recurring iterations of relatively short duration that always deliver data-driven solutions to business problems.

Agile frameworks are successful when people take on the challenge united by collaboration, guided by effective methodology, and supported by enabling technology. Agile frameworks allow the enterprise to follow what works, for as long as it works, and without being afraid to adjust as necessary when circumstances inevitably change.

For more information about Agile BI, follow Jill Dyché and TDWI World Conference in San Diego, August 15-20 via Twitter.

June 24, 2010

MacGyver: Data Governance and Duct Tape

June 24, 2010/ Jim Harris

One of my favorite 1980s television shows was MacGyver, which starred Richard Dean Anderson as an extremely intelligent and endlessly resourceful secret agent, known for his practical application of scientific knowledge and inventive use of common items.

While I was thinking about the role of both data stewards and data cleansing within a successful data governance program, the two things that immediately came to mind were MacGyver, and the other equally versatile metaphor for versatility—duct tape.

I decided to combine these two excellent metaphors by envisioning MacGyver as a data steward and duct tape as data cleansing.

Data Steward: The MacGyver of Data Governance

Since “always prepared for adventure” was one of the show’s taglines, I think MacGyver would make an excellent data steward.

The fact that the activities associated with the role can vary greatly almost qualifies “data steward” as a MacGyverism. Your particular circumstances, and especially the unique corporate culture of your organization, will determine the responsibilities of your data stewardship function, but the general principles of data stewardship, as defined by Jill Dyché, include the following:

Stewardship is the practice of managing or looking after the well being of something.
Data is an asset owned by the enterprise.
Data stewards do not necessarily “own” the data assigned to them.
Data stewards care for data assets on behalf of the enterprise.

Just like MacGyver’s trusted sidekick—his Swiss Army knife—the most common trait of a data steward may be versatility.

I am not suggesting that a data steward is a jack of all trades, but master of none. However, a data steward often has a rather HedgeFoxian personality, thereby possessing the versatility necessary to integrate disparate disciplines into practical solutions.

In her excellent article Data Stewardship Strategy, Jill Dyché outlined six tried-and-true techniques that can help you avoid some common mistakes and successfully establish a data stewardship function within your organization. The second technique provides a few examples of typical data stewardship activities, which often include assessing and correcting data quality issues.

Data Cleansing: The Duct Tape of Data Quality

About poor data quality, MacGyver says, “if I had some duct tape, I could fix that.” (Okay—so he says that about everything.)

Data cleansing is the duct tape of data quality.

Proactive defect prevention is highly recommended, even though it is impossible to truly prevent every problem before it happens, because the more control enforced where data originates, the better the overall quality will be for enterprise information.

However, when poor data quality negatively impacts decision-critical information, the organization may legitimately prioritize a reactive short-term response—where the only remediation will be finding and fixing the immediate problems.

Of course, remediation limited to data cleansing alone will neither identify nor address the burning root cause of those problems.

Effectively balancing the demands of a triage mentality with a best practice of implementing defect prevention wherever possible will often create a very challenging situation for data stewards to contend with on a daily basis. However, like MacGyver says:

“When it comes down to me against a situation, I don’t like the situation to win.”

Therefore, although comprehensive data remediation will require combining reactive and proactive approaches to data quality, data stewards need to always keep plenty of duct tape on hand (i.e., put data cleansing tools to good use whenever necessary).

The Data Governance Foundation

In the television series, MacGyver eventually left the clandestine service and went to work for the Phoenix Foundation.

Similarly, in the world of data quality, many data stewards don’t formally receive that specific title until they go to work helping to establish your organization’s overall Data Governance Foundation.

Although it may be what the function is initially known for, as Jill Dyché explains, “data stewardship is bigger than data quality.”

“Data stewards establish themselves as adept at executing new data governance policies and consequently, vital to ongoing information management, they become ambassadors on data’s behalf, proselytizing the concept of data as a corporate asset.”

Of course, you must remember that many of the specifics of the data stewardship function will be determined by your unique corporate culture and where your organization currently is in terms of its overall data governance maturity.

Although not an easy mission to undertake, the evolving role of a data steward is of vital importance to data governance.

The primary focus of data governance is the strategic alignment of people throughout the organization through the definition, and enforcement, of policies in relation to data access, data sharing, data quality, and effective data usage, all for the purposes of supporting critical business decisions and enabling optimal business performance.

I know that sounds like a daunting challenge (and it definitely is) but always remember the wise words of Angus MacGyver:

“Brace yourself. This could be fun.”

The Prince of Data Governance

Jack Bauer and Enforcing Data Governance Policies

The Circle of Quality

A Tale of Two Q’s

Live-Tweeting: Data Governance

December 04, 2009

Live-Tweeting: Data Governance

December 04, 2009/ Jim Harris

The term “live-tweeting” describes using Twitter to provide near real-time reporting from an event. I live-tweet from the sessions I attend at industry conferences as well as interesting webinars.

Recently, I live-tweeted Successful Data Stewardship Through Data Governance, which was a data governance webinar featuring Marty Moseley of Initiate Systems and Jill Dyché of Baseline Consulting.

Instead of writing a blog post summarizing the webinar, I thought I would list my tweets with brief commentary. My goal is to provide an example of this particular use of Twitter so you can decide its value for yourself.

As the webinar begins, Marty Moseley and Jill Dyché provide some initial thoughts on data governance:

Jill Dyché provides a great list of data governance myths and facts:

Jill Dyché provides some data stewardship insights:

As the webinar ends, Marty Moseley and Jill Dyché provide some closing thoughts about data governance and data quality:

Please Share Your Thoughts

If you attended the webinar, then you know additional material was presented. Did my tweets do the webinar justice? Did you follow along on Twitter during the webinar? If you did not attend the webinar, then are these tweets helpful?

What are your thoughts in general regarding the pros and cons of live-tweeting?

The following three blog posts are conference reports based largely on my live-tweets from the events:

Enterprise Data World 2009

TDWI World Conference Chicago 2009

DataFlux IDEAS 2009

November 03, 2009

Customer Incognita

November 03, 2009/ Jim Harris

Many enterprise information initiatives are launched in order to unravel that riddle, wrapped in a mystery, inside an enigma, that great unknown, also known as...Customer.

Centuries ago, cartographers used the Latin phrase terra incognita (meaning “unknown land”) to mark regions on a map not yet fully explored. In this century, companies simply cannot afford to use the phrase customer incognita to indicate what information about their existing (and prospective) customers they don't currently have or don't properly understand.

What is a Customer?

First things first, what exactly is a customer? Those happy people who give you money? Those angry people who yell at you on the phone or say really mean things about your company on Twitter and Facebook? Why do they have to be so mean?

Mean people suck. However, companies who don't understand their customers also suck. And surely you don't want to be one of those companies, do you? I didn't think so.

Getting back to the question, here are some insights from the Data Quality Pro discussion forum topic What is a customer?:

Someone who purchases products or services from you. The word “someone” is key because it’s not the role of a “customer” that forms the real problem, but the precision of the term “someone” that causes challenges when we try to link other and more specific roles to that “someone.” These other roles could be contract partner, payer, receiver, user, owner, etc.
Customer is a role assigned to a legal entity in a complete and precise picture of the real world. The role is established when the first purchase is accepted from this real-world entity. Of course, the main challenge is whether or not the company can establish and maintain a complete and precise picture of the real world.

These working definitions were provided by fellow blogger and data quality expert Henrik Liliendahl Sørensen, who recently posted 360° Business Partner View, which further examines the many different ways a real-world entity can be represented, including when, instead of a customer, the real-world entity represents a citizen, patient, member, etc.

A critical first step for your company is to develop your definition of a customer. Don't underestimate either the importance or the difficulty of this process. And don't assume it is simply a matter of semantics.

Some of my consulting clients have indignantly told me: “We don't need to define it, everyone in our company knows exactly what a customer is.” I usually respond: “I have no doubt that everyone in your company uses the word customer, however I will work for free if everyone defines the word customer in exactly the same way.” So far, I haven't had to work for free.

How Many Customers Do You Have?

You have done the due diligence and developed your definition of a customer. Excellent! Nice work. Your next challenge is determining how many customers you have. Hopefully, you are not going to try using any of these techniques:

SELECT COUNT(*) AS "We have this many customers" FROM Customers
SELECT COUNT(DISTINCT Name) AS "No wait, we really have this many customers" FROM Customers
Middle-Square or Blum Blum Shub methods (i.e. random number generation)
Magic 8-Ball says: “Ask again later”

One of the most common and challenging data quality problems is the identification of duplicate records, especially redundant representations of the same customer information within and across systems throughout the enterprise. The need for a solution to this specific problem is one of the primary reasons that companies invest in data quality software and services.
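
To make the problem concrete, here is a minimal sketch of why the two queries above give different answers, and why neither is likely to be right. It is only an illustration: the table layout, the sample rows, and the `normalize` match key are all assumptions made up for this example, and a real matching process would need business rules far richer than lowercasing and trimming whitespace.

```python
import sqlite3

# Hypothetical sample data: five rows describing three real-world customers.
# Rows 1-3 are the same person entered three ways; rows 4 and 5 are assumed
# to be two different people who happen to share a name.
ROWS = [
    (1, "Jane Smith", "jane.smith@example.com"),
    (2, "JANE  SMITH", "Jane.Smith@example.com"),
    (3, "jane smith", "JANE.SMITH@EXAMPLE.COM"),
    (4, "John Doe", "john.doe@example.com"),
    (5, "John Doe", "jd@example.org"),
]


def normalize(text):
    """Lowercase the text and collapse runs of whitespace (a crude match key)."""
    return " ".join(text.lower().split())


def count_customers(rows):
    """Return (row count, distinct-name count, normalized-key count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT, Email TEXT)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", rows)

    # The two naive techniques from the list above.
    total = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
    distinct_names = conn.execute(
        "SELECT COUNT(DISTINCT Name) FROM Customers"
    ).fetchone()[0]

    # A slightly less naive count: distinct (name, email) pairs after normalizing.
    keys = {
        (normalize(name), normalize(email))
        for name, email in conn.execute("SELECT Name, Email FROM Customers")
    }
    conn.close()
    return total, distinct_names, len(keys)


if __name__ == "__main__":
    print(count_customers(ROWS))  # (5, 4, 3)
```

With this sample, `COUNT(*)` overcounts because of the duplicates, and `COUNT(DISTINCT Name)` is wrong in both directions at once: it treats the three spellings of Jane Smith as three customers while merging the two John Does into one.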

Earlier this year on Data Quality Pro, I published a five-part series of articles on identifying duplicate customers, which focused on the methodology for defining your business rules and illustrated some of the common data matching challenges.

Topics covered in the series:

Why a symbiosis of technology and methodology is necessary when approaching this challenge
How performing a preliminary analysis on a representative sample of real data prepares effective examples for discussion
Why using a detailed, interrogative analysis of those examples is imperative for defining your business rules
How both false negatives and false positives illustrate the highly subjective nature of this problem
How to document your business rules for identifying duplicate customers
How to set realistic expectations about application development
How to foster a collaboration of the business and technical teams throughout the entire project
How to consolidate identified duplicates by creating a “best of breed” representative record
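
As a rough illustration of the last topic, the sketch below consolidates a group of already-identified duplicates into a single “best of breed” record. The survivorship rule it uses (for each field, take the non-empty value from the most recently updated record) is a hypothetical choice for this example and not necessarily the rule described in the series; the field names and sample records are likewise invented.

```python
from datetime import date


def best_of_breed(records, fields):
    """Build one surviving record from a group of duplicate records.

    Hypothetical rule: for each field, keep the first non-empty value
    found when scanning the records from newest to oldest.
    """
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    survivor = {}
    for field in fields:
        survivor[field] = next(
            (r[field] for r in newest_first if r.get(field)), None
        )
    return survivor


# Invented sample: two records already judged to be the same customer.
DUPLICATES = [
    {"name": "Jane Smith", "phone": "",
     "email": "jane.smith@example.com", "updated": date(2009, 3, 1)},
    {"name": "Jane A. Smith", "phone": "555-0100",
     "email": "", "updated": date(2009, 9, 15)},
]

if __name__ == "__main__":
    print(best_of_breed(DUPLICATES, ["name", "phone", "email"]))
```

Here the newer record supplies the name and phone number, while the email address survives from the older record because the newer one left it blank. Which rule is appropriate for each field is exactly the kind of decision the business and technical teams need to make together.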

To read the series, please follow these links:

To download the associated presentation (no registration required), please follow this link: OCDQ Downloads

Conclusion

“Knowing the characteristics of your customers,” stated Jill Dyché and Evan Levy in the opening chapter of their excellent book, Customer Data Integration: Reaching a Single Version of the Truth, “who they are, where they are, how they interact with your company, and how to support them, can shape every aspect of your company's strategy and operations. In the information age, there are fewer excuses for ignorance.”

For companies of every size and within every industry, customer incognita is a crippling condition that must be replaced with customer cognizance in order for the company to continue to remain competitive in a rapidly changing marketplace.

Do you know your customers? If not, then they likely aren't your customers anymore.

October 01, 2009

Poor Data Quality is a Virus

October 01, 2009/ Jim Harris

“A storm is brewing—a perfect storm of viral data, disinformation, and misinformation.”

These cautionary words (written by Timothy G. Davis, an Executive Director within the IBM Software Group) are from the foreword of the remarkable new book Viral Data in SOA: An Enterprise Pandemic by Neal A. Fishman.

“Viral data,” explains Fishman, “is a metaphor used to indicate that business-oriented data can exhibit qualities of a specific type of human pathogen: the virus. Like a virus, data by itself is inert. Data requires software (or people) for the data to appear alive (or actionable) and cause a positive, neutral, or negative effect.”

“Viral data is a perfect storm,” because as Fishman explains, it is “a perfect opportunity to miscommunicate with ubiquity and simultaneity—a service-oriented pandemic reaching all corners of the enterprise.”

“The antonym of viral data is trusted information.”

Data Quality

“Quality is a subjective term,” explains Fishman, “for which each person has his or her own definition.” Fishman goes on to quote from many of the published definitions of data quality, including a few of my personal favorites:

David Loshin: “Fitness for use—the level of data quality determined by data consumers in terms of meeting or beating expectations.”
Danette McGilvray: “The degree to which information and data can be a trusted source for any and/or all required uses. It is having the right set of correct information, at the right time, in the right place, for the right people to use to make decisions, to run the business, to serve customers, and to achieve company goals.”
Thomas Redman: “Data are of high quality if those who use them say so. Usually, high-quality data must be both free of defects and possess features that customers desire.”

Data quality standards provide a highest common denominator to be used by all business units throughout the enterprise as an objective data foundation for their operational, tactical, and strategic initiatives. Starting from this foundation, information quality standards are customized to meet the subjective needs of each business unit and initiative. This approach leverages a consistent enterprise understanding of data while also providing the information necessary for day-to-day operations.

However, the enterprise-wide data quality standards must be understood as dynamic. Therefore, enforcing strict conformance to data quality standards can be self-defeating. On this point, Fishman quotes Joseph Juran: “conformance by its nature relates to static standards and specification, whereas quality is a moving target.”

Defining data quality is both an essential and challenging exercise for every enterprise. “While a succinct and holistic single-sentence definition of data quality may be difficult to craft,” explains Fishman, “an axiom that appears to be generally forgotten when establishing a definition is that in business, data is about things that transpire during the course of conducting business. Business data is data about the business, and any data about the business is metadata. First and foremost, the definition as to the quality of data must reflect the real-world object, concept, or event to which the data is supposed to be directly associated.”

Data Governance

“Data governance can be used as an overloaded term,” explains Fishman, and he quotes Jill Dyché and Evan Levy to explain that “many people confuse data quality, data governance, and master data management.”

“The function of data governance,” explains Fishman, “should be distinct and distinguishable from normal work activities.”

For example, although knowledge workers and subject matter experts are necessary to define the business rules for preventing viral data, according to Fishman, these are data quality tasks and not acts of data governance.

However, these data quality tasks must “subsequently be governed to make sure that all the requisite outcomes comply with the appropriate controls.”

Therefore, according to Fishman, “data governance is a function that can act as an oversight mechanism and can be used to enforce controls over data quality and master data management, but also over data privacy, data security, identity management, risk management, or be accepted in the interpretation and adoption of regulatory requirements.”

Conclusion

“There is a line between trustworthy information and viral data,” explains Fishman, “and that line is very fine.”

Poor data quality is a viral contaminant that will undermine the operational, tactical, and strategic initiatives essential to the enterprise's mission to survive and thrive in today's highly competitive and rapidly evolving marketplace.

Left untreated or unchecked, this infectious agent will negatively impact the quality of business decisions. As the pathogen replicates, more and more decision-critical enterprise information will be compromised.

According to Fishman, enterprise data quality requires a multidisciplinary effort and a lifetime commitment to:

“Prevent viral data and preserve trusted information.”

Books Referenced in this Post

Viral Data in SOA: An Enterprise Pandemic by Neal A. Fishman

Enterprise Knowledge Management: The Data Quality Approach by David Loshin

Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information by Danette McGilvray

Data Quality: The Field Guide by Thomas Redman

Juran on Quality by Design: The New Steps for Planning Quality into Goods and Services by Joseph Juran

Customer Data Integration: Reaching a Single Version of the Truth by Jill Dyché and Evan Levy

Related Posts

DQ-Tip: “Don't pass bad data on to the next person...”

The Only Thing Necessary for Poor Data Quality

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

Data Governance and Data Quality

May 09, 2009

TDWI World Conference Chicago 2009

May 09, 2009/ Jim Harris

Founded in 1995, TDWI (The Data Warehousing Institute™) is the premier educational institute for business intelligence and data warehousing that provides education, training, certification, news, and research for executives and information technology professionals worldwide. TDWI conferences always offer a variety of full-day and half-day courses taught in an objective, vendor-neutral manner. The courses are designed for professionals and taught by in-the-trenches practitioners who are well known in the industry.

TDWI World Conference Chicago 2009 was held May 3-8 in Chicago, Illinois at the Hyatt Regency Hotel and was a tremendous success. I attended as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ).

I used Twitter to provide live reporting from the conference. Here are my notes from the courses I attended:

BI from Both Sides: Aligning Business and IT

Jill Dyché, CBIP, is a partner and co-founder of Baseline Consulting, a management and technology consulting firm that provides data integration and business analytics services. Jill is responsible for delivering industry and client advisory services, is a frequent lecturer and writer on the business value of IT, and writes the excellent Inside the Biz blog. She is the author of acclaimed books on the business value of information: e-Data: Turning Data Into Information With Data Warehousing and The CRM Handbook: A Business Guide to Customer Relationship Management. Her latest book, written with Evan Levy, is Customer Data Integration: Reaching a Single Version of the Truth.

Course Quotes from Jill Dyché:

Five Critical Success Factors for Business Intelligence (BI):
1. Organization - Build organizational structures and skills to foster a sustainable program
2. Processes - Align both business and IT development processes that facilitate delivery of ongoing business value
3. Technology - Select and build technologies that deploy information cost-effectively
4. Strategy - Align information solutions to the company's strategic goals and objectives
5. Information - Treat data as an asset by separating data management from technology implementation
Three Different Requirement Categories:
1. What is the business need, pain, or problem? What business questions do we need to answer?
2. What data is necessary to answer those business questions?
3. How do we need to use the resulting information to answer those business questions?
“Data warehouses are used to make business decisions based on data – so data quality is critical”
“Even companies with mature enterprise data warehouses still have data silos - each business area has its own data mart”
“Instead of pushing a business intelligence tool, just try to get people to start using data”
“Deliver a usable system that is valuable to the business and not just a big box full of data”

TDWI Data Governance Summit

Philip Russom is the Senior Manager of Research and Services at TDWI, where he oversees many of TDWI’s research-oriented publications, services, and events. Prior to joining TDWI in 2005, he was an industry analyst covering BI at Forrester Research, as well as a contributing editor with Intelligent Enterprise and Information Management (formerly DM Review) magazines.

Summit Quotes from Philip Russom:

“Data Governance usually boils down to some form of control for data and its usage”
“Four Ps of Data Governance: People, Policies, Procedures, Process”
“Three Pillars of Data Governance: Compliance, Business Transformation, Business Integration”
“Two Foundations of Data Governance: Business Initiatives and Data Management Practices”
“Cross-functional collaboration is a requirement for successful Data Governance”

Becky Briggs, CBIP, CMQ/OE, is a Senior Manager and Data Steward for Airlines Reporting Corporation (ARC) and has 25 years of experience in data processing and IT - the last 9 in data warehousing and BI. She leads the program team responsible for product, project, and quality management, business line performance management, and data governance/stewardship.

Summit Quotes from Becky Briggs:

“Data Governance is the act of managing the organization's data assets in a way that promotes business value, integrity, usability, security and consistency across the company”
Five Steps of Data Governance:
1. Determine what data is required
2. Evaluate potential data sources (internal and external)
3. Perform data profiling and analysis on data sources
4. Data Services - Definition, modeling, mapping, quality, integration, monitoring
5. Data Stewardship - Classification, access requirements, archiving guidelines
“You must realize and accept that Data Governance is a program and not just a project”

Barbara Shelby is a Senior Software Engineer for IBM with over 25 years of experience holding positions of technical specialist, consultant, and line management. Her global management and leadership positions encompassed network authentication, authorization application development, corporate business systems data architecture, and database development.

Summit Quotes from Barbara Shelby:

Four Common Barriers to Data Governance:
1. Information - Existence of information silos and inconsistent data meanings
2. Organization - Lack of end-to-end data ownership and organization cultural challenges
3. Skill - Difficulty shifting resources from operational to transformational initiatives
4. Technology - Business data locked in large applications and slow deployment of new technology
Four Key Decision Making Bodies for Data Governance:
1. Enterprise Integration Team - Oversees the execution of CIO funded cross enterprise initiatives
2. Integrated Enterprise Assessment - Responsible for the success of transformational initiatives
3. Integrated Portfolio Management Team - Responsible for making ongoing business investment decisions
4. Unit Architecture Review - Responsible for the IT architecture compliance of business unit solutions

Lee Doss is a Senior IT Architect for IBM with over 25 years of information technology experience. He holds a patent for a process of aligning strategic capability for business transformation and has held various positions including strategy, design, development, and customer support for IBM networking software products.

Summit Quotes from Lee Doss:

Five Data Governance Best Practices:
1. Create a sense of urgency that the organization can rally around
2. Start small, grow fast...pick a few visible areas to set an example
3. Sunset legacy systems (application, data, tools) as new ones are deployed
4. Recognize the importance of organization culture…this will make or break you
5. Always, always, always – Listen to your customers

Kevin Kramer is a Senior Vice President and Director of Enterprise Sales for UMB Bank and is responsible for development of sales strategy, sales tool development, and implementation of enterprise-wide sales initiatives.

Summit Quotes from Kevin Kramer:

“Without Data Governance, multiple sources of customer information can produce multiple versions of the truth”
“Data Governance helps break down organizational silos and shares customer data as an enterprise asset”
“Data Governance provides a roadmap that translates into best practices throughout the entire enterprise”

Kanon Cozad is a Senior Vice President and Director of Application Development for UMB Bank and is responsible for overall technical architecture strategy and oversees information integration activities.

Summit Quotes from Kanon Cozad:

“Data Governance identifies business process priorities and then translates them into enabling technology”
“Data Governance provides direction and Data Stewardship puts direction into action”
“Data Stewardship identifies and prioritizes applications and data for consolidation and improvement”

Summit Quotes from Jill Dyché:

“The hard part of Data Governance is the data”
“No data will be formally sanctioned unless it meets a business need”
“Data Governance focuses on policies and strategic alignment”
“Data Management focuses on translating defined policies into executable actions”
“Entrench Data Governance in the development environment”
“Everything is customer data – even product and financial data”

Data Quality Assessment - Practical Skills

Arkady Maydanchik is a co-founder of Data Quality Group, a recognized practitioner, author, and educator in the field of data quality and information integration. Arkady's data quality methodology and breakthrough ARKISTRA technology were used to provide services to numerous organizations. Arkady is the author of the excellent book Data Quality Assessment, a frequent speaker at various conferences and seminars, and a contributor to many journals and online publications. Data quality curriculum by Arkady Maydanchik can be found at eLearningCurve.

Course Quotes from Arkady Maydanchik:

“Nothing is worse for data quality than desperately trying to fix it during the last few weeks of an ETL project”
“Quality of data after conversion is in direct correlation with the amount of knowledge about actual data”
“Data profiling tools do not do data profiling - it is done by data analysts using data profiling tools”
“Data Profiling does not answer any questions - it helps us ask meaningful questions”
“Data quality is measured by its fitness to the purpose of use – it's essential to understand how data is used”
“When data has multiple uses, there must be data quality rules for each specific use”
“Effective root cause analysis requires not stopping after the answer to your first question - Keep asking: Why?”
“The central product of a Data Quality Assessment is the Data Quality Scorecard”
“Data quality scores must be both meaningful to a specific data use and be actionable”
“Data quality scores must estimate both the cost of bad data and the ROI of data quality initiatives”
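
To make the idea of use-specific data quality rules a little more concrete, here is a small Python sketch of a scorecard that scores the same records separately for each data use. This is not Arkady's methodology or his scorecard design. The sample human resources records, the two data uses, and the rules themselves are hypothetical and exist only to show the shape of the idea.

```python
# Hypothetical HR records, for illustration only.
records = [
    {"employee_id": "E1", "hire_date": "2001-04-02", "salary": 52000, "email": "a@example.com"},
    {"employee_id": "E2", "hire_date": "", "salary": 0, "email": ""},
    {"employee_id": "E3", "hire_date": "1999-11-30", "salary": 61000, "email": ""},
    {"employee_id": "E4", "hire_date": "2005-07-19", "salary": 48000, "email": "d@example.com"},
]

# Each data use has its own rules (assumed here); a rule returns True when a record passes.
rules_by_use = {
    "payroll": [
        lambda r: r["salary"] > 0,
        lambda r: bool(r["hire_date"]),
    ],
    "employee_communications": [
        lambda r: bool(r["email"]),
    ],
}


def scorecard(records, rules_by_use):
    """Return, for each data use, the fraction of records that pass every rule for that use."""
    scores = {}
    for use, rules in rules_by_use.items():
        passing = sum(1 for r in records if all(rule(r) for rule in rules))
        scores[use] = passing / len(records)
    return scores


scores = scorecard(records, rules_by_use)
for use, score in scores.items():
    print(f"{use}: {score:.0%} of records fit for use")
```

The same four records score differently for payroll than for employee communications, which is the point of having rules for each specific use. A real scorecard would also need to address the cost and ROI estimates mentioned above, which this sketch does not attempt.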

Modern Data Quality Techniques in Action - A Demonstration Using Human Resources Data

Gian Di Loreto formed Loreto Services and Technologies in 2004 from the client services division of Arkidata Corporation. Loreto Services provides data cleansing and integration consulting services to Fortune 500 companies. Gian is a classically trained scientist - he received his PhD in elementary particle physics from Michigan State University.

Course Quotes from Gian Di Loreto:

“Data Quality is rich with theory and concepts – however it is not an academic exercise, it has real business impact”
“To do data quality well, you must walk away from the computer and go talk with the people using the data”
“Undertaking a data quality initiative demands developing a deeper knowledge of the data and the business”
“Some essential data quality rules are ‘hidden’ and can only be discovered by ‘clicking around’ in the data”
“Data quality projects are not about systems working together - they are about people working together”
“Sometimes, data quality can be ‘good enough’ for source systems but not when integrated with other systems”
“Unfortunately, no one seems to care about bad data until they have it”
“Data quality projects are only successful when you understand the problem before trying to solve it”

Mark Your Calendar

TDWI World Conference San Diego 2009 - August 2-7, 2009.

TDWI World Conference Orlando 2009 - November 1-6, 2009.

TDWI World Conference Las Vegas 2010 - February 21-26, 2010.
