DQ-Tip: “There is no such thing as data accuracy...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“There is no such thing as data accuracy — There are only assertions of data accuracy.”

This DQ-Tip came from the Data Quality Pro webinar ISO 8000 Master Data Quality featuring Peter Benson of ECCMA.

You can download (.pdf file) quotes from this webinar by clicking on this link: Data Quality Pro Webinar Quotes - Peter Benson

ISO 8000 is the international standard for data quality.  You can get more information by clicking on this link: ISO 8000

Data Accuracy

Thanks to substantial assistance from my readers, accuracy was defined in a previous post as the correctness of a data value within a limited context, such as verification by an authoritative reference (i.e., validity), combined with the correctness of a valid data value within an extensive context that includes other data as well as business processes (i.e., accuracy).

“The definition of data quality,” according to Peter and the ISO 8000 standards, “is the ability of the data to meet requirements.”

Although accuracy is only one of many dimensions of data quality, whenever we refer to data as accurate, we are referring to the ability of the data to meet specific requirements, and quite often it’s the ability to support making a critical business decision.

I agree with Peter and the ISO 8000 standards because we can’t simply take an accuracy metric on a data quality dashboard (or however else the assertion is presented to us) at face value without understanding how the metric is both defined and measured.

However, even when well defined and properly measured, data accuracy is still only an assertion.  Oftentimes, the only way to verify the assertion is by putting the data to its intended use.

If by using it you discover that the data is inaccurate, then by having established what the assertion of accuracy was based on, you have a head start on performing root cause analysis, enabling faster resolution of the issues—not only with the data, but also with the business and technical processes used to define and measure data accuracy.

Scrum Screwed Up

This was the inaugural cartoon on Implementing Scrum by Michael Vizdos and Tony Clark, which does a great job of illustrating the fable of The Chicken and the Pig, used to describe the two types of roles involved in Scrum.  Scrum—which, quite rare for our industry, is not an acronym—is one common approach among many iterative, incremental frameworks for agile software development.

Scrum is also sometimes used as a generic synonym for any agile framework.  Although I’m not an expert, I’ve worked on more than a few agile programs.  And since I am fond of metaphors, I will use the Chicken and the Pig to describe two common ways that scrums of all kinds can easily get screwed up:

  1. All Chicken and No Pig
  2. All Pig and No Chicken

However, let’s first establish a more specific context for agile development using one provided by a recent blog post on the topic.

 

A Contrarian’s View of Agile BI

In her excellent blog post A Contrarian’s View of Agile BI, Jill Dyché took a somewhat unpopular view of a popular view, which is something that Jill excels at—not simply for the sake of doing it—because she’s always been well-known for telling it like it is.

In preparation for the upcoming TDWI World Conference in San Diego, Jill was pondering the utilization of agile methodologies in business intelligence (aka BI—ah, there’s one of those oh so common industry acronyms straight out of The Acronymicon).

The provocative TDWI conference theme is: “Creating an Agile BI Environment—Delivering Data at the Speed of Thought.”

Now, please don’t misunderstand.  Jill is an advocate for doing agile BI the right way.  And it’s certainly understandable why so many organizations love the idea of agile BI.  Especially when you consider the slower time to value of most other approaches when compared with, following Jill’s rule of thumb, how agile BI would have “either new BI functionality or new data deployed (at least) every 60-90 days.  This approach establishes BI as a program, greater than the sum of its parts.”

“But in my experience,” Jill explained, “if the organization embracing agile BI never had established BI development processes in the first place, agile BI can be a road to nowhere.  In fact, the dirty little secret of agile BI is this: It’s companies that don’t have the discipline to enforce BI development rigor in the first place that hurl themselves toward agile BI.”

“Peek under the covers of an agile BI shop,” Jill continued, “and you’ll often find dozens or even hundreds of repeatable canned BI reports, but nary an advanced analytics capability. You’ll probably discover an IT organization that failed to cultivate solid relationships with business users and is now hiding behind an agile vocabulary to justify its own organizational ADD. It’s lack of accountability, failure to manage a deliberate pipeline, and shifting work priorities packaged up as so much scrum.”

I really love the term Organizational Attention Deficit Disorder, and in spite of myself, I can’t help but render it acronymically as OADD—which should be pronounced as “odd” because the “a” is silent, as in: “Our organization is really quite OADD, isn’t it?”

 

Scrum Screwed Up: All Chicken and No Pig

Returning to the metaphor of the Scrum roles, the pigs are the people with their bacon in the game performing the actual work, and the chickens are the people to whom the results are being delivered.  Most commonly, the pigs are IT or the technical team, and the chickens are the users or the business team.  But these scrum lines are drawn in the sand, and therefore easily crossed.

Many organizations love the idea of agile BI because they are thinking like chickens and not like pigs.  And the agile life is always easier for the chicken because they are only involved, whereas the pig is committed.

OADD organizations often “hurl themselves toward agile BI” because they’re enamored with the theory, but unrealistic about what the practice truly requires.  They’re all-in when it comes to the planning, but bacon-less when it comes to the execution.

This is one common way that OADD organizations can get Scrum Screwed Up—they are All Chicken and No Pig.

 

Scrum Screwed Up: All Pig and No Chicken

Closer to the point being made in Jill’s blog post, IT can pretend to be pigs making seemingly impressive progress, but although they’re bringing home the bacon, it lacks any real sizzle because it’s not delivering any real advanced analytics to business users. 

Although they appear to be scrumming, IT is really just screwing around with technology, albeit in an agile manner.  However, what good is “delivering data at the speed of thought” when that data is neither what the business is thinking, nor truly needs?

This is another common way that OADD organizations can get Scrum Screwed Up—they are All Pig and No Chicken.

 

Scrum is NOT a Silver Bullet

Scrum—and any other agile framework—is not a silver bullet.  However, agile methodologies can work—and not just for BI.

But whether you want to call it Chicken-Pig Collaboration, or Business-IT Collaboration, or Shiny Happy People Holding Hands, a true enterprise-wide collaboration facilitated by a cross-disciplinary team is necessary for any success—agile or otherwise.

Agile frameworks, when implemented properly, help organizations realistically embrace complexity and avoid oversimplification, by leveraging recurring iterations of relatively short duration that always deliver data-driven solutions to business problems. 

Agile frameworks are successful when people take on the challenge united by collaboration, guided by effective methodology, and supported by enabling technology.  Agile frameworks allow the enterprise to follow what works, for as long as it works, and without being afraid to adjust as necessary when circumstances inevitably change.

For more information about Agile BI, follow Jill Dyché and TDWI World Conference in San Diego, August 15-20 via Twitter.

Which came first, the Data Quality Tool or the Business Need?

This recent tweet by Andy Bitterer of Gartner Research (and ANALYSTerical) sparked an interesting online discussion, which was vaguely reminiscent of the classic causality dilemma that is commonly stated as “which came first, the chicken or the egg?”

 

An E-mail from the Edge

On the same day I saw Andy’s tweet, I received an e-mail from a friend and fellow data quality consultant, who had just finished a master data management (MDM) and enterprise data warehouse (EDW) project, which had over 20 customer data sources.

Although he was brought onto the project specifically for data cleansing, he was told from the day of his arrival that because of time constraints, they decided against performing any data cleansing with their recently purchased data quality tool.  Instead, they decided to use their data integration tool to simply perform the massive initial load into their new MDM hub and EDW.

But wait—the story gets even better.  The very first decision this client made was to purchase a consolidated enterprise application development platform with seamlessly integrated components for data quality, data integration, and master data management.

So long before this client had determined their business need, they decided that they needed to build a new MDM hub and EDW, made a huge investment in an entire platform of technology, then decided to use only the basic data integration functionality. 

However, this client was planning to use the real-time data quality and MDM services provided by their very powerful enterprise application development platform to prevent duplicates and any other bad data from entering the system after the initial load. 

But, of course, no one on the project team was actually working on configuring any of those services, or even, for that matter, determining the business rules those services would enforce.  Maybe the salesperson told them it was as easy as flipping a switch?

My friend, especially after looking at the data, preached that data quality was a critical business need, but he couldn’t convince them, despite taking the initiative to present the results of some quick data profiling, standardization, and data matching used to identify duplicate records within and across their primary data sources—results which clearly demonstrated the level of poor data quality.
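For illustration, here is a minimal sketch of the kind of quick standardization and matching involved in that exercise—the normalization rules and sample records are hypothetical, and real data quality tools use far more sophisticated matching techniques:

```python
import re
from collections import defaultdict

def standardize(name: str) -> str:
    """Simple standardization: lowercase, strip punctuation, collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def find_duplicates(records):
    """Group records whose standardized values match exactly."""
    groups = defaultdict(list)
    for rec in records:
        groups[standardize(rec)].append(rec)
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical customer names from two of the source systems
customers = ["J. Smith & Co.", "J Smith Co", "Acme Corp.", "ACME CORP", "Widgets Ltd"]
for group in find_duplicates(customers):
    print("Probable duplicates:", group)
```

Even this crude exact-match-after-standardization approach surfaces duplicates that a raw load would carry straight into the MDM hub.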

Although this client agreed that they definitely had some serious data issues, they still decided against doing any data cleansing and wanted to just get the data loaded.  Maybe they thought they were loading the data into one of those self-healing databases?

The punchline—this client is a financial services institution with a business need to better identify their most valuable customers.

As my friend lamented at the end of his e-mail, why do clients often later ask why these types of projects fail?

 

Blind Vendor Allegiance

In his recent blog post Blind Vendor Allegiance Trumps Utility, Evan Levy examined this bizarrely common phenomenon of selecting a technology vendor without gathering requirements, reviewing product features, and then determining what tool(s) could best help build solutions for specific business problems—another example of the tool coming before the business need.

Evan was recounting his experiences at a major industry conference on MDM, where people were asking his advice on what MDM vendor to choose, despite admitting “we know we need MDM, but our company hasn’t really decided what MDM is.”

Furthermore, these prospective clients had decided to default their purchasing decision to the technology vendor they already do business with, in other words, “since we’re already a [you can just randomly insert the name of a large technology vendor here] shop, we just thought we’d buy their product—so what do you think of their product?”

“I find this type of question interesting and puzzling,” wrote Evan.  “Why would anyone blindly purchase a product because of the vendor, rather than focusing on needs, priorities, and cost metrics?  Unless a decision has absolutely no risk or cost, I’m not clear how identifying a vendor before identifying the requirements could possibly have a successful outcome.”

 

SaaS-y Data Quality on a Cloudy Business Day?

Emerging industry trends like open source, cloud computing, and software as a service (SaaS) are often touted as less expensive than traditional technology, and I have heard some use this angle to justify buying the tool before identifying the business need.

In his recent blog post Cloud Application versus On Premise, Myths and Realities, Michael Fauscette examined the return on investment (ROI) versus total cost of ownership (TCO) argument quite prevalent in the SaaS versus on premise software debate.

“Buying and implementing software to generate some necessary business value is a business decision, not a technology decision,” Michael concluded.  “The type of technology needed to meet the business requirements comes after defining the business needs.  Each delivery model has advantages and disadvantages financially, technically, and in the context of your business.”

 

So which came first, the Data Quality Tool or the Business Need?

This question is, of course, absurd because, in every rational theory, the business need should always come first.  However, in predictably irrational real-world practice, it remains a classic causality dilemma for data quality related enterprise information initiatives such as data integration, master data management, data warehousing, business intelligence, and data governance.

But sometimes the data quality tool was purchased for an earlier project, and despite what some vendor salespeople may tell you, you don’t always need to buy new technology at the beginning of every new enterprise information initiative. 

Whenever you already have the technology in-house before defining your business need (or you have previously decided, often due to financial constraints, that you will need to build a bespoke solution), you still need to avoid technology bias.

Knowing how the technology works can sometimes cause a framing effect where your business need is defined in terms of the technology’s specific functionality, thereby framing the objective as a technical problem instead of a business problem.

Bottom line—your business problem should always be well-defined before any potential technology solution is evaluated.

 

Related Posts

There are no Magic Beans for Data Quality

Do you believe in Magic (Quadrants)?

Is your data complete and accurate, but useless to your business?

Can Enterprise-Class Solutions Ever Deliver ROI?

Selling the Business Benefits of Data Quality

The Circle of Quality

Is your data complete and accurate, but useless to your business?

Ensuring that complete and accurate data is being used to make critical daily business decisions is perhaps the primary reason why data quality is so vitally important to the success of your organization. 

However, this effort can sometimes take on a life of its own, where achieving complete and accurate data is allowed to become the raison d'être of your data management strategy—in other words, you start managing data for the sake of managing data.

When this phantom menace clouds your judgment, your data might be complete and accurate—but useless to your business.

Completeness and Accuracy

How much data is necessary to make an effective business decision?  Having complete (i.e., all available) data seems obviously preferable to incomplete data.  However, with data volumes always burgeoning, the unavoidable fact is that sometimes having more data only adds confusion instead of clarity, thereby becoming a distraction instead of helping you make a better decision.

Returning to my original question, how much data is really necessary to make an effective business decision? 

Thanks to substantial assistance from my readers, accuracy was defined in a previous post as the correctness of a data value within a limited context, such as verification by an authoritative reference (i.e., validity), combined with the correctness of a valid data value within an extensive context that includes other data as well as business processes (i.e., accuracy).

Although accurate data is obviously preferable to inaccurate data, less than perfect data quality cannot be used as an excuse to delay making a critical business decision.  When it comes to the quality of the data being used to make these business decisions, you can’t always get the data you want, but if you try sometimes, you just might find, you get the business insight you need.

Data-driven Solutions for Business Problems

Obviously, there are even more dimensions of data quality beyond completeness and accuracy. 

However, although it’s about more than just improving your data, data quality can be misperceived as an activity performed just for the sake of the data, when, in fact, it is an enterprise-wide initiative performed for the sake of implementing data-driven solutions for business problems, enabling better business decisions, and delivering optimal business performance.

In order to accomplish these objectives, data has to be not only complete and accurate (as well as whatever other dimensions you wish to add to your definition of data quality), but, most important, data has to be useful to the business.

Perhaps the most common definition for data quality is “fitness for the purpose of use.” 

The missing word, which makes this definition both incomplete and inaccurate, puns intended, is “business.”  In other words, data quality is “fitness for the purpose of business use.”  How complete and how accurate (and however else) the data needs to be is determined by its business use—or uses since, in the vast majority of cases, data has multiple business uses.

Data, data everywhere

With silos replicating data, and with new data being created daily, managing all of the data is becoming impractical, and because we are so busy with the activity of trying to manage all of it, no one stops to evaluate usage or business relevance.

The fifth of the Five New Ideas From 2010 MIT Information Quality Industry Symposium, a recent blog post written by Mark Goloboy, was that “60-90% of operational data is valueless.”

“I won’t say worthless,” Goloboy clarified, “since there is some operational necessity to the transactional systems that created it, but valueless from an analytic perspective.  Data only has value, and is only worth passing through to the Data Warehouse if it can be directly used for analysis and reporting.  No news on that front, but it’s been more of the focus since the proliferation of data has started an increasing trend in storage spend.”

In his recent blog post Are You Afraid to Say Goodbye to Your Data?, Dylan Jones discussed the critical importance of designing an archive strategy for data, as opposed to the default position many organizations take, where burgeoning data volumes are allowed to proliferate because, in large part, no one wants to delete (or, at the very least, archive) any of the existing data. 

This often results in the data that the organization truly needs for continued success getting stuck in the long line of data waiting to be managed, and in many cases, behind data for which the organization no longer has any business use (and perhaps never even had the chance to use when the data was actually needed to make critical business decisions).

“When identifying data in scope for a migration,” Dylan advised, “I typically start from the premise that ALL data is out of scope unless someone can justify its existence.  This forces the emphasis back on the business to justify their use of the data.”

Data Memorioso

Funes el memorioso is a short story by Jorge Luis Borges, which describes a young man named Ireneo Funes who, as a result of a horseback riding accident, has lost his ability to forget.  Although Funes has a tremendous memory, he is so lost in the details of everything he knows that he is unable to convert the information into knowledge and unable, as a result, to grow in wisdom.

In Spanish, the word memorioso means “having a vast memory.”  When Data Memorioso is your data management strategy, your organization becomes so lost in all of the data it manages that it is unable to convert data into business insight and unable, as a result, to survive and thrive in today’s highly competitive and rapidly evolving marketplace.

In their great book Made to Stick: Why Some Ideas Survive and Others Die, Chip Heath and Dan Heath explained that “an accurate but useless idea is still useless.  If a message can’t be used to make predictions or decisions, it is without value, no matter how accurate or comprehensive it is.”  I believe that this is also true for your data and your organization’s business uses for it.

Is your data complete and accurate, but useless to your business?

Data Quality and the Cupertino Effect

The Cupertino Effect can occur when you accept the suggestion of a spellchecker program, which was attempting to assist you with a misspelled word (or what it “thinks” is a misspelling because it cannot find an exact match for the word in its dictionary). 

Although the suggestion (or in most cases, a list of possible words is suggested) is indeed spelled correctly, it might not be the word you were trying to spell, and in some cases, by accepting the suggestion, you create a contextually inappropriate result.

It’s called the “Cupertino” effect because, in older programs, the word “cooperation” was only listed in the spellchecking dictionary in its hyphenated form (i.e., “co-operation”), making the spellchecker suggest “Cupertino” (i.e., the California city and home of the worldwide headquarters of Apple, Inc., thereby essentially guaranteeing its presence in all spellchecking dictionaries).

By accepting the suggestion of a spellchecker program (and if there’s only one suggested word listed, don’t we always accept it?), a sentence where we intended to write something like:

“Cooperation is vital to our mutual success.”

Becomes instead:

“Cupertino is vital to our mutual success.”

And then confusion ensues (or hilarity—or both).
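A toy sketch of how this can happen—the two-entry dictionary and the hyphen-skipping quirk are contrived here purely to mimic the historical behavior:

```python
import difflib

# A contrived dictionary mirroring the historical quirk: only the hyphenated
# "co-operation" is listed, alongside the city name "Cupertino".
dictionary = ["co-operation", "Cupertino"]

def suggest(word: str) -> str:
    """Suggest a dictionary word for an unknown word, skipping hyphenated
    entries (as the older spellcheckers effectively did)."""
    if word in dictionary:
        return word  # exact match: accepted as correctly spelled
    candidates = [w for w in dictionary if "-" not in w]
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=0.5)
    return matches[0] if matches else word

print(suggest("cooperation"))  # suggests "Cupertino"
```

With the hyphenated form excluded from the candidate list, “Cupertino” is the only—and therefore the accepted—suggestion.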

Beyond being a data quality issue for unstructured data (e.g., documents, e-mail messages, blog posts, etc.), the Cupertino Effect reminded me of the accuracy versus context debate.

 

“Data quality is primarily about context not accuracy...”

This Data Quality (DQ) Tip from last September sparked a nice little debate in the comments section.  The complete DQ-Tip was:

“Data quality is primarily about context not accuracy. 

Accuracy is part of the equation, but only a very small portion.”

Therefore, the key point wasn’t that accuracy isn’t important, but simply to emphasize that context is more important. 

In her fantastic book Executing Data Quality Projects, Danette McGilvray defines accuracy as “a measure of the correctness of the content of the data (which requires an authoritative source of reference to be identified and accessible).”

Returning to the Cupertino Effect for a moment, the spellchecking dictionary provides an identified, accessible, and somewhat authoritative source of reference—and “Cupertino” is correct data content for representing the name of a city in California. 

However, absent a context within which to evaluate accuracy, how can we determine the correctness of the content of the data?

 

The Free-Form Effect

Let’s use a different example.  A common root cause of poor quality for structured data is: free-form text fields.

Regardless of how well the metadata description is written or how well the user interface is designed, if a free-form text field is provided, then you will essentially be allowed to enter whatever you want for the content of the data (i.e., the data value).

For example, a free-form text field is provided for entering the Country associated with your postal address.

Therefore, you could enter data values such as:

Brazil
United States of America
Portugal
United States
República Federativa do Brasil
USA
Canada
Federative Republic of Brazil
Mexico
República Portuguesa
U.S.A.
Portuguese Republic

However, you could also enter data values such as:

Gondor
Gnarnia
Rohan
Citizen of the World
The Land of Oz
The Island of Sodor
Berzerkistan
Lilliput
Brobdingnag
Teletubbyland
Poketopia
Florin

The first list contains real countries, but a lack of standard values introduces needless variations. The second list contains fictional countries, which people like me enter into free-form fields to either prove a point or simply to amuse myself (well okay—both).

The most common solution is to provide a drop-down box of standard values, such as those provided by an identified, accessible, and authoritative source of reference—the ISO 3166 standard country codes.

Problem solved—right?  Maybe—but maybe not. 

Yes, I could now choose BR, US, PT, CA, MX (the ISO 3166 alpha-2 codes for Brazil, United States, Portugal, Canada, Mexico), which are the valid and standardized country code values for the countries from my first list above—and I would not be able to find any of my fictional countries listed in the new drop-down box.

However, I could also choose DO, RE, ME, FI, SO, LA, TT, DE (Dominican Republic, Réunion, Montenegro, Finland, Somalia, Lao People’s Democratic Republic, Trinidad and Tobago, Germany), all of which are valid and standardized country code values, yet all of them are contextually invalid for my postal address.
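To make the validity-versus-context distinction concrete, here is a minimal sketch (the code list is truncated to a handful of ISO 3166 alpha-2 codes for illustration):

```python
# A few ISO 3166 alpha-2 country codes (truncated for illustration).
ISO_3166_ALPHA_2 = {"BR", "US", "PT", "CA", "MX", "DO", "RE", "ME",
                    "FI", "SO", "LA", "TT", "DE"}

def is_valid(country_code: str) -> bool:
    """Validity: the code exists in the authoritative reference."""
    return country_code in ISO_3166_ALPHA_2

def is_accurate(country_code: str, address_country: str) -> bool:
    """Accuracy: the code is valid AND correct within the context
    of this particular postal address."""
    return is_valid(country_code) and country_code == address_country

# "DE" is a valid code, but contextually invalid for a Brazilian address.
print(is_valid("DE"))           # True
print(is_accurate("DE", "BR"))  # False
print(is_valid("Gondor"))       # False: not in the reference at all
```

The drop-down box enforces only the first check; the second check requires context that no standard reference list can supply by itself.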

 

Accuracy: With or Without Context?

Accuracy is only one of the many dimensions of data quality—and you may have a completely different definition for it. 

Paraphrasing Danette McGilvray, accuracy is a measure of the validity of data values, as verified by an authoritative reference. 

My question is what about context?  Or more specifically, should accuracy be defined as a measure of the validity of data values, as verified by an authoritative reference, and within a specific context?

Please note that I am only trying to define the accuracy dimension of data quality, and not data quality itself.

Therefore, please resist the urge to respond with “fitness for the purpose of use” since even if you want to argue that “context” is just another word meaning “use” then next we will have to argue over the meaning of the word “fitness” and before you know it, we will be arguing over the meaning of the word “meaning.”

Please accurately share your thoughts (with or without context) about accuracy and context—by posting a comment below.

The Diffusion of Data Governance

Marty Moseley of Initiate recently blogged Are We There Yet? Results of the Data Governance Survey, and the blog post includes a link to the survey, which is freely available—no registration required.

The Initiate survey says that although data governance dates back to the late 1980s, it is experiencing a resurgence because of initiatives such as business intelligence, data quality, and master data management—as well as the universal need to make better data-driven business decisions “in less time than ever before, often culling data from more structured and unstructured sources, with more transparency required.”

Winston Chen of Kalido recently blogged A Brief History of Data Governance, which provides a brief overview of three distinct eras in data management: Application Era (1960-1990), Enterprise Repository Era (1990-2010), and Policy Era (2010-?).

As I commented on Winston’s post, I began my career at the tail-end of the Application Era, and my career has been about a 50/50 split between applications and enterprise repositories, since history does not move forward at the same pace for all organizations, including software vendors—by which I mean that my professional experience was influenced more by working for vendors selling application-based solutions than it was by working with clients who were, let’s just say, less than progressive.

Diffusion of innovations (illustrated above) is a theory developed by Everett Rogers for describing the five stages and the rate at which innovations (e.g., new ideas or technology) spread through markets (or “cultures”), starting with the Innovators and the Early Adopters, then progressing through the Early Majority and the Late Majority, and finally ending with the Laggards.

Therefore, the exact starting points of the three eras Winston described in his post can easily be debated because progress can be painfully slow until a significant percentage of the Early Majority begins to embrace the innovation—thereby causing the so-called Tipping Point where progress begins to accelerate enough for the mainstream to take it seriously. 

Please Note: I am not talking about crossing “The Chasm”—which as Geoffrey A. Moore rightfully discusses, is the critical, but much earlier, phenomenon occurring when enough of the Early Adopters have embraced the innovation so that the beginning of the Early Majority becomes an almost certainty—but true mainstream adoption of the innovation is still far from guaranteed.

The tipping point that I am describing occurs within the Early Majority and before the top of the adoption curve is reached. 

Achieving 16% market share (or “cultural awareness”) is where the Early Majority begins—and only after successfully crossing the chasm (which I approximate occurs somewhere around 8% market share).  However, the difference between a fad and a true innovation occurs somewhere around 25% market share—and this is the tipping point that I am describing.

The Late Majority (and the top of the adoption curve) doesn’t begin until 50% market share, and it’s all downhill from there, meaning that the necessary momentum has been achieved to almost guarantee that the innovation will be fully adopted.
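These approximate cutoffs can be summarized in a small sketch (the thresholds are the rough approximations described above, not official figures from Rogers’ theory):

```python
def adoption_stage(market_share: float) -> str:
    """Map a market share percentage to the adoption stages discussed above,
    using the approximate cutoffs from the text (8%, 16%, 25%, 50%)."""
    if market_share < 8:
        return "Early Adopters (before the chasm is crossed)"
    if market_share < 16:
        return "Early Adopters (chasm crossed)"
    if market_share < 25:
        return "Early Majority (before the tipping point)"
    if market_share < 50:
        return "Early Majority (past the tipping point)"
    return "Late Majority and beyond"

print(adoption_stage(25))  # Early Majority (past the tipping point)
```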

For example, it could be argued that master data management (MDM) reached its tipping point in late 2009, and with the wave of acquisitions in early 2010, MDM stepped firmly on the gas pedal of the Early Majority, and we are perhaps just beginning to see the start of MDM’s Late Majority.

It is much harder to estimate where we are within the diffusion of data governance.  Of course, corporate cultural awareness always plays a significant role in determining the adoption of new ideas and the market share of emerging technologies.

The Initiate survey concludes that “the state of data governance initiatives is still rather immature in most organizations” and reveals “a surprising lack of perceived executive interest in data governance initiatives.”

Rob Karel of Forrester Research recently blogged about how Data Governance Remains Immature, but he is “optimistic that we might finally see some real momentum building for data governance to be embraced as a legitimate competency.”

“It will likely be a number of years before best practices outnumber worst practices,” as Rob concludes, “but any momentum in data governance adoption is good momentum!”

From my perspective, data governance is still in the Early Adopter phase.  Perhaps 2011 will be “The Year of Data Governance” in much the same way that some have declared 2010 to be “The Year of MDM.”

In other words, it may be another six to twelve months before we can claim that the Early Majority has truly embraced not just the idea of data governance, but has realistically begun the journey toward making it happen.

 

What Say You?

Please share your thoughts about the diffusion of data governance, as well as your overall perspectives on data governance.

 

Related Posts

MacGyver: Data Governance and Duct Tape

The Prince of Data Governance

Jack Bauer and Enforcing Data Governance Policies

 

Follow OCDQ

If you enjoyed this blog post, then please subscribe to OCDQ via my RSS feed, my E-mail updates, or Google Reader.

You can also follow OCDQ on Twitter, fan the Facebook page for OCDQ, and connect with me on LinkedIn.


Do you believe in Magic (Quadrants)?

Twitter

If you follow Data Quality on Twitter like I do, then you are probably already well aware that the 2010 Gartner Magic Quadrant for Data Quality Tools was released this week (surprisingly, it did not qualify as a Twitter trending topic).

The five vendors that were selected as the “data quality market leaders” were SAS DataFlux, IBM, Informatica, SAP Business Objects, and Trillium.

Disclosure: I am a former IBM employee, former IBM Information Champion, and I blog for the Data Roundtable, which is sponsored by SAS.

Please let me stress that I have the highest respect for both Ted Friedman and Andy Bitterer, as well as their in-depth knowledge of the data quality industry and their insightful analysis of the market for data quality tools.

In this blog post, I simply want to encourage a good-natured debate, and not about the Gartner Magic Quadrant specifically, but rather about market research in general.  Gartner is used as the example because they are perhaps the most well-known and the source most commonly cited by data quality vendors during the sales cycle—and obviously, especially by the “leading vendors.”

I would like to debate how much of an impact market research really has on a prospect’s decision to purchase a data quality tool.

Let’s agree to keep this to a very informal debate about how research can affect both the perception and the reality of the market.

Therefore—for the love of all high quality data everywhere—please, oh please, data quality vendors, do NOT send me your quarterly sales figures, or have your PR firm mercilessly spam either my comments section or my e-mail inbox with all the marketing collateral “proving” how Supercalifragilisticexpialidocious your data quality tool is—I said please, so play nice.

 

The OCDQ View on OOBE-DQ

In a previous post, I used the term OOBE-DQ to refer to the out-of-box-experience (OOBE) provided by data quality (DQ) tools, which usually becomes a debate between “ease of use” and “powerful functionality” after you ignore the Magic Beans sales pitch that guarantees you the data quality tool is both remarkably easy to use and incredibly powerful.

However, the data quality market continues to evolve away from esoteric technical tools and toward business-empowering suites providing robust functionality with easier-to-use, role-based interfaces tailored to the specific needs of different users, such as business analysts, data stewards, application developers, and system administrators.

The major players are still the large vendors who have innovated (mostly via acquisition and consolidation) enterprise application development platforms with integrated (to varying degrees) components, which provide not only data quality functionality, but also data integration and master data management (MDM) as well.

Many of these vendors also offer service-oriented deployments delivering the same functionality within more loosely coupled technical architectures, which includes leveraging real-time services to prevent (or at least greatly minimize) poor data quality at the multiple points of origin within the data ecosystem.

Many vendors are also beginning to provide better built-in reporting and data visualization capabilities, which is helping to make the correlation between poor data quality and suboptimal business processes more tangible, especially for executive management.

It must be noted that many vendors (including the “market leaders”) continue to struggle with their International OOBE-DQ. 

Many (if not most) data quality tools are strongest in their native country or their native language, but their OOBE-DQ declines significantly when they travel abroad.  Especially outside of the United States, smaller vendors with local linguistic and cultural expertise built into their data quality tools have continued to remain fiercely competitive with the larger vendors.

Market research certainly has a role to play in making a purchasing decision, and perhaps most notably as an aid in comparing and contrasting features and benefits, which of course, always have to be evaluated against your specific requirements, including both your current and future needs. 

Now let’s shift our focus to examining some of the inherent challenges of evaluating market research, perception, and reality.

 

Confirmation Bias

First of all, I realize that this debate will suffer from a considerable—and completely understandable—confirmation bias.

If you are a customer, employee, or consultant for one of the “High Five” (not an “official” Gartner Magic Quadrant term for the Leaders), then obviously you have a vested interest in getting inebriated on your own Kool-Aid (as noted in my disclosure above, I used to get drunk on the yummy Big Blue Kool-Aid).  Now, this doesn’t mean that you are a “yes man” (or a “yes woman”).  It simply means it is logical for you to claim that market research, market perception, and market reality are in perfect alignment.

Likewise, if you are a customer, employee, or consultant for one of the “It Isn’t Easy Being Niche-y” (rather surprisingly, not an “official” Gartner Magic Quadrant term for the Niche Players), then obviously you have a somewhat vested interest in claiming that market research is from Mars, market perception is from Venus, and market reality is really no better than reality television.

And, if you are a customer, employee, or consultant for one of the “We are on the outside looking in, flipping both Gartner and their Magic Quadrant the bird for excluding us” (I think that you can figure out on your own whether or not that is an “official” Gartner Magic Quadrant term), then obviously you have a vested interest in saying that market research can “Kiss My ASCII!”

My only point is that your opinion of market research will obviously be influenced by what it says about your data quality tool. 

Therefore, should it really surprise anyone when, during the sales cycle, one of the High Five uses the Truly Awesome Syllogism:

“Well, of course, we say our data quality tool is awesome.
However, the Gartner Magic Quadrant also says our data quality tool is awesome.
Therefore, our data quality tool is Truly Awesome.”

Okay, so technically, that’s not even a syllogism—but who said any form of logical argument is ever used during a sales cycle?

On a more serious note, and to stop having too much fun at Gartner’s expense, they do advise against simply selecting vendors in their “Leaders quadrant” and instead always advise to select the vendor that is the better match for your specific requirements.

 

Features and Benefits: The Game Nobody Wins

As noted earlier, a features and benefits comparison is not only the most common technique used by prospects, but it is also the most common—if not the only—way that the vendors themselves position their so-called “competitive differentiation.”

The problem with this approach—and not just for data quality tools—is that there are far more similarities than differences to be found when comparing features and benefits. 

Practically every single data quality tool on the market today will include functionality for data profiling, data quality assessment, data standardization, data matching, data consolidation, data integration, data enrichment, and data quality monitoring.

Therefore, running down a checklist of features is like playing a game of Buzzword Bingo, or constantly playing Musical Chairs, but without removing any of the chairs in between rounds—in other words, the Features Game almost always ends in a tie.

So then next we play the Benefits Game, which is usually equally pointless because it comes down to silly arguments such as “our data matching engine is better than yours.”  This is the data quality tool vendor equivalent of:

Vendor D: “My Dad can beat up your Dad!”

Vendor Q: “Nah-huh!”

Vendor D: “Yah-huh!”

Vendor Q: “NAH-HUH!”

Vendor D: “YAH-HUH!”

Vendor Q: “NAH-HUH!”

Vendor D: “Yah-huh!  Stamp it!  No Erasies!  Quitsies!”

Vendor Q: “No fair!  You can’t do that!”

After both vendors have returned from their “timeout,” a slightly more mature approach is to run a vendor “bake-off” where the dueling data quality tools participate in a head-to-head competition processing a copy of the same data provided by the prospect. 

However, a bake-off often produces misleading results because the vendors—and not the prospect—perform the competition, making it mostly about vendor expertise, not OOBE-DQ.  Also, the data used rarely exemplifies the prospect’s data challenges.

If competitive differentiation based on features and benefits is a game that nobody wins, then what is the alternative?

 

The Golden Circle


I recently read the book Start with Why by Simon Sinek, which explains that “people don’t buy WHAT you do, they buy WHY you do it.” 

The illustration shows what Simon Sinek calls The Golden Circle.

WHY is your purpose—your driving motivation for action. 

HOW is your principles—specific actions that are taken to realize your Why. 

WHAT is your results—tangible ways in which you bring your Why to life. 

It’s a circle when viewed from above, but in reality it forms a megaphone for broadcasting your message to the marketplace. 

When you rely only on the approach of attempting to differentiate your data quality tool by discussing its features and benefits, you are focusing on only your WHAT, and absent your WHY and HOW, you sound just like everyone else to the marketplace.

When, as is often the case, nobody wins the Features and Benefits Game, a data quality tool sounds more like a commodity, which will focus the marketplace’s attention on aspects such as your price—and not on aspects such as your value.

Due to the considerable length of this blog post, I have been forced to greatly oversimplify the message of this book, which a future blog post will discuss in more detail.  I highly recommend the book (and no, I am not an affiliate).

At the very least, consider this question:

If there truly was one data quality tool on the market today that, without question, had the very best features and benefits, then why wouldn’t everyone simply buy that one? 

Of course your data quality tool has solid features and benefits—just like every other data quality tool does.

I believe that the hardest thing for our industry to accept is—the best technology hardly ever wins the sale. 

As most of the best salespeople will tell you, what wins the sale is when a relationship is formed between vendor and customer, a strategic partnership built upon a solid foundation of rapport, respect, and trust.

And that has more to do with WHY you would make a great partner—and less to do with WHAT your data quality tool does.

 

Do you believe in Magic (Quadrants)?


How much of an impact do you think market research has on the purchasing decision of a data quality tool?  How much do you think research affects both the perception and the reality of the data quality tool market?  How much do you think the features and benefits of a data quality tool affect the purchasing decision?

All perspectives on this debate are welcome without bias.  Therefore, please post a comment below.

PLEASE NOTE

Comments advertising your products and services (or bashing competitors) will not be approved.

 

 

Channeling My Inner Beagle: The Case for Hyperactivity


Phil Simon, who is a Bulldog’s best friend and is a good friend of mine, recently blogged Channeling My Inner Bulldog: The Case for Stubbornness, in which he described how the distracting nature of multitasking can impair our ability to solve complex problems.

Although I understood every single word he wrote, after three dog nights, I can’t help but take the time to share my joy to the world by channeling my inner beagle and making the case for hyperactivity—in other words, our need to simply become better multitaskers.

The beloved mascot of my blog post is Bailey, not only a great example of a typical Beagle, but also my brother’s family dog, who is striking a heroic pose in this picture while proudly sporting his all-time favorite Halloween costume—Underdog.

I could think of no better hero to champion my underdog of a cause:

“There’s no need to fear . . . hyperactivity!”

 

Please Note: Just because Phil Simon coincidentally uses “Simon Says” as the heading for all his blog conclusions, doesn’t mean Phil is Simon Bar Sinister, who coincidentally used “Simon Says” to explain his diabolical plans—that’s completely coincidental.

 

The Power of Less

I recently read The Power of Less, the remarkable book by Leo Babauta, which provides practical advice on simplifying both our professional and personal lives.  The book has a powerfully simple message—identify the essential, eliminate the rest.

I believe that the primary reason multitasking gets such a bad reputation is that it typically includes numerous non-essential tasks.

Many daily tasks are simply “busy work” that we either don’t really need to do at all, or don’t need to do as frequently.  We have allowed ourselves to become conditioned to perform certain tasks, such as constantly checking our e-mail and voice mail. 

Additionally, whenever we do find a break in our otherwise hectic day, “nervous energy” often causes us to feel like we should be doing something with our time—and so the vicious cycle of busy work begins all over again.

“Doing nothing is better than being busy doing nothing,” explained Lao Tzu.

I personally find that whenever I am feeling overwhelmed by multitasking, it’s not because I am trying to distribute my time among a series of essential tasks—it’s because I am really just busy doing a whole lot of nothing.  “Doing a huge number of things,” explains Babauta, “doesn’t mean you’re getting anything meaningful done.”

Meaningful accomplishment requires limiting our focus to only essential tasks.  Unlimited focus, according to Babauta, is like “taking a cup of red dye and pouring it into the ocean, and watching the color dilute into nothingness.  Limited focus is putting that same cup of dye into a gallon of water.”

Only you can decide which tasks are essential.  Look at your “to do list” and first identify the essential—then eliminate the rest.

 

It’s about the journey—not the destination

Once you have eliminated the non-essential tasks, your next challenge is limiting your focus to only the essential tasks. 

Perhaps the simplest way to limit your focus and avoid the temptation of multitasking altogether is to hyper-focus on only one task at a time.  So let’s use reading a non-fiction book as an example of one of the tasks you identified as essential.

Some people would read this non-fiction book as fast as they possibly can—hyper-focused and not at all distracted—as if they’re trying to win “the reading marathon” by finishing the book in the shortest time possible. 

They claim that this gives them both a sense of accomplishment and allows them to move on to their next essential task, thereby always maintaining their vigilant hyper-focus of performing only one task at a time. 

However, what did they actually accomplish other than simply completing the task of reading the book?

I find that people—myself included—who voraciously read non-fiction books often struggle when attempting to explain the book, and in fact, they usually can’t tell you anything more than what you would get from simply reading its jacket cover.

Furthermore, they often can’t demonstrate any proof of having learned anything from reading the book.  Now, if they were reading fiction, I would argue that’s not a problem.  However, their “undistracted productivity” of reading a non-fiction book can easily amount to nothing more than productive entertainment. 

They didn’t mind the gap between the acquisition of new information and its timely and practical application.  Therefore, they didn’t develop valuable knowledge.  They didn’t move forward on their personal journey toward wisdom. 

All they did was productively move the hands of the clock forward—all they did was pass the time.

Although by eliminating distractions and focusing on only essential tasks, you’ll get more done and reach your destination faster, in my humble opinion, a meaningful life is not a marathon—a meaningful life is a race not to run.

It’s about the journey—not the destination.  In the words of Ralph Waldo Emerson:

“With the past, I have nothing to do; nor with the future.  I live now.”

Hyperactivity is Simply Better Multitasking

Although I do definitely believe in the power of less, the need to eliminate non-essential tasks, and the need to focus my attention, I am far more productive when hyper-active (i.e., intermittently alternating my attention among multiple simultaneous tasks).

Hyperactively collecting small pieces of meaningful information from multiple sources, as well as from the scattered scraps of knowledge whirling around inside my head, is more challenging, and more stressful, than focusing on only one task at a time.

However, at the end of most days, I find that I have made far more meaningful progress on my essential tasks. 

Although, in all fairness, I often break down and organize essential tasks into smaller sub-tasks, group similar sub-tasks together, and then multitask within only one group at a time.  This lower-level multitasking minimizes what I call the plate spinning effect, where an interruption can easily cause a disastrous disruption in productivity.

Additionally, I believe that not all distractions are created equal.  Some, in fact, can be quite serendipitous.  Therefore, I usually allow myself to include one “creative distraction” in my work routine.  (Typically, I use either Twitter or some source of music.)

By eliminating non-essential tasks, grouping together related sub-tasks, and truly embracing the chaos of creative distraction, hyperactivity is simply better multitasking—and I think that in the Digital Age, this is a required skill we all must master.

 

The Rumble in the Dog Park

So which is better?  Stubbornness or Hyperactivity?  In the so-called Rumble in the Dog Park, who wins?  Bulldogs or Beagles? 

I know that I am a Beagle.  Phil knows he is a Bulldog.  I would be unhappy as a Bulldog.  Phil would be unhappy as a Beagle. 

And that is the most important point.

There is absolutely no better way to make yourself unhappy than by trying to live by someone else’s definition of happiness.

You should be whatever kind of dog truly makes you happy.  In other words, if you prefer single-tasking, then be a Bulldog, and if you prefer multitasking, then be a Beagle—and obviously, Bulldogs and Beagles are not the only doggone choices.

Maybe you’re one of those people who prefers cats—that’s cool too—just be whatever kind of cool cat truly makes you happy. 

Or maybe you’re neither a dog person nor a cat person.  Maybe you’re more of a Red-Eared Slider kind of person—that’s cool too.

And who ever said that you had to choose to be only one kind of person anyway? 

Maybe some days you’re a Beagle, other days you’re a Bulldog, and on weekends and vacation days you’re a Red-Eared Slider. 

It’s all good.

Just remember—no matter what—always be you.

Twitter, Meaningful Conversations, and #FollowFriday

In social media, one of the most common features of social networking services is allowing users to share brief status updates.  Twitter is currently built on only this feature and uses status updates (referred to as tweets) that are limited to a maximum of 140 characters, which creates a rather pithy platform that many people argue is incompatible with meaningful communication.

Although I use Twitter for a variety of reasons, one of them is sharing quotes that I find thought-provoking.  For example:

 

This George Santayana quote was shared by James Geary, whom I follow on Twitter because he uses his account to provide the “recommended daily dose of aphorisms.”  My re-tweet (i.e., “forwarding” of another user’s status update) triggered the following meaningful conversation with Augusto Albeghi, the founder of StraySoft who is known as @Stray__Cat on Twitter:

 

Now of course, I realize that what exactly constitutes a “meaningful conversation” is debatable regardless of the format.

Therefore, let me first provide my definition, which consists of the following three simple requirements:

  1. At least two people discussing a topic, which is of interest to all parties involved
  2. Allowing all parties involved to have an equal chance to speak (or otherwise share their thoughts)
  3. Attentively listening to the current speaker—as opposed to merely waiting for your turn to speak

Next, let’s examine why Twitter’s format can be somewhat advantageous to satisfying these requirements:

  1. Although many (if not most) tweets are not necessarily attempting to start a conversation, at the very least they do provide a possible topic for any interested parties
  2. Everyone involved has an equal chance to speak, but time lags and multiple simultaneous speakers can occur, which in all fairness can happen in any other format
  3. Tweets provide somewhat of a running transcript (again, time lags can occur) for the conversation, making it easier to “listen” to the other speaker (or speakers)

Now, let’s address the most common objection to Twitter being used as a conversation medium:

“How can you have a meaningful conversation when constrained to only 140 characters at a time?”

I admit to being a long-winded talker or, as a favorite (canceled) television show would say, “conversationally anal-retentive.”  In the past (slightly less now), I was also known for e-mail messages even Leo Tolstoy would declare to be far too long.

However, I wholeheartedly agree with Jennifer Blanchard, who explained how Twitter makes you a better writer.  When forced to be concise, you have to focus on exactly what you want to say, using as few words as possible.

I call this reduction of your message to its bare essence—the power of pith.  In order to engage in truly meaningful conversations, this is a required skill we all must master, and not just for tweeting—but Twitter does provide a great practice environment.

 

At least that’s my 140 characters worth on this common debate—well okay, it’s more like my 5,000 characters worth.

 

Great folks to follow on Twitter

Since this blog post was published on a Friday, which for Twitter users like me means it’s FollowFriday, I would like to conclude by providing a brief list of some great folks to follow on Twitter. 

Although by no means a comprehensive list, and listed in no particular order whatsoever, here are some great tweeps, and especially if you are interested in Data Quality, Data Governance, Master Data Management, and Business Intelligence:

 

PLEASE NOTE: No offense is intended to any of my tweeps not listed above.  However, if you feel that I have made a glaring omission of an obviously Twitterific Tweep, then please feel free to post a comment below and add them to the list.  Thanks!

I hope that everyone has a great FollowFriday and an even greater weekend.  See you all around the Twittersphere.

 

Related Posts

Wordless Wednesday: June 16, 2010

Data Rock Stars: The Rolling Forecasts

The Fellowship of #FollowFriday

Social Karma (Part 7)

The Wisdom of the Social Media Crowd

The Twitter Clockwork is NOT Orange

Video: Twitter #FollowFriday – January 15, 2010

Video: Twitter Search Tutorial

Live-Tweeting: Data Governance

Brevity is the Soul of Social Media

If you tweet away, I will follow

Tweet 2001: A Social Media Odyssey

Promoting Poor Data Quality

A few months ago, during an e-mail correspondence with one of my blog readers from Brazil (I’ll let him decide if he wishes to remain anonymous or identify himself in the comments section), I was asked the following intriguing question:

“Who profits from poor data quality?”

The specific choice of verb (i.e., “profits”) may have been a linguistic issue, by which I mean that since I don’t know Portuguese, our correspondence had to be conducted in English. 

Please don’t misunderstand me—his writing was perfectly understandable. 

As I discussed in my blog post Can Social Media become a Universal Translator?, my native language is English, and like many people from the United States, it is the only language I am fluent in.  My friends from Great Britain would most likely point out that I am only fluent in the American “version” of the English language, but that’s a topic for another day—and another blog post.

When anyone communicates in another language—and especially in writing—not every word may be exactly right. 

For example: Muito obrigado por sua pergunta!

Hopefully (and with help from Google Translate), I just wrote “thank you for your question” in Portuguese.

My point is that I believe he was asking why poor data quality continues to persist as an extremely prevalent issue, especially when its detrimental effects on effective business decisions have become painfully obvious given the recent global financial crisis.

However, being mentally stuck on my literal interpretation of the word “profit” has delayed my blog post response—until now.

 

Promoting Poor Data Quality

In economics, the term “flight to quality” describes the aftermath of a financial crisis (e.g., a stock market crash) when people become highly risk-averse and move their money into safer, more reliable investments.  A similar “flight to data quality” often occurs in the aftermath of an event when poor data quality negatively impacted decision-critical enterprise information. 

The recent recession provides many examples of the financial aspect of this negative impact.  Therefore, even companies that may not have viewed poor data quality as a major risk—and a huge cost greatly decreasing their profits—are doing so now.

However, the retail industry has always been known for its paper-thin profit margins, which are due, in large part, to often being forced into the highly competitive game of pricing.  Although dropping the price is the easiest way to sell just about any product, it is also virtually impossible to sustain this rather effective, but short-term, tactic as a viable long-term business strategy. 

Therefore, a common approach used to compete on price without risking too much on profit is to promote sales using a rebate, which I believe is a business strategy intentionally promoting poor data quality for the purposes of increasing profits.

 

You break it, you slip it—either way—you buy it, we profit

The most common form of a rebate is a mail-in rebate.  The basic premise is simple.  Instead of reducing the in-store price of a product, it is sold at full price, but a rebate form is provided that the customer can fill out and mail to the product’s manufacturer, which will then mail a rebate check to the customer—usually within a few business weeks after approving the rebate form. 

For example, you could purchase a new mobile phone for $250 with a $125 mail-in rebate, which would make the “sale price” only $125—which is what the store will advertise as the actual sale price with “after a $125 mail-in rebate” written in small print.

Two key statistics significantly impact the profitability of these types of rebate programs: breakage and slippage.

Breakage is the percentage of customers who, for reasons I will get to in a moment, fail to take advantage of the rebate, and therefore end up paying full price for the product.  Returning to my example, the mobile phone that would have cost $125 if you received the $125 mail-in rebate, instead becomes exactly what you paid for it—$250 (plus applicable taxes, of course).

Slippage is the percentage of customers who either don’t mail in the rebate form at all, or don’t cash their received rebate check.  The former is the most common “slip,” while the latter is usually caused by failing to cash the rebate check before it expires, which is typically 30 to 90 days after it is processed (i.e., expiration dated)—and regardless of when it is actually received.
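The combined effect of breakage and slippage on a rebate program can be sketched with a little arithmetic.  The function below is a hypothetical illustration (the rates, customer count, and the staged breakage-then-slippage model are my own assumptions, not figures from any real program), using the $250 phone with the $125 mail-in rebate from the example above:

```python
# Hypothetical sketch of mail-in rebate economics.
# Assumption: breakage and slippage are modeled as sequential stages --
# breakage removes customers who never attempt the rebate, then slippage
# removes attempted rebates that fail (form never mailed, check never cashed).

def rebate_economics(full_price, rebate, breakage_rate, slippage_rate, customers):
    """Return (customers who actually receive the rebate,
    average price effectively paid per customer)."""
    attempted = customers * (1 - breakage_rate)   # customers who try the rebate
    redeemed = attempted * (1 - slippage_rate)    # rebates that actually pay out
    total_paid = customers * full_price - redeemed * rebate
    return redeemed, total_paid / customers

# Assumed rates: 25% breakage, 20% slippage among those who attempt it.
redeemed, avg_price = rebate_economics(250, 125, 0.25, 0.20, 1000)
print(redeemed)    # 600.0 customers actually get the $125 back
print(avg_price)   # 175.0 -- average effective price, not the advertised $125
```

Even with most customers attempting the rebate, the average effective price lands well above the advertised “after rebate” price—which is exactly why the fine print is worth reading.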

Breakage, and the most common form of slippage, are generally the result of making the rebate process intentionally complex. 

Rebate forms often require you to provide a significant amount of information, both about yourself and the product, as well as attach several “proofs of purchase” such as a copy of the receipt and the barcode cut out of the product’s package. 

Data entry errors are perhaps the most commonly cited root cause of poor data quality. 

Rebates seem designed to guarantee data entry errors (by encouraging the customer to fill out the rebate form incorrectly). 

In this particular situation, the manufacturer is hyper-vigilant about data quality and for an excellent reason—poor data quality will either delay or void the customer’s rebate. 

Additionally, the fine print of the rebate form can include other “terms and conditions” voiding the rebate—even if the form is filled out perfectly.  A common example is the limitation of “only one rebate per postal address.”  This sounds reasonable, right? 

Well, one major electronics manufacturer used this disclaimer to disqualify all customers who lived in multiple unit dwellings, such as an apartment building, where another customer “at the same postal address” had already applied for a rebate.

 

Conclusion

Statistics vary by product and region, but estimates show that breakage and slippage combine on average to result in 40% of retail customers paying full price when making a purchasing decision based on a promotional price requiring a mail-in rebate.

So who profits from poor data quality?  Apparently, the retail industry does—sometimes. 

Poor data quality (and poor information quality in the case of intentionally confusing fine print) definitely has a role to play with mail-in rebates—and it’s a supporting role that can definitely lead to increased profits. 

Of course, the long-term risks and costs associated with alienating the marketplace with gimmicky promotions take their toll. 

In fact, the major electronics manufacturer mentioned above was actually substantially fined in the United States and forced to pay hundreds of thousands of dollars worth of denied mail-in rebates to customers.

Therefore, poor data quality, much like crime, doesn’t pay—at least not for very long.

I am not trying to demonize the retail industry. 

Excluding criminal acts of intentional fraud, such as identity theft and money laundering, this was the best example I could think of that allowed me to respond to a reader’s request—without using the far more complex example of the mortgage crisis.

 

What Say You?

Can you think of any other examples of the possible benefits—intentional or accidental—derived from poor data quality?

The Challenging Gift of Social Media

I recently finished reading (and also highly recommend) the excellent book Linchpin: Are You Indispensable? by Seth Godin. 

Although it’s not the subject of the book, in this blog post I’ll focus on one of its concepts that is very applicable to social media. 

 

The Circles of the Gift System

Godin uses the term “Gift Culture” to describe an emerging ethos facilitated by (but not limited to) the Internet and social media, which involves what he calls “The Circles of the Gift System” that I have attempted to represent in the above diagram.

In the first circle are your true real-world friends and family, the people that you would never interact with on the basis of trying to make money (i.e., the people you freely give “true gifts” while expecting nothing in return).

In the second circle are your customers and clients, the people that you conduct commerce with and who must pay you for your time, products, and services (i.e., the people and organizations you don’t give gifts to because you need them to help pay your bills).

In the third circle is the social media and extended (nowadays mostly online) community, where following the freemium model, you give freely so that you can reach as many people as possible.  It is in the third circle that you assemble your tribe, comprising blog readers, Twitter followers, Facebook fans, and other “friendlies” — the term Godin uses for our social media connections.

It is the third circle that many (if not most) people struggle with and often either resist or ignore.  However, as Godin explains:

“This circle is new.  It’s huge and it’s important, because it enables you to enlarge the second circle and make more money, and because it enables you to affect more people and improve more lives.” 

However, dedicating the necessary time and effort to enlarge the third circle doesn’t guarantee you will enlarge the second circle, which risks turning freemium into simply free.  It is on this particular aspect that I will focus the remainder of my blog post.

 

The Intriguing Opportunity of Social Media

It is difficult to imagine a business topic generating more widespread discussion these days than social media.  That’s not to say that it is (or that it even should be) considered the most important topic.  However, almost every organization and most individual professionals have at the very least considered getting involved with social media in a business context.

The intriguing opportunity of social media is difficult to ignore—even after you ignore most of the hype (which is no easy task).

But as I wrote in the Social Karma series, if we are truly honest, then we all have to admit that we have the same question:

“What’s in this for me?”

Using social media effectively can definitely help promote you, your expertise, your company, and its products and services.  The primary reason I started blogging was to demonstrate my expertise and establish my authority with regards to data quality and its related disciplines.  As an independent consultant, I am trying to help sell my consulting, speaking, and writing services.

 

The Sobering Reality of Social Media

A social media strategy focused entirely on your own self-promotion will be easily detected by the online community, and could therefore easily result in doing far more harm than good.  Effectively using social media for business requires true participation, sustained engagement, and making meaningful contributions to the community’s goals—and not just your own.

The sobering reality of social media is that it’s not something you can simply do whenever it’s convenient for you.

Using social media effectively, more than anything else, requires a commitment that is mostly measured in time.  It requires a long-term investment in the community, and the truth is you must be patient because any returns on this investment will take a long time to materialize. 

If you are planning on a quick get in, get out, short-term marketing campaign requiring little effort, then don’t waste your time, but much more importantly, don’t waste the community’s time.

 

The Challenging Gift of Social Media

Godin opens his chapter on “The Powerful Culture of Gifts” by joking that he must have been absent the day they taught the power of unreciprocated gifts at Stanford business school. 

In fact, it’s probably a safe bet that the curriculum at most business schools conveniently ignores the fifty-thousand-year tradition of human tribal economies based on mutual support and generosity, when power used to be about giving, not getting.

Although we maintain some semblance of this tribal spirit in our personal lives with respect to the first circle, when it comes to our professional lives in the second circle, we want money for our time, product, or service—and we usually don’t come cheap.

Therefore, by far the most common question that I get asked (and that I often ask myself) about social media is:

“Is it really worth all that time and effort, especially when you aren’t getting paid for it?”

Although I honestly believe that it is, truthfully there have been many times when I have doubted it.  But those were usually times when I allowed myself to give in to the natural tendency we all have to become hyper-focused on our own goals. 

The paradox is that the best way to accomplish our selfish goals is—first and foremost—to focus on helping others. 

Of course, helping others doesn’t guarantee they’ll reciprocate, especially with financial returns on our social media investment.  Returning to Godin’s analogy, enlarging (or even just maintaining) the third circle doesn’t guarantee enlarging the second circle.

However, true service to the social media community requires giving true gifts to the third circle. 

Godin explains that these gifts—which do not demand reciprocation—turn the third circle into your tribe.  Giving gifts fulfills your tribal obligation.  Recipients pay it forward by also giving gifts—but perhaps to another tribal member—and not back to you.

And this is the challenging gift of social media—it is a gift that you may keep on giving without ever getting anything in return.

 

Related Posts

Freemium is the future – and the future is now

Social Karma

True Service

 

Microwavable Data Quality

Data quality is definitely not a one-time project.  Instead, it requires a sustained program of enterprise-wide best practices, best implemented within a data governance framework that “bakes in” defect prevention, data quality monitoring, and near real-time standardization and matching services—all ensuring high-quality data is available to support daily business decisions.

However, implementing a data governance program is an evolutionary process requiring time and patience.

Baking and cooking also require time and patience.  Microwavable meals can be an occasional welcome convenience, and if you are anything like me (my condolences) and you can’t bake or cook, then microwavable meals can be an absolute necessity.

Data cleansing can also be an occasional (not necessarily welcome) convenience, or a relative necessity (i.e., a “necessary evil”).

Last year on Data Quality Pro, Dylan Jones hosted a great debate on the necessity of data cleansing, which is well worth reading, especially since the over 25 (and continuing) comments it received prove it is a polarizing topic for the data quality profession.

I reheated this debate (using the Data Quality Microwave, of course) earlier this year with my A Tale of Two Q’s blog post, which also received many commendable comments (but far fewer than Dylan’s blog post—not that I am counting or anything).

Similarly, a heated debate can be had over the health implications of the microwave.  Eating too many microwavable meals is certainly not healthy, but I have many friends and family who would argue quite strongly for either side of this “food fight.”

Both of these great debates can be as deeply polarizing as Pepsi vs. Coke and Soccer vs. Football.  Just for the official record, I am firmly for both Pepsi and Football—and by Football, I mean NFL Football—and firmly against both Coke and Soccer. 

Just as I advocate that everyone (myself included) should learn how to cook, but still accept the eternal reality of the microwave, I definitely advocate the implementation of a data governance program, but I also accept the eternal reality of data cleansing.   

However, my lawyers have advised me to report that beta testing for an actual Data Quality Microwave has not been promising.

 

Related Posts

A Tale of Two Q’s

Hyperactive Data Quality (Second Edition)

The General Theory of Data Quality

 

Follow OCDQ

If you enjoyed this blog post, then please subscribe to OCDQ via my RSS feed, my E-mail updates, or Google Reader.

You can also follow OCDQ on Twitter, fan the Facebook page for OCDQ, and connect with me on LinkedIn.


Can Enterprise-Class Solutions Ever Deliver ROI?

The information technology industry has a great fondness for enterprise-class solutions and TLAs (two- or three-letter acronyms): ERP (Enterprise Resource Planning), DW (Data Warehousing), BI (Business Intelligence), MDM (Master Data Management), DG (Data Governance), DQ (Data Quality), CDI (Customer Data Integration), CRM (Customer Relationship Management), PIM (Product Information Management), BPM (Business Process Management), etc. — and new TLAs are surely coming soon.

But there is one TLA to rule them all, one TLA to fund them, one TLA to bring them all and to the business bind them—ROI.

 

Enterpri$e-Cla$$ $olution$

All enterprise-class solutions have one thing in common—they require a significant investment and total cost of ownership.

Most enterprise software/system licenses start in the six figures.  Due in large part to vendor consolidation, many are embedded within a consolidated enterprise application development platform with seamlessly integrated components offering an end-to-end solution that pushes the license well into seven figures. 

On top of the licensing, you have to add the annual maintenance fees, which are usually in the five figures—sometimes more.

Add to the total cost of the solution the professional services needed for training and consulting for installation, configuration, application development, testing, and production implementation, and you have another six-figure annual investment.

With such a significant investment and total cost of ownership required, can enterprise-class solutions ever deliver ROI?

 

Should I refinance my mortgage?

As a quick (but relevant) tangent, let's use a simple analogy from the world of personal finance.

Similar to most homeowners, I get offers to refinance my mortgage all the time.  A common example is an offer that states I can reduce my monthly payments by $200 by refinancing.  Sounds great: $200 a month is an annual cost reduction of $2400. 

However, this great deal includes $3000 in refinancing costs.  Although I start paying $200 less a month immediately, I do not really start saving any money for 15 months, when the monthly “savings” break even with the $3000 in refinancing costs. 

Of course, saying only 15 months is ignoring possible tax implications as well as lost interest or returns that I could have earned since the $3000 likely came from either a savings or an investment account.

Additionally, refinancing might not be a good idea if I plan to sell the house in less than 15 months.  The $3000 could instead be invested in finishing my basement or repairing minor damage, which could help increase the house’s value and therefore its sale price.
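The break-even arithmetic in this analogy can be sketched in a few lines of illustrative Python, using the hypothetical figures from the example above (not financial advice):

```python
def breakeven_months(refinancing_cost: float, monthly_savings: float) -> float:
    """Months until the cumulative monthly savings cover the upfront cost."""
    return refinancing_cost / monthly_savings

# The hypothetical offer above: $3000 in refinancing costs, $200/month in savings
print(breakeven_months(3000, 200))  # 15.0
```

As noted above, this simple quotient ignores tax implications, the lost returns on the $3000 itself, and how long I actually plan to keep the house.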

How does this analogy relate to enterprise-class solutions?

 

The Business Justification Paradox

Focusing solely on the technical features and ignoring the business benefits of an enterprise-class solution isn’t going to convince either the organization's executive management or its shareholders that the solution is required.

Therefore, emphasis has to be placed on making the business justification, where true ROI can only be achieved through tangible business impacts, such as mitigated risks, reduced costs, or increased revenues.

However, a legitimate business justification for any enterprise-class solution is often relatively easy to make.

The business justification paradox is that although an enterprise-class solution definitely has the long-term future potential to reduce costs, mitigate risks, and increase revenues, in the immediate future (and current fiscal year), it will only increase costs, decrease revenues, and therefore potentially increase risks.

In the mortgage analogy, the break-even point on the opportunity cost of refinancing can be precisely calculated.  Is it even possible to accurately estimate the break-even point on the opportunity cost of implementing an enterprise-class solution?

Furthermore, true ROI obviously has to be at least estimated to exceed simply breaking even on the investment.
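To see why the enterprise-class case is so much harder, here is a crude back-of-the-envelope sketch of the same calculation; the simple cost/benefit model and every figure in it are illustrative assumptions, not vendor numbers:

```python
def years_to_breakeven(upfront: float, annual_recurring: float,
                       annual_benefit: float, horizon: int = 50):
    """First whole year in which cumulative benefit covers cumulative cost,
    or None if it never does within the horizon."""
    if annual_benefit <= annual_recurring:
        return None  # recurring costs alone consume the benefit
    cumulative_cost, cumulative_benefit = upfront, 0.0
    for year in range(1, horizon + 1):
        cumulative_cost += annual_recurring
        cumulative_benefit += annual_benefit
        if cumulative_benefit >= cumulative_cost:
            return year
    return None

# Hypothetical: $1.2M license and implementation, $250K/year maintenance
# and services, $650K/year in estimated risk, cost, and revenue impacts
print(years_to_breakeven(1_200_000, 250_000, 650_000))  # 3
```

The point is not the arithmetic, which is trivial, but the inputs: unlike the refinancing offer, the annual benefit here is itself only an estimate, so the break-even point inherits all of that uncertainty.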

Given the reality that the longer an initiative takes, the more likely its funding will either be reduced or completely cut, many advocate an agile methodology, which targets iterative cycles quickly delivering small, but tangible value.  However, the up-front costs of enterprise licenses and incremental costs of the ongoing efforts and maintenance still loom large on the balance sheet.

Even with “creative” accounting practices, the unquestionably real short-term “ROI high” of following an agile approach could still leave you “chasing the dragon” in search of at least breaking even on your enterprise-class solution's total cost of ownership.

 

A Call for Debate

My point in this blog post was neither to make the argument that organizations should not invest in enterprise-class solutions, nor to berate organizations for evaluating such possible investments using short-term thinking limited to the current fiscal year.

I am simply trying to encourage an open, honest, and healthy debate about the true ROI of enterprise-class solutions.

I am tired of hearing over-simplifications about how all you need to do is make a valid business justification, as well as attempting to decipher the mystical ROI and total cost of ownership calculations provided by vendors and industry analysts.

I am also tired of being told how emerging industry trends like open source, cloud computing, and software as a service (SaaS) are “less expensive” than traditional approaches.  Perhaps that is true, but can they deliver enterprise-class solutions and ROI?

This blog post is a call for debate.  Please post a comment.  All viewpoints are welcome.

The Twitter Clockwork is NOT Orange

Recently, a Twitter-related tête à tête à tête involving David Carr of The New York Times, Nick Bilton of The New York Times, and George Packer of The New Yorker temporarily made both the Blogosphere all abuzz and the Twitterverse all atwitter.

This was simply another entry in the deeply polarizing debate between those for (Carr and Bilton) and against (Packer) Twitter.

 

A new decade of debate begins

On January 1, 2010, David Carr published his thoughts in the article Why Twitter Will Endure:

“By carefully curating the people you follow, Twitter becomes an always-on data stream from really bright people in their respective fields, whose tweets are often full of links to incredibly vital, timely information.”

. . . 

“Nearly a year in, I’ve come to understand that the real value of the service is listening to a wired collective voice.”

. . .

“At first, Twitter can be overwhelming, but think of it as a river of data rushing past that I dip a cup into every once in a while. Much of what I need to know is in that cup . . . I almost always learn about it first on Twitter.”

. . .

“All those riches do not come at zero cost: If you think e-mail and surfing can make time disappear, wait until you get ahold of Twitter, or more likely, it gets ahold of you.  There is always something more interesting on Twitter than whatever you happen to be working on.”

Carr goes on to quote Clay Shirky, author of the book Here Comes Everybody:

“It will be hard to wait out Twitter because it is lightweight, endlessly useful and gets better as more people use it.  Brands are using it, institutions are using it, and it is becoming a place where a lot of important conversations are being held.”

 

The most frightening picture of the future

On January 29, 2010, in his blog post Stop the World, George Packer declared that “the most frightening picture of the future that I’ve read thus far in the new decade has nothing to do with terrorism or banking or the world’s water reserves.”

What was the most frightening picture of the future that Packer had read less than a month into the new decade? 

The aforementioned article by David Carr—no, I am not kidding.

“Every time I hear about Twitter,” wrote Packer, “I want to yell Stop!  The notion of sending and getting brief updates to and from dozens or thousands of people every few minutes is an image from information hell.  I’m told that Twitter is a river into which I can dip my cup whenever I want.  But that supposes we’re all kneeling on the banks.  In fact, if you’re at all like me, you’re trying to keep your footing out in midstream, with the water level always dangerously close to your nostrils.  Twitter sounds less like sipping than drowning.”

Packer, who admits that he has, in fact, never even used Twitter, continued with a crack addiction analogy:

“Who doesn’t want to be taken out of the boredom or sameness or pain of the present at any given moment?  That’s what drugs are for, and that’s why people become addicted to them. 

Carr himself was once a crack addict (he wrote about it in The Night of the Gun).  Twitter is crack for media addicts. 

It scares me, not because I’m morally superior to it, but because I don’t think I could handle it.  I’m afraid I’d end up letting my son go hungry.”

 

“Call me a digital crack dealer”

On February 3, 2010, in his blog post, The Twitter Train Has Left the Station, Nick Bilton responded:

“Call me a digital crack dealer, but here’s why Twitter is a vital part of the information economy—and why Mr. Packer and other doubters ought to at least give it a Tweet:

Hundreds of thousands of people now rely on Twitter every day for their business.  Food trucks and restaurants around the world tell patrons about daily food specials.  Corporations use the service to handle customer service issues.  Starbucks, Dell, Ford, JetBlue and many more companies use Twitter to offer discounts and coupons to their customers.  Public relations firms, ad agencies, schools, the State Department—even President Obama—use Twitter and other social networks to share information.”

. . .

“Most importantly, Twitter is transforming the nature of news, the industry from which Mr. Packer reaps his paycheck.  The news media are going through their most robust transformation since the dawn of the printing press, in large part due to the Internet and services like Twitter.  After this metamorphosis takes place, everyone will benefit from the information moving swiftly around the globe.”

Bilton concludes his post with a train analogy:

“Ironically, Mr. Packer notes how much he treasures his Amtrak rides in the quiet car of the train, with his laptop closed and cellphone turned off.  As I’ve found in previous research, when trains were a new technology 150 years ago, some journalists and intellectuals worried about the destruction that the railroads would bring to society.  One news article at the time warned that trains would ‘blight crops with their smoke, terrorize livestock … and people could asphyxiate’ if they traveled on them.

I wonder if, 150 years ago, Mr. Packer would be riding the train at all, or if he would have stayed home, afraid to engage in an evolving society and demanding that the trains be stopped.”

 

Our apparent appetite for our own destruction

On February 4, 2010, in his blog post Neither Luddite nor Biltonite, George Packer responded:

“It’s true that I hadn’t used Twitter (not consciously, anyway—my editors inform me that this blog has for some time had an automated Twitter feed).  I haven’t used crack, either, but—as a Bilton reader pointed out—you don’t need to do the drug to understand the effects.”

. . .

“Just about everyone I know complains about the same thing when they’re being honest—including, maybe especially, people whose business is reading and writing.  They mourn the loss of books and the loss of time for books.  It’s no less true of me, which is why I’m trying to place a few limits on the flood of information that I allow into my head.”

. . .

“There’s no way for readers to be online, surfing, e-mailing, posting, tweeting, reading tweets, and soon enough doing the thing that will come after Twitter, without paying a high price in available time, attention span, reading comprehension, and experience of the immediately surrounding world.  The Internet and the devices it’s spawned are systematically changing our intellectual activities with breathtaking speed, and more profoundly than over the past seven centuries combined.  It shouldn’t be an act of heresy to ask about the trade-offs that come with this revolution.”

. . .

“The response to my post tells me that techno-worship is a triumphalist and intolerant cult that doesn’t like to be asked questions.  If a Luddite is someone who fears and hates all technological change, a Biltonite is someone who celebrates all technological change: because we can, we must.  I’d like to think that in 1860 I would have been an early train passenger, but I’d also like to think that in 1960 I’d have urged my wife to go off Thalidomide.”

. . .

“American newspapers and magazines will continue to die by the dozen.  The economic basis for reporting (as opposed to information-sharing, posting, and Tweeting) will continue to erode.  You have to be a truly hard-core techno-worshipper to call this robust.  Any journalist who cheerleads uncritically for Twitter is essentially asking for his own destruction.”

. . .

“It’s true that Bilton will have news updates within seconds that reach me after minutes or hours or even days. 

It’s a trade-off I can live with.”

Packer concludes his post by quoting the end of G. B. Trudeau's book My Shorts R Bunching. Thoughts?:

“The time you spend reading this tweet is gone, lost forever, carrying you closer to death.  Am trying not to abuse the privilege.”

 

The Twitter Clockwork is NOT Orange

A Clockwork Orange

The primary propaganda used by the anti-Twitter lunatic fringe is comparing the microblogging and social networking service to that disturbing scene (pictured above) from the movie A Clockwork Orange, where you are confined within a straitjacket, your head strapped into a restraining chair preventing you from looking away, your eyes clamped to remain open—and you are forced to stare endlessly into the abyss of the cultural apocalypse that the Twitterverse is apparently supposed to represent.

You can feel free to call me a Biltonite, because I obviously agree far more with Bilton and Carr—and not with Packer.

Of course, I recommend you read all four of the articles/posts I linked to and selectively quoted above.  Especially Carr's article, which was far more balanced than either my quotes or Packer's posts reflect. 

 

Social Media Will Endure

We continue to witness the decline of print media and the corresponding evolution of social media.  I completely understand why Packer and others with a vested interest in print media want to believe social media is a revolution that must be put down. 

Hence the outrageous exaggerations Packer uses when comparing Twitter with drug abuse (crack cocaine) and the truly offensive remark of comparing Twitter with one of the worst medical tragedies in modern history (Thalidomide). 

I believe the primary reason that social media will endure, beyond our increasing interest in exchanging what has traditionally been only a broadcast medium (print media) for a conversation medium, is that it is enabling our communication to return to the more direct and immediate forms of information sharing that existed even before the evolution of written language.

Social media is an evolution and not a revolution being forced upon society by unrelenting technological advancements and techno-worship.  In many ways, social media is not a new concept at all—technology has simply finally caught up with us.

Humans have always been “social” by our very nature.  We have always thrived on connection, conversation, and community. 

Social media is rapidly evolving.  Therefore, specific services like Twitter may be replaced (or Twitter may continue to evolve). 

However, the essence of social media will endure—but the same can't be said of Packerites (neo-Luddites like George Packer).

 

What Say You?

Please share your thoughts on this debate by posting a comment below. 

Or you can share your thoughts with me on Twitter—which reminds me, it's time for me to be strapped back into the chair . . .

OOBE-DQ, Where Are You?

Scooby-Doo, Where Are You!

Enterprise software is often viewed as a commercial off-the-shelf (COTS) product, which, in theory, is supposed to provide significant advantages over bespoke, in-house solutions.  In this blog post, I want to discuss your expectations about the out-of-box-experience (OOBE) provided by data quality (DQ) software, or as I prefer to phrase this question:

OOBE-DQ, Where Are You?

Common DQ Software Features

There are many DQ software vendors to choose from and all of them offer viable solutions driven by impressive technology.  Many of these vendors have very similar approaches to DQ, and therefore provide similar technology with common features, including the following (Please Note: some vendors have a suite of related products collectively providing these features):

  • Data Profiling
  • Data Quality Assessment
  • Data Standardization
  • Data Matching
  • Data Consolidation
  • Data Integration
  • Data Quality Monitoring

A common aspect of OOBE-DQ is the “ease of use” vs. “powerful functionality” debate—ignoring the Magic Beans phenomenon, where the Machiavellian salesperson guarantees you their software is both remarkably easy to use and incredibly powerful.

 

So just how easy is your Ease of Use?

Brainiac

“Ease of use” can be difficult to qualify since it needs to take into account several aspects:

— Installation and configuration
— Integration within a suite of related products (or connectivity to other products)
— Intuitiveness of the user interface(s)
— Documentation and context sensitive help screens
— Ability to effectively support a multiple user environment
— Whether performed tasks are aligned with different types of users

There are obviously other aspects, some of which may vary depending on your DQ initiative, your specific industry, or your organizational structure.  However, the bottom line is that hopefully the DQ software doesn’t require your users to be as smart as Brainiac (pictured above) to figure out how to use it, both effectively and efficiently.

 

DQ Powers—Activate!

The Wonder Twins with Gleek - Art by Alex Ross

Ease of use is obviously a very important aspect of OOBE-DQ.  However, as Duke Ellington taught us, it don’t mean a thing, if it ain’t got that swing—in other words, if it’s easy to use but can’t do anything, what good is it?  Therefore, powerful functionality is also important.

“Powerful functionality” can be rather subjective, but probably needs to at least include these aspects:

— Fast processing speed
— Scalable architecture
— Batch and near real-time execution modes
— Pre-built functionality for common tasks
— Customizable and reusable components

Once again, there are obviously other aspects, especially depending on the specifics of your situation.  However, in my opinion, one of the most important aspects of DQ functionality is how it helps (as pictured above) enable Zan (i.e., technical stakeholders) and Jayna (i.e., business stakeholders) to activate their most important power—collaboration.  And of course, sometimes even the Wonder Twins needed the help of their pet space monkey Gleek (i.e., data quality consultants).

 

OOBE-DQ, Where Are You?

Where are you in the OOBE-DQ debate?  In other words, what are your expectations when evaluating the out-of-box-experience (OOBE) provided by data quality (DQ) software?

Where do you stand in the “ease of use” vs. “powerful functionality” debate? 

Are there situations where the prioritization of ease of use makes a lack of robust functionality more acceptable? 

Are there situations where the prioritization of powerful functionality makes a required expertise more acceptable?

Please share your thoughts by posting a comment below.

 
