Alternatives to Enterprise Data Quality Tools
Jim Harris in Data Quality, Debates, Products, Technology, Vendors tagged Best of 2011, Data Profiling, Open Source
Monday, February 21, 2011 at 3:00AM
The recent analysis by Andy Bitterer of Gartner Research (and ANALYSTerical) about the acquisition of the open source data quality tool DataCleaner by the enterprise data quality vendor Human Inference prompted a Twitter conversation.
Since enterprise data quality tools can be cost-prohibitive, more prospective customers are exploring free and/or open source alternatives, such as the Talend Open Profiler, licensed under the open source General Public License, or non-open source, but entirely free alternatives, such as the Ataccama DQ Analyzer. And, as Andy noted in his analysis, both of these tools offer an easy transition to the vendors’ full-fledged commercial data quality tools, offering more than just data profiling functionality.
As Henrik Liliendahl Sørensen explained in his blog post Data Quality Tools Revealed, data profiling is the technically easiest part of data quality, which explains the tool diversity and the early adoption of free and/or open source alternatives.
And there are also other non-open source alternatives that are more affordable than enterprise data quality tools, such as Datamartist, which combines data profiling and data migration capabilities into an easy-to-use desktop application.
My point is neither to discourage the purchase of enterprise data quality tools, nor promote their alternatives—and this blog post is certainly not an endorsement—paid or otherwise—of the alternative data quality tools I have mentioned simply as examples.
My point is that many new technology innovations originate from small entrepreneurial ventures, which tend to be specialists with a narrow focus that can provide a great source of rapid innovation. This is in contrast to the data management industry trend of innovation via acquisition and consolidation, which embeds data quality technology within data management platforms that also provide data integration and master data management (MDM) functionality, allowing the mega-vendors to offer end-to-end solutions and the convenience of one-vendor information technology shopping.
However, most software licenses for these enterprise data management platforms start in the six figures. On top of the licensing, you have to add the annual maintenance fees, which are usually in the five figures. Add the professional services needed for training and for consulting on installation, configuration, application development, testing, and production implementation, and you have another six-figure annual investment.
Debates about free and/or open source software usually focus on the robustness of functionality and the intellectual property of source code. However, I think the real reason more prospective customers are exploring these alternatives to enterprise data quality tools is the free aspect, not the open source aspect.
In other words—and once again I am only using it as an example—I might download Talend Open Profiler because I wanted data profiling functionality at an affordable price—but not because I wanted the opportunity to customize its source code.
I believe the “try it before you buy it” aspect of free and/or open source software is what’s important to prospective customers.
Therefore, enterprise data quality vendors, instead of acquiring an open source tool as Human Inference did with DataCleaner, how about offering a free (with limited functionality) or trial version of your enterprise data quality tool as an alternative option?
Related Posts
Do you believe in Magic (Quadrants)?
Can Enterprise-Class Solutions Ever Deliver ROI?
Which came first, the Data Quality Tool or the Business Need?
Selling the Business Benefits of Data Quality
What Data Quality Technology Wants



Reader Comments (4)
Great stuff, Jim.
ERP vendors are moving in the same direction as Data Quality vendors with the “try it before you buy it” option.
Makes sense to me...
Hi Jim,
Love the post. You and Andy both make some good points.
You tripped over a couple of my pet peeves in the process, though. So, with your permission . . .
<Begin Rant>
Peeve 1: Startups are not the only ones who innovate (see my Old Dogs and New Tricks blog post). It does take significant investment, but companies that want to stay around for the long haul and keep costs at a reasonable level don't wait around for venture-funded startups to do the work, then buy them for mega price tags and pass that cost on to the consumer. Established companies are fully capable of innovation. They just have to make it a priority.
Peeve 2: Free or mega expensive are not the only options. Here's a crazy idea: how about enterprise-level data quality software that DOESN'T cost more than the gross national product of Paraguay? Free trial versions are fine, and I've got nothing against open source, but when did it become passé to offer a good product at a sensible price?
</End Rant>
Paige
Jim, thanks for the informative post and for your comments. I also enjoyed reading Andy's note and I do agree with his analysis. (Sorry Andy, I know you enjoy a good blog argument - not today!)
I would like to react to a couple of your points though.
There is more to "free" in open source than "free beer". There is also "free speech". What I mean is that the (absence of) cost is not the main reason people use open source, and this is why the approach of "a free version from a traditional vendor" is inherently flawed.
The "free as in free speech" in open source means a lot more than being able to download the source code. Jim, you are right, few people want the source code. Oftentimes they are happy to know it's there (better than software escrow...) but what matters the most to them is the openness of the entire process. This includes:
- the ability to extend the product with connectors, patterns, validation routines, etc.
- the access to the support application, bug tracker, component exchange, forums, etc.
- the openness of the dialog with the community - developers and other users of the product
We all know that there is no such thing as free (as in free beer). Learning, deploying and configuring software takes time and requires hardware, and poor-quality free software ends up costing more than good-quality paid-for software. And this is why, when some proprietary vendors pretend to compete against open source by releasing a free version of their product, they are either grossly mistaken, or lying through their teeth (controversial, me??? nah...). A free version of a proprietary product is nothing more than bait and switch. It's cleverly engineered to be useless, otherwise nobody would buy the full version. It never comes with the fundamental freedoms of open source (freedom to use, freedom to modify, freedom to redistribute). And it never comes with the openness that is required for real community adoption and widespread use.
(I know I'll be getting a comment on the supposed "lies of open source", so I'll save the argument for my counter response).
Now - on Human Inference buying DataCleaner - again, I agree with Andy's research note. I'll simply add that, dubious as I am about the integration of DataCleaner into the Human Inference stack, and hence about the success of this acquisition, I am glad to see another open source data quality product on the market, supported by a commercial vendor with the means to make it successful if they choose to. Diversity is good for the market, for the community, and healthy for the competition.
Yves
Sorry I missed the discussion with you and Andy. I definitely would have jumped in.
Whether you use open source or not, the mere fact that open source companies like Talend exist is a huge market influence. The mega-vendors must address open source pricing models, which are much more reasonable than traditional ones. They must build communities and better collaboration between customers. They have to innovate as quickly in their closed development environment as the crowd innovates.
It disrupts the market by challenging the traditional data management vendors to be better.
I think you hit on something when you said that “many new technology innovations originate from small entrepreneurial ventures.” The collaborative aspect of open source is what makes it really disruptive. Communities are particularly good at developing technologies that wouldn’t otherwise be developed.
If you have a rare legacy database, for example, it may not be supported by any data integration vendor because it is not commercially viable. Chances are, someone in the open source world has built a connector and contributed it to the community. Communities help expand the functionality of a product quite quickly.