The Data Cold War

One of the many things I love about Twitter is its ability to spark ideas via real-time conversations.  For example, while live-tweeting during last week’s episode of DM Radio, the topic of which was how to get started with data governance, I tweeted about the data silo challenges and corporate cultural obstacles being discussed.

I tweeted that data is an asset only if it is a shared asset, across the silos, across the corporate culture, and that, in order to be successful with data governance, organizations must replace the mantra “my private knowledge is my power” with “our shared knowledge empowers us all.”

“That’s very socialist thinking,” Mark Madsen responded.  “Soon we’ll be having arguments about capitalizing over socializing our data.”

To which I responded that the more socialized data is, the more capitalized data can become . . . just ask Google.

“Oh no,” Mark humorously replied, “decades of political rhetoric about socialism to be ruined by a discussion of data!”  And I quipped that discussions about data have been accused of worse, and decades of data rhetoric certainly hasn’t proven very helpful in corporate politics.

 

Later, while ruminating on this light-hearted exchange, I wondered if we actually are in the midst of the Data Cold War.

 

The Data Cold War

The Cold War, which lasted approximately from 1946 to 1991, was the political, military, and economic competition between the Communist world, primarily the former Soviet Union, and the Western world, primarily the United States.  One of the defining conflicts of the Cold War was the clash between the ideologies of socialism and capitalism.

In enterprise data management, one of the most debated ideologies is whether or not data should be viewed as a corporate asset, especially by the for-profit corporations of capitalism, which was the world’s dominant economic model even before the Cold War began, and will likely remain so forever.

My earlier remark that data is an asset only if it is a shared asset, across the silos, across the corporate culture, is indicative of the bounded socialist view of enterprise data.  In other words, almost no one in the enterprise data management space is suggesting that data should be shared beyond the boundary of the organization.  In this sense, advocates of data governance, including myself, are advocating socializing data within the enterprise so that data can be better capitalized as a true corporate asset.

This mindset makes sense because sharing data with the world, especially for free, couldn’t possibly be profitable — or could it?

 

The Master Data Management Magic Trick

The genius (and some justifiably ponder whether it’s evil genius) of companies like Google and Facebook is that they realized how to make money in a free world — by which I mean the world of Free: The Future of a Radical Price, the 2009 book by Chris Anderson.

By encouraging their users to freely share their own personal data, Google and Facebook ingeniously answer what David Loshin calls the most dangerous question in data management: What is the definition of customer?

How do Google and Facebook answer the most dangerous question?

A customer is a product.

This is the first step that begins what I call the Master Data Management Magic Trick.

Instead of trying to manage the troublesome master data domain of customer and link it, through sales transaction data, to the master data domain of product (products, by the way, have always been undeniably accepted as a corporate asset, even though product data has not been), Google and Facebook simply eliminate the need for customers by transforming what would otherwise be customers into the very product that they sell, which is, in fact, the only “real” product that they have.  And, by extension, they eliminate the need for customer service because, since their product is free, it has no customers.

And since what their users perceive as their product is virtual (i.e., entirely Internet-based), it’s not really a product, but instead a free service, which can be discontinued at any time.  And if it were, who would you complain to?  And on what basis?

After all, you never paid for anything.

This is the second step that completes the Master Data Management Magic Trick — a product is a free service.

Therefore, Google and Facebook magically make both their customers and their products (i.e., master data) disappear, while simultaneously making billions of dollars (i.e., transaction data) appear in their corporate bank accounts.

(Yes, the personal data of their users is master data.  However, because it is used in an anonymized and aggregated format, it is not, nor does it need to be, managed like the master data we talk about in the enterprise data management industry.)

 

Google and Facebook have Capitalized Socialism

By “empowering” us with free services, Google and Facebook use the power of our own personal data against us — by selling it.

However, it’s important to note that they indirectly sell our personal data as anonymized and aggregated demographic data.
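
To make “anonymized and aggregated” a bit more concrete, here is a minimal sketch (the profile fields, segments, and groupings are entirely my own illustrative assumptions, not anything Google or Facebook has disclosed) of how individually identifiable records can be reduced to demographic counts before an advertiser ever sees them:

```python
from collections import Counter

# Hypothetical user profiles, individually identifiable at the source.
users = [
    {"name": "Alice", "email": "alice@example.com", "age": 34, "city": "Boston",  "interest": "running"},
    {"name": "Bob",   "email": "bob@example.com",   "age": 31, "city": "Boston",  "interest": "running"},
    {"name": "Carol", "email": "carol@example.com", "age": 47, "city": "Chicago", "interest": "cooking"},
]

def to_demographic_segment(user):
    """Strip the direct identifiers and bucket what remains into a coarse segment."""
    age_band = f"{(user['age'] // 10) * 10}s"          # e.g., 34 becomes "30s"
    return (age_band, user["city"], user["interest"])  # no name, no e-mail

# What gets sold is segment counts, not people.
segments = Counter(to_demographic_segment(u) for u in users)
for segment, count in sorted(segments.items()):
    print(segment, count)
# ('30s', 'Boston', 'running') has a count of 2; neither Alice nor Bob is visible.
```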

Although they do not directly sell our individually identifiable information (which, truthfully, has very limited value, and mostly no legal value, since selling it directly would essentially be identity theft), Google and Facebook do occasionally get sued (mostly outside the United States) for violating data privacy and data protection laws.

However, it’s precisely because we freely give our personal data to them that, until or unless laws are changed to protect us from ourselves, it’s almost impossible to prove they are doing anything illegal (again, their undeniable genius is arguably evil genius).

Google and Facebook are the exact same kind of company — they are both Internet advertising agencies.

They both sell online advertising space to other companies, which are looking to demographically target prospective customers because those companies actually do view people as potential real customers for their own real products.

The irony is that if all of their users stopped using their free services, then not only would our personal data be more private and more secure, but the revenue streams of Google and Facebook would eventually dry up because, specifically by design, they have neither real customers nor real products.  More precisely, their only real customers (other companies) would stop buying advertising from them because no one would ever see and (albeit, even now, only occasionally) click on their ads.

Essentially, companies like Google and Facebook are winning the Data Cold War because they have capitalized socialism.

In other words, the bottom line is Google and Facebook have socialized data in order to capitalize data as a true corporate asset.

 

Related Posts

Freemium is the future – and the future is now

The Age of the Platform

Amazon’s Data Management Brain

The Semantic Future of MDM

A Brave New Data World

Big Data and Big Analytics

A Farscape Analogy for Data Quality

Organizing For Data Quality

Sharing Data

Song of My Data

Data in the (Oscar) Wilde

The Most August Imagination

Once Upon a Time in the Data

The Idea of Order in Data

Hell is other people’s data

DAMA International

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

DAMA International is a non-profit, vendor-independent, global association of technical and business professionals dedicated to advancing the concepts and practices of information and data management.

On this episode, special guest Loretta Mahon Smith provides an overview of the Data Management Body of Knowledge (DMBOK) and Certified Data Management Professional (CDMP) certification program.

Loretta Mahon Smith is a visionary and influential data management professional known for her consistent awareness of trends at the forefront of the industry.  Since 1983, she has worked in international financial services and been actively involved in the growth and maturation of Information Architecture functions, specializing in Data Stewardship and Data Strategy Development.

Loretta Mahon Smith has been a member of DAMA for more than 10 years, with a lifetime membership to the DAMA National Capitol Region Chapter.  As President of the chapter, she has the opportunity to help the Washington DC and Baltimore data management communities.  She serves the world community through her involvement on the DAMA International Board as VP of Communications.  She additionally volunteers her time to work on the ICCP Certification Council, most recently working on the development of the Zachman and Data Governance examinations.

In the past, Loretta has facilitated Special Interest Group sessions on Governance and Stewardship and presented Stewardship training at numerous local chapters of DAMA, IIBA, TDWI, and ACM, as well as major conferences including Project World (IIBA), INFO360 (AIIM), EDW (DAMA), and the IQ.  She earned the Certified Computing Professional (CCP), Certified Business Intelligence Professional (CBIP), and Certified Data Management Professional (CDMP) designations, achieving a mastery-level proficiency rating in Data Warehousing, Data Management, and Data Quality.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Are Cloud Providers the Bounty Hunters of IT?

This blog post is sponsored by the Enterprise CIO Forum and HP.

Julie Hunt recently blogged about how “line-of-business (LOB) groups have been turning to cloud-based services to quickly set up technology solutions that support their business needs and objectives,” which is especially true when “IT teams are already carrying heavy workloads with ever-shrinking staffing levels, and frequently don’t have the resources to immediately respond to time-sensitive LOB needs.”

As I have previously blogged, speed and agility are the most common business drivers for implementing new technology, and the consumer technologies of cloud computing and software-as-a-service (SaaS) enable business users to directly purchase solutions.

When on-premises IT teams cannot solve their time-sensitive business problems, organizations use off-premises cloud providers, which are essentially the Bounty Hunters of IT.

 

The Bounty Hunters of IT

In The Empire Strikes Back, frustrated by the inability of on-premises IT teams (in this case, IT stood for Imperial Troops) to solve a time-sensitive business problem (in this case, crushing the competitive rebellion), Darth Vader uses the off-premises force.

“There will be a substantial reward for the one who finds the Millennium Falcon,” Vader explains to a group of bounty hunters. “You are free to use any methods necessary, but I want them alive.  No disintegrations.”  That last point was specifically directed at Boba Fett, the bounty hunter who would later provide a cloud-based solution by tracking the Millennium Falcon to Cloud City.

Cloud providers are the Bounty Hunters of IT, essentially free to use any technology methods necessary to solve time-sensitive business problems.  Although, in the short-term, cloud providers can help, in the long-term, if their solutions are not integrated into the IT Delivery strategy of the organization, they can also hurt.  One example is creating new data integration challenges.

 

“No Data Disintegrations”

“It’s clear,” Hunt explained, “that enterprises will continue to increase usage of cloud and SaaS offerings to find new ways to operate more competitively and efficiently.”  However, she noted it’s also clear that “the same challenges that enterprises face for on-premises data management obviously apply to data repositories in the cloud.”  And one new challenge is that “data that cannot be aligned with enterprise datasets will destroy the value and cost savings that enterprises want from cloud services.”

“Moving any business relevant functionality to the cloud,” Christian Verstraete recently blogged, “requires addressing the issue of integrating the cloud-based applications with the enterprise IT systems.”  In his blog post, Verstraete examines three options for integrating cloud data and enterprise data: remote access, synchronization, and dynamic migration.
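
Of those three options, synchronization is perhaps the easiest to picture.  Below is a minimal sketch of a one-way synchronization job (the record keys, field names, and timestamps are all hypothetical illustrations, not Verstraete’s implementation) that keeps an enterprise copy aligned with its cloud source:

```python
# A minimal one-way synchronization sketch: cloud to enterprise.
# All keys, field names, and timestamps here are hypothetical illustrations.

cloud_records = {  # e.g., records fetched from a SaaS API
    "C-100": {"customer": "Acme Corp", "updated": "2011-10-01"},
    "C-101": {"customer": "Globex",    "updated": "2011-10-03"},
}

enterprise_records = {  # e.g., rows in an on-premises database
    "C-100": {"customer": "Acme Corp", "updated": "2011-09-15"},
}

def synchronize(cloud, enterprise):
    """Insert new cloud records and refresh stale enterprise copies."""
    for key, record in cloud.items():
        existing = enterprise.get(key)
        # ISO 8601 date strings compare correctly as plain strings.
        if existing is None or existing["updated"] < record["updated"]:
            enterprise[key] = dict(record)  # copy, don't alias

synchronize(cloud_records, enterprise_records)
print(enterprise_records)
# C-100 is refreshed and C-101 is inserted, so the enterprise copy stays
# aligned with the cloud instead of drifting into a disconnected silo.
```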

Although there will always be times and places for leveraging the Bounty Hunters of IT, before Boba Fett sells you a solution in Cloud City, make sure you emphasize that there should be “no data disintegrations.”

In other words, your cloud strategy must include a plan to prevent data in the cloud from becoming disintegrated in the sense that it is not integrated with the rest of the organization’s data.


 

Related Posts

The Diderot Effect of New Technology

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

The Higher Education of Data Quality


On this episode of OCDQ Radio, we leave the corporate world, where data quality and master data management is mostly focused on the challenges of managing data about customers, products, and revenue, and we get schooled in the higher education of data quality.  In other words, we discuss data quality and master data management in higher education, which is mostly focused on the challenges of managing data about students, courses, and tuition.

Our guest lecturer will be Mark Horseman, who has been working at the University of Saskatchewan for over 10 years and has been on the implementation team of many of the University’s enterprise software solutions.  Mark now works in Information Strategy and Analytics, leveraging his knowledge to assist the University in managing its data quality challenges.

Follow Mark Horseman on Twitter and read his Eccentric Data Quality blog to hear more about the challenges faced by Mark on his quest (yes, it’s a quest) to improve Higher-Education Data Quality.


A Farscape Analogy for Data Quality

Farscape was one of my all-time favorite science fiction television shows.  In the weird way my mind works, the recent blog post (which has received great comments) Four Steps to Fixing Your Bad Data by Tom Redman triggered a Farscape analogy.

“The notion that data are assets sounds simple and is anything but,” Redman wrote.  “Everyone touches data in one way or another, so the tendrils of a data program will affect everyone — the things they do, the way they think, their relationships with one another, your relationships with customers.”

The key word for me was tendrils — like I said, my mind works in a weird way.

 

Moya and Pilot

On Farscape, the central characters of the show travel through space aboard Moya, a Leviathan, which is a species of living, sentient spaceships.  Pilot is a sentient creature (of a species also known as Pilots) with the vast capacity for multitasking that is necessary for the simultaneous handling of the many systems aboard a Leviathan.  The tendrils of a Pilot’s lower body are biologically bonded with the living systems of a Leviathan, creating a permanent symbiotic connection, meaning that, once bonded, a Pilot and a Leviathan can no longer exist independently for more than an hour or so, or both of them will die.

Leviathans were one of the many laudably original concepts of Farscape.  The role of the spaceship in most science fiction is analogous to the role of a boat.  In other words, traveling through space is most often imagined like traveling on water.  However, seafaring vessels and spaceships are usually seen as technological objects providing transportation and life support, but not actually alive in their own right (despite the fact that both types of ship are usually anthropomorphized, usually as female).

Because Moya was alive, when she was damaged, she felt pain and needed time to heal.  And because she was sentient, highly intelligent, and capable of communicating with the crew through Pilot (who was the only one who could understand the complexity of the Leviathan language, which was beyond the capability of a universal translator), Moya was much more than just a means of transportation.  In other words, there truly was a symbiotic relationship between, not only Moya and Pilot, but also between Moya and Pilot, and their crew and passengers.

 

Enterprise and Data

(Sorry, my fellow science fiction geeks, but it’s not that Enterprise and that Data.  Perfectly understandable mistake, though.)

Although technically not alive in the biological sense, in many respects an organization is like a living, sentient organism, and, like space and seafaring ships, it is often anthropomorphized.  An enterprise is much more than just a large organization providing a means of employment and offering products and/or services (and, in a sense, life support to its employees and customers).

As Redman explains in his book Data Driven: Profiting from Your Most Important Business Asset, data is not just the lifeblood of the Information Age, data is essential to everything the enterprise does, from helping it better understand its customers, to guiding its development of better products and/or services, to setting a strategic direction toward achieving its business goals.

So the symbiotic relationship between Enterprise and Data is analogous to the symbiotic relationship between Moya and Pilot.

Data is the Pilot of the Enterprise Leviathan.  The enterprise cannot survive without its data.  A healthy enterprise requires healthy data — data of sufficient quality capable of supporting the operational, tactical, and strategic functions of the enterprise.

Returning to Redman’s words, “Everyone touches data in one way or another, so the tendrils of a data program will affect everyone — the things they do, the way they think, their relationships with one another, your relationships with customers.”

So the relationship between an enterprise and its data, and its people, business processes, and technology, is analogous to the relationship between Moya and Pilot, and their crew and passengers.  It is the enterprise’s people, its crew (i.e., employees), who, empowered by high quality data and enabled by technology, optimize business processes for superior corporate performance, thereby delivering superior products and/or services to the enterprise’s passengers (i.e., customers).

 

So why isn’t data viewed as an asset?

So if this deep symbiosis exists, if these intertwined and symbiotic relationships exist, if the tendrils of data are biologically bonded with the complex enterprise ecosystem — then why isn’t data viewed as an asset?

In Data Driven, Redman references the book The Social Life of Information by John Seely Brown and Paul Duguid, who explained that “a technology is never fully accepted until it becomes invisible to those who use it.”  The term informationalization describes the process of building data and information into a product or service.  “When products and services are fully informationalized,” Redman noted, data “blends into the background and people do not even think about it anymore.”

Perhaps that is why data isn’t viewed as an asset.  Perhaps data has so thoroughly pervaded the enterprise that it has become invisible to those who use it.  Perhaps it is not an asset because data is invisible to those who are so dependent upon its quality.

 

Perhaps we only see Moya, but not her Pilot.

 

Related Posts

Organizing For Data Quality

Data, data everywhere, but where is data quality?

Finding Data Quality

The Data Quality Wager

Beyond a “Single Version of the Truth”

Poor Data Quality is a Virus

DQ-Tip: “Don't pass bad data on to the next person...”

Retroactive Data Quality

Hyperactive Data Quality (Second Edition)

A Brave New Data World

International Data Quality


On this episode of OCDQ Radio, I discuss the sometimes mysterious world of international name and address data quality, which is why I am pleased to be joined by, not an international man of mystery, but instead, an international man of data quality.

Graham Rhind is an acknowledged expert in the field of data quality.  Graham runs GRC Database Information, a consultancy company based in The Netherlands, where he researches postal code and addressing systems, collates international data, runs a busy postal link website, writes data management software, and maintains an online Data Quality Glossary.

Graham Rhind speaks regularly on the subject and is the author of four books on the topic of international data management, including The Global Source Book for Name and Address Data Management, which has been an invaluable resource for me.

On this episode of OCDQ Radio, Graham Rhind and I discuss the international challenges of postal address and person name data quality, including their implications for web forms and other data entry interfaces.


The Diderot Effect of New Technology

This blog post is sponsored by the Enterprise CIO Forum and HP.

In his essay Regrets on Parting with My Old Dressing Gown, the 18th century French philosopher Denis Diderot described what is now referred to as the Diderot Effect.

After Diderot was given the gift of an elegant scarlet robe, he not only parted with his old dressing gown, but he also realized that his new robe clashed with his scruffy old study.  Therefore, he started replacing more and more of his study.  First, he replaced his old desk, then he replaced the tapestry, and eventually he replaced all of the furniture until the elegance of his study matched the elegance of his new robe.

I have recently fallen prey to what I refer to as the Diderot Effect of New Technology.

 

Regrets on Parting with My Old Laptop Computer

A few months ago, after finally succumbing to the not-so-subtle pressure from my friend, fellow technology writer, and Mac guy, Phil Simon, I purchased a MacBook Air.

Now, of course, there was absolutely nothing wrong with my three-year-old Dell Latitude laptop computer.  It provided a sufficient amount of memory, speed, and storage.  Its applications for writing and blogging, web browsing and social networking, as well as audio and video editing were productively supporting my daily business activities.  Additionally, all of my peripherals (printer/scanner, flat screen monitor, microphone, speakers) were also getting their jobs done quite nicely, thank you very much.

However, as soon as the elegant, but not scarlet, MacBook Air was introduced into my scruffy old home office, the Diderot Effect began, well, affecting my perception of the technology that I was using on a daily basis.

Initially, I continued to use my Dell for my daily business activities, and dedicated only a small amount of work time to becoming accustomed to using my new MacBook.  (I had once been an Apple aficionado, but it had been 10 years since I last owned a Mac.)

But it didn’t take long before I would have to describe myself as, to paraphrase the 19th century American poet Emily Dickinson, inebriate of MacBook Air am I.

(For the less poetically-minded reader, that’s just a fancy way of saying that I became addicted to using my new MacBook Air.) 

So, much like Diderot before me, I have begun replacing more and more of my home office.  The only difference is that I am trying to match the elegance (and, yes, of course, also the powerful and easy-to-use functionality) of my new technology.

 

The Diderot Effect of New Technology

The consumerization of IT has become a significant contributing factor to the increasingly rapid pace at which new technology is introduced into the enterprise.  These elegant modern applications seemingly clash with our scruffy old legacy applications, and can evoke a desire to start replacing more and more of the organization’s technology.

However, donning the scarlet robes of new technology can become an expensive endeavor.  (The subtitle of Diderot’s essay was “a warning to those who have more taste than fortune.”)  Genefa Murphy has blogged that “one of the main reasons for IT debt is the fact that the enterprise is always trying to keep up with the latest and greatest trends, technologies, and changes.”

“In our race to remain competitive,” Murphy concluded, “we have in essence become addicted to the latest and greatest technologies.  We need to acknowledge we have a problem before we can take action to rectify it.”

Diderot was able to both acknowledge and take action to rectify his addiction.  “Don’t fear that the mad desire to stock up on beautiful things has taken control of me,” he reassures us at the conclusion of his essay.

Hopefully, the mad desire to stock up on new technological things hasn’t taken control of either you or your organization.


 

Related Posts

The IT Consumerization Conundrum

The IT Prime Directive of Business First Contact

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Big Data and Big Analytics


Jill Dyché is the Vice President of Thought Leadership and Education at DataFlux.  Jill’s role at DataFlux is a combination of best-practice expert, key client advisor and all-around thought leader.  She is responsible for industry education, key client strategies and market analysis in the areas of data governance, business intelligence, master data management and customer relationship management.  Jill is a regularly featured speaker and the author of several books.

Jill’s latest book, Customer Data Integration: Reaching a Single Version of the Truth (Wiley & Sons, 2006), was co-authored with Evan Levy and shows the business breakthroughs achieved with integrated customer data.

Dan Soceanu is the Director of Product Marketing and Sales Enablement at DataFlux.  Dan manages global field sales enablement and product marketing, including product messaging and marketing analysis.  Prior to joining DataFlux in 2008, Dan held marketing, partnership, and market research positions with Teradata, General Electric, and FormScape, as well as data management positions in the Financial Services sector.

Dan received his Bachelor of Science in Business Administration from Kutztown University of Pennsylvania and earned his Master of Business Administration from Bloomsburg University of Pennsylvania.

On this episode of OCDQ Radio, Jill Dyché, Dan Soceanu, and I discuss the recent Pacific Northwest BI Summit, where the three core conference topics were Cloud, Collaboration, and Big Data, the last of which led to a discussion about Big Analytics.


Are you turning Ugly Data into Cute Information?

Sometimes the ways of the data force are difficult to understand precisely because they are sometimes difficult to see.

Daragh O Brien and I were discussing this recently on Twitter, where tweets about data quality and information quality form the midi-chlorians of the data force.  Share disturbances you’ve felt in the data force using the #UglyData and #CuteInfo hashtags.

 

Presentation Quality

Perhaps one of the most common examples of the difference between data and information is the presentation layer created for business users.  In her fantastic book Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Danette McGilvray defines Presentation Quality as “a measure of how information is presented to, and collected from, those who utilize it.  Format and appearance support appropriate use of the information.”

Tom Redman emphasizes that the two most important points in the data lifecycle are when data is created and when data is used.

I describe the connection between those two points as the Data-Information Bridge.  By passing over this bridge, data becomes the information used to make the business decisions that drive the tactical and strategic initiatives of the organization.  Some of the most important activities of enterprise data management actually occur on the Data-Information Bridge, where preventing critical disconnects between data creation and data usage is essential to the success of the organization’s business activities.

Defect prevention and data cleansing are two of the required disciplines of an enterprise-wide data quality program.  Defect prevention is focused on the moment of data creation, attempting to enforce better controls to prevent poor data quality at the source.  Data cleansing can either be used to compensate for a lack of defect prevention, or it can be included in the processing that prepares data for a specific use (i.e., transforms data into information fit for the purpose of a specific business use).
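
As a minimal illustration of the difference between the two disciplines (the validation rule and field names below are my own assumptions, not a prescribed implementation), defect prevention rejects a bad value at the moment of data creation, while data cleansing repairs already-stored values as part of preparing data for a specific use:

```python
import re

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def create_contact(name, email):
    """Defect prevention: enforce quality controls at the moment of data creation."""
    if not EMAIL_PATTERN.match(email):
        raise ValueError(f"Rejected at source: {email!r} is not a valid e-mail address")
    return {"name": name.strip(), "email": email.strip().lower()}

def cleanse_contact(record):
    """Data cleansing: repair an already-stored record before a specific use."""
    cleansed = dict(record)
    cleansed["name"] = cleansed["name"].strip().title()
    cleansed["email"] = cleansed["email"].strip().lower()
    return cleansed

# Prevention stops the defect from ever being stored:
# create_contact("Jim", "not-an-email")  # raises ValueError

# Cleansing compensates for defects that were stored anyway:
print(cleanse_contact({"name": "  jim HARRIS ", "email": " JIM@OCDQBLOG.COM "}))
# {'name': 'Jim Harris', 'email': 'jim@ocdqblog.com'}
```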

 

The Dark Side of Data Cleansing

In a previous post, I explained that although most organizations acknowledge the importance of data quality, they don’t believe that data quality issues occur very often, because the information made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

ETL processes that extract source data for a data warehouse load will often perform basic data quality checks.  However, a fairly standard practice for “resolving” a data quality issue is to substitute either a missing or default value (e.g., a date stored in a text field in the source, which cannot be converted into a valid date value, is loaded as either a NULL value or the processing date).
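
Here is a small sketch of that fairly standard (and quietly lossy) practice, using the date example above; the fallback choices and field handling are assumptions for illustration only:

```python
from datetime import date, datetime

def load_order_date(raw_value, processing_date=None):
    """A typical ETL 'resolution': parse the date if possible, otherwise substitute.

    The substitution silently hides the source data quality issue from
    everyone downstream of the data warehouse.
    """
    try:
        return datetime.strptime(raw_value.strip(), "%Y-%m-%d").date()
    except (AttributeError, ValueError):
        # Common fallbacks: NULL (None) or the processing date.
        return processing_date

print(load_order_date("2011-06-01"))                         # 2011-06-01
print(load_order_date("June 1st-ish"))                       # None
print(load_order_date("N/A", processing_date=date.today()))  # today's date
```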

When postal address validation software generates a valid mailing address, it often does so by removing what it considers to be “extraneous” information from the input address fields, which may include valid data accidentally entered in the wrong field, or data that was lacking its own input field (e.g., an e-mail address entered in an input address field gets deleted from the output mailing address).

And some reporting processes intentionally filter out “bad records” or eliminate “outlier values.”  This happens most frequently when preparing highly summarized reports, especially those intended for executive management.
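
And here is a sketch of that third example: a summary report that quietly drops the “bad records” and “outlier values” before executive management ever sees them (the thresholds and amounts are, of course, illustrative assumptions):

```python
# Illustrative order amounts: the negative value and the huge value might be
# genuine business events or data quality issues; the report cannot tell.
order_amounts = [120.00, 95.50, 130.25, -45.00, 88.00, 1_000_000.00, 110.75]

def summarize_for_executives(amounts, low=0.00, high=10_000.00):
    """Filter out 'bad records' and 'outliers', then report a tidy average."""
    kept = [amount for amount in amounts if low <= amount <= high]
    return {
        "average_order": round(sum(kept) / len(kept), 2),
        "records_reported": len(kept),
        "records_silently_dropped": len(amounts) - len(kept),
    }

print(summarize_for_executives(order_amounts))
# {'average_order': 108.9, 'records_reported': 5, 'records_silently_dropped': 2}
# The report looks clean (cute information), but two records, and whatever
# story they tell, have vanished from view (the ugly data stays hidden).
```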

These are just a few examples of the Dark Side of Data Cleansing, which can turn Ugly Data into Cute Information.

 

Has your Data Quality turned to the Dark Side?

Like truth, beauty, and singing ability, data quality is in the eye of the beholder.  Or, since data quality is most commonly defined as fitness for the purpose of use, we could say that data quality is in the eye of the user.  But how do users know if data is truly fit for their purpose, or if they are simply being presented with information that is aesthetically pleasing for their purpose?

Has your data quality turned to the dark side by turning ugly data into cute information?

 

Related Posts

Data, Information, and Knowledge Management

Beyond a “Single Version of the Truth”

The Data-Information Continuum

The Circle of Quality

Data Quality and the Cupertino Effect

The Idea of Order in Data

Hell is other people’s data

OCDQ Radio - Organizing for Data Quality

The Reptilian Anti-Data Brain

Amazon’s Data Management Brain

Holistic Data Management (Part 3)

Holistic Data Management (Part 2)

Holistic Data Management (Part 1)

OCDQ Radio - Data Governance Star Wars

Data Governance Star Wars: Bureaucracy versus Agility

The IT Consumerization Conundrum

This blog post is sponsored by the Enterprise CIO Forum and HP.

The consumerization of IT is a disruptive force that many organizations are struggling to come to terms with, especially their IT departments.  As R "Ray" Wang recently blogged about this challenge, “technologies available to consumers at low cost, or even for free, are increasingly pushing aside enterprise applications.  For IT leaders accustomed to having control over corporate technology, this represents a huge challenge — and it’s one they’re not meeting very well.”

Speed and agility are the most common business drivers for implementing new technology.  The consumer technology trifecta of cloud computing, SaaS, and mobility has enabled business users to directly purchase off-premises applications that quickly provide only the features they currently need.  Meanwhile, on-premises applications, although feature-rich, become user-poor because of their slower time to implement, and less-than-agile reputation for dealing with change requests and customizations.

However, the organization still relies on some of the functionality, and especially the data, provided by legacy applications, which IT is required to continue to support.  IT is also responsible for assisting the organization with any technology challenges encountered when using modern applications.  This feature fracture (i.e., the technology supporting business needs being splintered across legacy and modern applications) often leaves IT departments overburdened, and causes them to battle against the disruptive force of business-driven consumer technology.

“IT and business leaders need to work together and operate in parallel,” Wang concludes.  “If IT slows down the business capability to innovate, then the company will suffer as new business models emerge and infrastructure will fail to keep up.  If business moves ahead of IT in technology, then the company fails because IT will spend years cleaning up technology messes.”

This is the IT Consumerization Conundrum.  Although, in the short-term, consumer technology usually better serves the technology needs of the organization, in the long-term, if it’s not properly managed and integrated into the IT Delivery strategy of the organization, then it can create a complex web of technology that entangles the organization much more than it enables it.

Or to borrow the words of Ralph Loura, it can “cause technology to become a business disabler instead of a business enabler.”


 

Related Posts

The IT Prime Directive of Business First Contact

A Sadie Hawkins Dance of Business Transformation

Are Applications the La Brea Tar Pits for Data?

Why does the sun never set on legacy applications?

The Partly Cloudy CIO

The IT Pendulum and the Federated Future of IT

Suburban Flight, Technology Sprawl, and Garage IT

Organizing for Data Quality


Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information in the late 1980s.  Since then, he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.  The most recent, Data Driven: Profiting from Your Most Important Business Asset (Harvard Business Press, 2008), was a Library Journal best buy of 2008.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995.  Tom holds a Ph.D. in statistics from Florida State University.  He holds two patents.

On this episode of OCDQ Radio, Tom Redman and I discuss concepts from his Data Governance and Information Quality 2011 post-conference tutorial about organizing for data quality, which includes his call to action for your role in the data revolution.


Data, Information, and Knowledge Management

The difference, and relationship, between data and information is a common debate.  Not only do these two terms have varying definitions, but they are often used interchangeably.  Just a few examples include comparing and contrasting data quality with information quality, data management with information management, and data governance with information governance.

In a previous blog post, I referenced the Information Hierarchy provided by Professor Ray R. Larson of the School of Information at the University of California, Berkeley:

  • Data – The raw material of information
  • Information – Data organized and presented by someone
  • Knowledge – Information read, heard, or seen, and understood
  • Wisdom – Distilled and integrated knowledge and understanding

Some consider this an esoteric debate between data geeks and information nerds, but what is not debated is the importance of understanding how organizations use data and/or information to support their business activities.  Of particular interest is the organization’s journey from data to decision, the latter of which is usually considered the primary focus of business intelligence.

In his recent blog post, Scott Andrews explained what he called The Information Continuum:

  • Data – A Fact or a piece of information, or a series thereof
  • Information – Knowledge discerned from data
  • Business Intelligence – Information Management pertaining to an organization’s policy or decision-making, particularly when tied to strategic or operational objectives

 

Knowledge Management

Data Cake (image by EpicGraphic)

This recent graphic does a great job of visualizing the difference between data and information, as well as the importance of how information is presented.  The depiction of knowledge as consumed information is oversimplified, but I am not sure how this particular visual metaphor could otherwise represent knowledge as actually understanding the consumed information.

It’s been a while since the term knowledge management was in vogue within the data management industry.  When I began my career in the early 1990s, I remember hearing about knowledge management as often as we hear about data governance today, which, as you know, is quite often.  The reason I have resurrected the term in this blog post is that I can’t help but wonder if the debate about data and information obfuscates the fact that the organization’s appetite, its business hunger, is for knowledge.

 

Three Questions for You

  1. Does your organization make a practical distinction between data and information?
  2. If so, how does this distinction affect your quality, management, and governance initiatives?
  3. What is the relationship between those initiatives and your business intelligence efforts?

 

Please share your thoughts and experiences by posting a comment below.

 

Related Posts

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Data In, Decision Out

The Data-Decision Symphony

Data Confabulation in Business Intelligence

Thaler’s Apples and Data Quality Oranges

DQ-View: Baseball and Data Quality

Beyond a “Single Version of the Truth”

The Business versus IT—Tear down this wall!

Finding Data Quality

Fantasy League Data Quality

The Circle of Quality

The Age of the Platform


Phil Simon is the author of three books: The New Small (Motion, 2010), Why New Systems Fail (Cengage, 2010), and The Next Wave of Technologies (John Wiley & Sons, 2010).

A recognized technology expert, he consults with companies on how to optimize their use of technology.  His contributions have been featured on The Globe and Mail, the American Express Open Forum, ComputerWorld, ZDNet, abcnews.com, forbes.com, The New York Times, ReadWriteWeb, and many other sites.

When not fiddling with computers, hosting podcasts, putting himself in comics, and writing, Phil enjoys English Bulldogs, tennis, golf, movies that hurt the brain, fantasy football, and progressive rock—which is also the subject of this episode’s book contest (see below).

On this episode of OCDQ Radio, Phil and I discuss his fourth book, The Age of the Platform, which will be published later this year thanks to the help of the generous contributions of people like you who are backing the book’s Kickstarter project.


Data Quality Mischief Managed

Even if you are not a fan of Harry Potter (i.e., you’re a Muggle who hasn’t read the books or at least seen the movies), you’re probably aware that the film franchise concludes this summer.

As I have discussed in my blog post Data Quality Magic, data quality tools are not magic in and of themselves, but like the wands in the wizarding world of Harry Potter, they channel the personal magic force of the wizards or witches who wield them.  In other words, the magic in the wizarding world of data quality comes from the people working on data quality initiatives.

Extending the analogy, data quality methodology is like the books of spells and potions in Harry Potter, which are also not magic in and of themselves, but again require people through which to channel their magical potential.  And the importance of having people who are united by trust, cooperation, and collaboration is the data quality version of the Order of the Phoenix, with the Data Geeks battling against the Data Eaters (i.e., the dark wizards, witches, spells, and potions that are perpetuating the plague of poor data quality throughout the organization).

And although data quality doesn’t have a Marauder’s Map (nor does it usually require you to recite the oath: “I solemnly swear that I am up to no good”), sometimes the journey toward getting your organization’s data quality mischief managed feels like you’re on a magical quest.

 

Related Posts

Data Quality Magic

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

There are no Magic Beans for Data Quality

The Tooth Fairy of Data Quality

Video: Oh, the Data You’ll Show!

Data Quality and #FollowFriday the 13th

Dilbert, Data Quality, Rabbits, and #FollowFriday

Spartan Data Quality

Pirates of the Computer: The Curse of the Poor Data Quality

The Tell-Tale Data

Data Quality is People!