The Big Data Collider

As I mentioned in a previous post, I am reading the book Where Good Ideas Come From by Steven Johnson, which examines recurring patterns in the history of innovation.  The chapter that I am currently reading dispels the traditional notion of the eureka effect by explaining that the evolution of ideas, like all evolution, stumbles its way toward the next good idea, which eventually, though not immediately, leads to a significant breakthrough.

One example is how the encyclopedic book Enquire Within Upon Everything, the first edition of which was published in 1856, influenced a young British scientist, who in his childhood in the 1960s was drawn to the “suggestion of magic” in the book’s title, and who spent hours exploring this portal to the world of information, along with the wondrous feeling of exploring an immense trove of data.  His childhood fascination with data and information influenced a personal project that he started in 1980, which ten years later became a professional project while he was working at the Swiss particle physics lab CERN.

The scientist was Tim Berners-Lee and his now famous project created the World Wide Web.

“Journalists always ask me,” Berners-Lee explained, “what the crucial idea was, or what the singular event was, that allowed the Web to exist one day when it hadn’t the day before.  They are frustrated when I tell them there was no eureka moment.”

“Inventing the World Wide Web involved my growing realization that there was a power in arranging ideas in an unconstrained, web-like way.  And that awareness came to me through precisely that kind of process.”

CERN is famous for its Large Hadron Collider, which uses high-velocity particle collisions to explore some of the open questions in physics concerning the basic laws governing the interactions and forces among elementary particles, in an attempt to understand the deep structure of space and time and, in particular, the intersection of quantum mechanics and general relativity.

 

The Big Data Collider

While reading this chapter, I stumbled toward an idea about Big Data.  As Gartner Research explains, although the term acknowledges the exponential growth, availability, and use of information in today’s data-rich landscape, Big Data is about more than just data volume.  Data variety (i.e., structured, semi-structured, and unstructured data, as well as other types of data such as sensor data) and data velocity (i.e., how fast data is being produced and how fast the data must be processed to meet demand) are also key characteristics of Big Data.

David Loshin’s recent blog post about Hadoop and Big Data provides a straightforward explanation and simple example of using MapReduce for not only processing fast-moving large volumes of various data, but also deriving meaningful insights from it.

My idea was that Big Analytics uses the Big Data Collider to allow large volumes of various data particles to bounce off each other in high-velocity collisions.  Although a common criticism of Big Data is that it contains more noise than signal, smashing data particles together in the Big Data Collider may destroy most of the noise in the collision, allowing the signals that survive that creative destruction to potentially coalesce into an elementary particle of business intelligence.
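Although it is only a metaphor, a toy MapReduce-style sketch can make the collision concrete.  The following Python is a minimal sketch under made-up assumptions (the products, demand figures, and noise level are all hypothetical, and nothing here comes from Loshin’s post): group noisy data particles by key and aggregate them, so the zero-mean noise largely cancels while the underlying signal survives.

```python
import random
from collections import defaultdict
from statistics import mean

# Toy "data particles": noisy observations of a few underlying signals
# (hypothetical daily product demand), each carrying far more noise than signal.
random.seed(42)
true_demand = {"widget": 100.0, "gadget": 250.0, "gizmo": 40.0}
particles = [
    (product, demand + random.gauss(0, 25))      # signal + noise
    for product, demand in true_demand.items()
    for _ in range(10_000)
]

# "Collide" the particles MapReduce-style: group by key, then aggregate,
# so the zero-mean noise mostly cancels and the signal survives.
collisions = defaultdict(list)
for product, observed in particles:
    collisions[product].append(observed)

signals = {product: round(mean(values), 1) for product, values in collisions.items()}
print(signals)  # each estimate lands close to its underlying "elementary particle"
```

The grouping is the collision, the averaging is the creative destruction, and what remains is a small, meaningful signal extracted from a large, noisy volume of data.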

Admittedly not the greatest metaphor, but as we enquire within data about everything in the Information Age, I thought that it might be useful to share my idea so that it might stumble its way toward the next good idea by colliding with an idea of your own.

 

Related Posts

OCDQ Radio - Big Data and Big Analytics

OCDQ Radio - Good-Enough Data for Fast-Enough Decisions

OCDQ Radio - A Brave New Data World

Data, Information, and Knowledge Management

Thaler’s Apples and Data Quality Oranges

Data Confabulation in Business Intelligence

Data In, Decision Out

The Data-Decision Symphony

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

Beyond a “Single Version of the Truth”

The General Theory of Data Quality

The Data-Information Continuum

Schrödinger’s Data Quality

Data Governance and the Buttered Cat Paradox

Data Governance and the Adjacent Possible

I am reading the book Where Good Ideas Come From by Steven Johnson, which examines recurring patterns in the history of innovation.  The first pattern Johnson writes about is called the Adjacent Possible, which is a term coined by Stuart Kauffman, and is described as “a kind of shadow future, hovering on the edges of the present state of things, a map of all the ways in which the present can reinvent itself.  Yet it is not an infinite space, or a totally open playing field.  The strange and beautiful truth about the adjacent possible is that its boundaries grow as you explore those boundaries.”

Exploring the adjacent possible is like exploring “a house that magically expands with each door you open.  You begin in a room with four doors, each leading to a new room that you haven’t visited yet.  Those four rooms are the adjacent possible.  But once you open any one of those doors and stroll into that room, three new doors appear, each leading to a brand-new room that you couldn’t have reached from your original starting point.  Keep opening new doors and eventually you’ll have built a palace.”

If it ain’t broke, bricolage it

“If it ain’t broke, don’t fix it” is a common defense of the status quo, which often encourages an environment that stifles innovation and the acceptance of new ideas.  The status quo is like staying in the same familiar and comfortable room and choosing to keep all four of its doors closed.

The change management efforts of data governance often don’t talk about opening one of those existing doors.  Instead, they often broadcast the counter-productive message that “everything is so broken, we can’t fix it” and that we need to destroy our existing house and rebuild it from scratch with brand-new rooms, probably with one of those open floor plans without any doors.

Should it really be surprising when this approach to change management is so strongly resisted?

The term bricolage can be defined as making creative and resourceful use of whatever materials are at hand regardless of their original purpose, stringing old parts together to form something radically new, transforming the present into the near future.

“Good ideas are not conjured out of thin air,” explains Johnson, “they are built out of a collection of existing parts.”

The primary reason that the change management efforts of data governance are resisted is that they rely almost exclusively on negative methods: they emphasize broken business and technical processes, as well as bad data-related employee behaviors.

Although these problems exist and are the root cause of some of the organization’s failures, there are also unheralded processes and employees that prevented other problems from happening, which are the root cause of some of the organization’s successes.

It’s important to demonstrate that some data governance policies reflect existing best practices, which helps reduce resistance to change, and so a far more productive change management mantra for data governance is: “If it ain’t broke, bricolage it.”

Data Governance and the Adjacent Possible

As Johnson explains, “in our work lives, in our creative pursuits, in the organizations that employ us, in the communities we inhabit—in all these different environments, we are surrounded by potential new ways of breaking out of our standard routines.”

“The trick is to figure out ways to explore the edges of possibility that surround you.”

Most data governance maturity models describe an organization’s evolution through a series of stages intended to measure its capability and maturity, tendency toward being reactive or proactive, and inclination to be project-oriented or program-oriented.

Johnson suggests that “one way to think about the path of evolution is as a continual exploration of the adjacent possible.”

Perhaps we need to think about the path of data governance evolution as a continual exploration of the adjacent possible, as a never-ending journey which begins by opening that first door, building a palatial data governance program one room at a time.

 

Related Posts

Information Quality Certified Professional

Information Quality Certified Professional (IQCP) is the new certification program from the IAIDQ.  The application deadline for the next certification exam is October 25, 2011.  For more information about IQCP certification, please refer to the IAIDQ website.

 

Taking the first IQCP exam

A Guest Post written by Gordon Hamilton

I can still remember how galvanized I was by the first email mentions of the IQCP certification and its inaugural examination.  I’d been a member of the IAIDQ for the past year and I saw the first mailings in early February 2011.  It’s funny, but my memory of the sequence of events was that I filled out the application for the examination that first night; going back through my emails, however, I see that I attended several IAIDQ Webinars and followed quite a few discussions on LinkedIn before I finally applied and paid for the exam in mid-March (I still got the early bird discount).

Looking back now, I am wondering why I was so excited about the chance to become certified in data quality.  I know that I had been considering the CBIP and CBAP, from TDWI and IIBA respectively, for more than a year, going so far as to purchase study materials and take some sample exams.  Both the CBIP and CBAP designations fit where my career had been for 20+ years, but the subject areas were now tangential to my focus on information and data quality.

The IQCP certification fit exactly where I hoped my career trajectory was now taking me, so it really did galvanize me to action.

I had been a software and database developer for 20+ years when I caught a bad case of Deming-god worship while contracting at Microsoft in the early 2000s, and it only got worse as I started reading books by Olson, Redman, English, Loshin, John Morris, and Maydanchik on how data quality dovetailed with development methodologies of folks like Kimball and Inmon, which in turn dovetailed with the Lean Six Sigma methods.  I was on the slippery slope to choosing data quality as a career because those gurus of Data Quality, and Quality in general, were explaining, and I was finally starting to understand, why data warehouse projects failed so often, and why the business was often underwhelmed by the information product.

I had 3+ months to study and the resource center on the IAIDQ website had a list of recommended books and articles.  I finally had to live up to my moniker on Twitter of DQStudent.  I already had many of the books recommended by IAIDQ at home but hadn’t read them all yet, so while I waited for Amazon and AbeBooks to send me the books I thought were crucial, I began reading Deming, English, and Loshin.

Of all the books that began arriving on my doorstep, the most memorable was Journey to Data Quality by Richard Wang et al.

That book created a powerful image in my head of the information product “manufactured” by every organization.  That image of the “information product” made the suggestions by the data quality gurus much clearer.  They were showing how to apply quality techniques to the manufacture of Business Intelligence.  The image gave me a framework upon which to hang the other knowledge I was gathering about data quality, so it was easier to keep pushing through the books and articles because each new piece could fit somewhere in that manufacturing process.

I slept well the night before the exam, and gave myself plenty of time to make it to the Castle exam site that afternoon.  I took along several books on data quality, but hardly glanced at them.  Instead I grabbed a quick lunch and then a strong coffee to carry me through the 3-hour exam.  At 50 questions per hour I was very conscious of how long each question was taking me, and every 10 questions or so I would check to see if I was going to run into time trouble.  It was obvious after 20 questions that I had plenty of time, so I began to get into a groove, finishing the exam 30 minutes early, leaving plenty of time to review any questionable answers.

I found the exam eminently fair, with no tricky question constructions at all, so I didn’t seem to fall into the over-thinking trap that I sometimes do.  Even better, the exam wasn’t the type that drilled deeper and deeper into my knowledge gaps when I missed a question.  Even though I felt confident that I had passed, I’ve got to tell you that the 6 weeks the IAIDQ took to determine the passing threshold on this inaugural exam and send out passing notifications were the longest 6 weeks I have spent in a long time.  Now that the passing mark is established, they swear that the notifications will be sent out much faster.

I still feel a warm glow as I think back on achieving IQCP certification.  I am proud to say that I am a data quality consultant and I have the certificate proving the depth and breadth of my knowledge.

Gordon Hamilton is a Data Quality, Data Warehouse, and IQCP certified professional, whose 30 years’ experience in the information business encompasses many industries, including government, legal, healthcare, insurance and financial.

 

Related Posts

Studying Data Quality

The Blue Box of Information Quality

Data, Information, and Knowledge Management

Are you turning Ugly Data into Cute Information?

The Dichotomy Paradox, Data Quality and Zero Defects

The Data Quality Wager

Aristotle, Data Governance, and Lead Rulers

Data governance requires the coordination of a complex combination of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data quality remediation, data stewardship, business process optimization, technology enablement, and, perhaps most notably, policy enforcement.

But sometimes this emphasis on enforcing policies makes data governance sound like it’s all about rules.

In their book Practical Wisdom, Barry Schwartz and Kenneth Sharpe use the Nicomachean Ethics of Aristotle as a guide to explain that although rules are important, what is more important is “knowing the proper thing to aim at in any practice, wanting to aim at it, having the skill to figure out how to achieve it in a particular context, and then doing it.”

Aristotle observed the practical wisdom of the craftsmen of his day, including carpenters, shoemakers, blacksmiths, and masons, noting how “their work was not governed by systematically applying rules or following rigid procedures.  The materials they worked with were too irregular, and each task posed new problems.”

“Aristotle was particularly fascinated with how masons used rulers.  A normal straight-edge ruler was of little use to the masons who were carving round columns from slabs of stone and needed to measure the circumference of the columns.”

Unless you bend the ruler.

“Which is exactly what the masons did.  They fashioned a flexible ruler out of lead, a forerunner of today’s tape measure.  For Aristotle, knowing how to bend the rule to fit the circumstance was exactly what practical wisdom was all about.”

Although there’s a tendency to ignore the existing practical wisdom of the organization, successful data governance is not about systematically applying rules or following rigid procedures, precisely because the dynamic challenges faced, and overcome daily, by business analysts, data stewards, technical architects, and others exemplify today’s constantly changing business world.

But this doesn’t mean that effective data governance policies can’t be implemented.  It simply means that instead of focusing on who should lead the way (i.e., top-down or bottom-up), we should focus on what the rules of data governance are made of.

Well-constructed data governance policies are like lead rulers—flexible rules that empower us with an understanding of the principle of the policy, and trust us to figure out how best to enforce the policy in a particular context, how to bend the rule to fit the circumstance.  Aristotle knew this was exactly what practical wisdom was all about—data governance needs practical wisdom.

“Tighter rules and regulations, however necessary, are pale substitutes for wisdom,” concluded Schwartz and Sharpe.  “We need rules to protect us from disaster.  But at the same time, rules without wisdom are blind and at best guarantee mediocrity.”

DQ-Tip: “The quality of information is directly related to...”

Data Quality (DQ) Tips is an OCDQ regular segment.  Each DQ-Tip is a clear and concise data quality pearl of wisdom.

“The quality of information is directly related to the value it produces in its application.”

This DQ-Tip is from the excellent book Entity Resolution and Information Quality by John Talburt.

The relationship between data and information, and by extension data quality and information quality, is acknowledged and explored in the book’s second chapter, which includes a brief history of information theory, as well as the origins of many of the phrases frequently used throughout the data/information quality industry, e.g., fitness for use and information product.

Talburt explains that the problem with the fitness-for-use definition for the quality of an information product (IP) is that it “assumes that the expectations of an IP user and the value produced by the IP in its application are both well understood.”

Different users often have different applications for data and information, requiring possibly different versions of the IP, each with a different relative value to the user.  This is why Talburt believes that the quality of information is best defined, not as fitness for use, but instead as the degree to which the information creates value for a user in a particular application.  This allows us to measure the business-driven value of information quality with technology-enabled metrics, which are truly relevant to users.

Talburt believes that casting information quality in terms of business value is essential to gaining management’s endorsement of information quality practices within an organization, and he recommends three keys to success with information quality:

  1. Always relate information quality to business value
  2. Give stakeholders a way to talk about information quality—the vocabulary and concepts
  3. Show them a way to get started on improving information quality—and a vision for sustaining it

 

Related Posts

The Real Data Value is Business Insight

Is your data complete and accurate, but useless to your business?

The Fourth Law of Data Quality

The Role of Data Quality Monitoring in Data Governance

Data Quality Measurement Matters

Studying Data Quality

DQ-Tip: “Undisputable fact about the value and use of data...”

DQ-Tip: “Data quality tools do not solve data quality problems...”

DQ-Tip: “There is no such thing as data accuracy...”

DQ-Tip: “Data quality is primarily about context not accuracy...”

DQ-Tip: “There is no point in monitoring data quality...”

DQ-Tip: “Don't pass bad data on to the next person...”

DQ-Tip: “...Go talk with the people using the data”

DQ-Tip: “Data quality is about more than just improving your data...”

DQ-Tip: “Start where you are...”

Studying Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

On this episode, Gordon Hamilton and I discuss data quality key concepts, including those which we have studied in some of our favorite data quality books, and more important, those which we have implemented in our careers as data quality practitioners.

Gordon Hamilton is a Data Quality and Data Warehouse professional, whose 30 years’ experience in the information business encompasses many industries, including government, legal, healthcare, insurance and financial.  Gordon was most recently engaged in the healthcare industry in British Columbia, Canada, where he continues to advise several health care authorities on data quality and business intelligence platform issues.

Gordon Hamilton’s passion is to bring together:

  • Exposure of business rules through data profiling as recommended by Ralph Kimball.

  • Monitoring business rules in the EQTL (Extract-Quality-Transform-Load) pipeline leading into the data warehouse.

  • Managing the business rule violations through systemic and specific solutions within the statistical process control framework of Shewhart/Deming (see the sketch after this list).

  • Researching how to sustain data quality metrics as the “fit for purpose” definitions change faster than the information product process can easily adapt.
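
Neither the episode nor this list includes any code, but here is a minimal Python sketch of what the second and third bullets might look like in practice; the age rule, the batch history, and the quality_step helper are all hypothetical illustrations rather than anything Gordon described:

```python
from statistics import mean, pstdev

# Hypothetical business rule: customer age must be present and between 18 and 120.
def rule_valid_age(record):
    return record.get("age") is not None and 18 <= record["age"] <= 120

def quality_step(records):
    """The Q in Extract-Quality-Transform-Load: measure rule violations
    instead of silently fixing them, and pass only conforming records on."""
    passing = [r for r in records if rule_valid_age(r)]
    violation_rate = 1 - len(passing) / max(len(records), 1)
    return passing, violation_rate

# Shewhart-style control chart on the violation rate per batch: flag any batch
# whose rate drifts beyond the historical mean plus three standard deviations.
history = [0.02, 0.03, 0.02, 0.01, 0.02, 0.03]        # made-up prior batches
center, sigma = mean(history), pstdev(history)
upper_control_limit = center + 3 * sigma

todays_batch = [{"age": 34}, {"age": None}, {"age": 7}, {"age": 52}]
passing, rate = quality_step(todays_batch)
if rate > upper_control_limit:
    print(f"Out of control: {rate:.1%} violations exceeds {upper_control_limit:.1%}")
```

In this framing, the systemic solution is whatever process change brings the violation rate back inside the control limits, while the specific solution is what you do with the individual records that failed the rule.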

Gordon Hamilton’s moniker of DQStudent on Twitter hints at his plan to dovetail his Lean Six Sigma skills and experience with the data quality foundations to improve the manufacture of the “information product” in today’s organizations.  Gordon is a member of IAIDQ, TDWI, and ASQ, as well as an enthusiastic reader of anything pertaining to data.

Gordon Hamilton recently became an Information Quality Certified Professional (IQCP), via the IAIDQ certification program.

Recommended Data Quality Books

By no means a comprehensive list, and listed in no particular order whatsoever, the following books were either discussed during this OCDQ Radio episode, or are otherwise recommended for anyone looking to study data quality and its related disciplines:

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.

The Data Cold War

One of the many things I love about Twitter is its ability to spark ideas via real-time conversations.  For example, while live-tweeting during last week’s episode of DM Radio, the topic of which was how to get started with data governance, I tweeted about the data silo challenges and corporate cultural obstacles being discussed.

I tweeted that data is an asset only if it is a shared asset, across the silos, across the corporate culture, and that, in order to be successful with data governance, organizations must replace the mantra “my private knowledge is my power” with “our shared knowledge empowers us all.”

“That’s very socialist thinking,” Mark Madsen responded.  “Soon we’ll be having arguments about capitalizing over socializing our data.”

To which I responded that the more socialized data is, the more capitalized data can become . . . just ask Google.

“Oh no,” Mark humorously replied, “decades of political rhetoric about socialism to be ruined by a discussion of data!”  And I quipped that discussions about data have been accused of worse, and decades of data rhetoric certainly hasn’t proven very helpful in corporate politics.

 

Later, while ruminating on this light-hearted exchange, I wondered if we actually are in the midst of the Data Cold War.

 

The Data Cold War

The Cold War, which lasted from approximately 1946 to 1991, was the political, military, and economic competition between the Communist world, primarily the Soviet Union, and the Western world, primarily the United States.  One of the defining features of the Cold War was the conflict between the ideologies of socialism and capitalism.

In enterprise data management, one of the most debated ideologies is whether data should be viewed as a corporate asset, especially by the for-profit corporations of capitalism, which was the world’s dominant economic model even before the Cold War began and will likely forever remain so.

My earlier remark that data is an asset only if it is a shared asset, across the silos, across the corporate culture, is indicative of a bounded socialist view of enterprise data.  In other words, almost no one in the enterprise data management space is suggesting that data should be shared beyond the boundary of the organization.  In this sense, advocates of data governance, including myself, are arguing for socializing data within the enterprise so that data can be better capitalized as a true corporate asset.

This mindset makes sense because sharing data with the world, especially for free, couldn’t possibly be profitable — or could it?

 

The Master Data Management Magic Trick

The genius (and some justifiably ponder if it’s evil genius) of companies like Google and Facebook is they realized how to make money in a free world — by which I mean the world of Free: The Future of a Radical Price, the 2009 book by Chris Anderson.

By encouraging their users to freely share their own personal data, Google and Facebook ingeniously answer what David Loshin calls the most dangerous question in data management: What is the definition of customer?

How do Google and Facebook answer the most dangerous question?

A customer is a product.

This is the first step that begins what I call the Master Data Management Magic Trick.

Instead of trying to manage the troublesome master data domain of customer and link it, through sales transaction data, to the master data domain of product (products, by the way, have always been undeniably accepted as a corporate asset even though product data has not been), Google and Facebook simply eliminate the need for customers (and, by extension, eliminate the need for customer service because, since their product is free, it has no customers) by transforming what would otherwise be customers into the very product that they sell — and, in fact, the only “real” product that they have.

And since what their users perceive as their product is virtual (i.e., entirely Internet-based), it’s not really a product, but instead a free service, which can be discontinued at any time.  And if it was, who would you complain to?  And on what basis?

After all, you never paid for anything.

This is the second step that completes the Master Data Management Magic Trick — a product is a free service.

Therefore, Google and Facebook magically make both their customers and their products (i.e., master data) disappear, while simultaneously making billions of dollars (i.e., transaction data) appear in their corporate bank accounts.

(Yes, the personal data of their users is master data.  However, because it is used in an anonymized and aggregated format, it is not, nor does it need to be, managed like the master data we talk about in the enterprise data management industry.)

 

Google and Facebook have Capitalized Socialism

By “empowering” us with free services, Google and Facebook use the power of our own personal data against us — by selling it.

However, it’s important to note that they indirectly sell our personal data as anonymized and aggregated demographic data.

Although they do not directly sell our individually identifiable information (because, truthfully, it has very limited value, and mostly no legal value, since selling it would amount to identity theft), Google and Facebook do occasionally get sued (mostly outside the United States) for violating data privacy and data protection laws.

However, precisely because we freely give our personal data to them, until (or unless) laws are changed to protect us from ourselves, it’s almost impossible to prove they are doing anything illegal (again, their undeniable genius is arguably evil genius).

Google and Facebook are the exact same kind of company — they are both Internet advertising agencies.

They both sell online advertising space to other companies, which are looking to demographically target prospective customers because those companies actually do view people as potential real customers for their own real products.

The irony is that if all of their users stopped using their free service, then not only would our personal data be more private and more secure, but the new revenue streams of Google and Facebook would eventually dry up because, specifically by design, they have neither real customers nor real products.  More precisely, their only real customers (other companies) would stop buying advertising from them because no one would ever see and (albeit, even now, only occasionally) click on their ads.

Essentially, companies like Google and Facebook are winning the Data Cold War because they have capitalized socialism.

In other words, the bottom line is Google and Facebook have socialized data in order to capitalize data as a true corporate asset.

 

Related Posts

Freemium is the future – and the future is now

The Age of the Platform

Amazon’s Data Management Brain

The Semantic Future of MDM

A Brave New Data World

Big Data and Big Analytics

A Farscape Analogy for Data Quality

Organizing For Data Quality

Sharing Data

Song of My Data

Data in the (Oscar) Wilde

The Most August Imagination

Once Upon a Time in the Data

The Idea of Order in Data

Hell is other people’s data

A Farscape Analogy for Data Quality

Farscape was one of my all-time favorite science fiction television shows.  In the weird way my mind works, the recent blog post Four Steps to Fixing Your Bad Data by Tom Redman (which has received great comments) triggered a Farscape analogy.

“The notion that data are assets sounds simple and is anything but,” Redman wrote.  “Everyone touches data in one way or another, so the tendrils of a data program will affect everyone — the things they do, the way they think, their relationships with one another, your relationships with customers.”

The key word for me was tendrils — like I said, my mind works in a weird way.

 

Moya and Pilot

On Farscape, the central characters of the show travel through space aboard Moya, a Leviathan, which is a species of living, sentient spaceships.  Pilot is a sentient creature (of a species also known as Pilots) with the vast capacity for multitasking that is necessary for the simultaneous handling of the many systems aboard a Leviathan.  The tendrils of a Pilot’s lower body are biologically bonded with the living systems of a Leviathan, creating a permanent symbiotic connection, meaning that, once bonded, a Pilot and a Leviathan can no longer exist independently for more than an hour or so, or both of them will die.

Leviathans were one of the many laudably original concepts of Farscape.  The role of the spaceship in most science fiction is analogous to the role of a boat.  In other words, traveling through space is most often imagined like traveling on water.  However, seafaring vessels and spaceships are usually depicted as technological objects providing transportation and life support, but not actually alive in their own right (despite the fact that both types of ship are usually anthropomorphized, and usually as female).

Because Moya was alive, when she was damaged, she felt pain and needed time to heal.  And because she was sentient, highly intelligent, and capable of communicating with the crew through Pilot (who was the only one who could understand the complexity of the Leviathan language, which was beyond the capability of a universal translator), Moya was much more than just a means of transportation.  In other words, there truly was a symbiotic relationship between, not only Moya and Pilot, but also between Moya and Pilot, and their crew and passengers.

 

Enterprise and Data

(Sorry, my fellow science fiction geeks, but it’s not that Enterprise and that Data.  Perfectly understandable mistake, though.)

Although technically not alive in the biological sense, in many respects an organization is like a living, sentient organism, and, like space and seafaring ships, it is often anthropomorphized.  An enterprise is much more than just a large organization providing a means of employment and offering products and/or services (and, in a sense, life support to its employees and customers).

As Redman explains in his book Data Driven: Profiting from Your Most Important Business Asset, data is not just the lifeblood of the Information Age, data is essential to everything the enterprise does, from helping it better understand its customers, to guiding its development of better products and/or services, to setting a strategic direction toward achieving its business goals.

So the symbiotic relationship between Enterprise and Data is analogous to the symbiotic relationship between Moya and Pilot.

Data is the Pilot of the Enterprise Leviathan.  The enterprise cannot survive without its data.  A healthy enterprise requires healthy data — data of sufficient quality capable of supporting the operational, tactical, and strategic functions of the enterprise.

Returning to Redman’s words, “Everyone touches data in one way or another, so the tendrils of a data program will affect everyone — the things they do, the way they think, their relationships with one another, your relationships with customers.”

So the relationship between an enterprise and its data, and its people, business processes, and technology, is analogous to the relationship between Moya and Pilot, and their crew and passengers.  It is the enterprise’s people, its crew (i.e., employees), who, empowered by high quality data and enabled by technology, optimize business processes for superior corporate performance, thereby delivering superior products and/or services to the enterprise’s passengers (i.e., customers).

 

So why isn’t data viewed as an asset?

So if this deep symbiosis exists, if these intertwined and symbiotic relationships exist, if the tendrils of data are biologically bonded with the complex enterprise ecosystem — then why isn’t data viewed as an asset?

In Data Driven, Redman references the book The Social Life of Information by John Seely Brown and Paul Duguid, who explained that “a technology is never fully accepted until it becomes invisible to those who use it.”  The term informationalization describes the process of building data and information into a product or service.  “When products and services are fully informationalized,” Redman noted, then data “blends into the background and people do not even think about it anymore.”

Perhaps that is why data isn’t viewed as an asset.  Perhaps data has so thoroughly pervaded the enterprise that it has become invisible to those who use it.  Perhaps it is not an asset because data is invisible to those who are so dependent upon its quality.

 

Perhaps we only see Moya, but not her Pilot.

 

Related Posts

Organizing For Data Quality

Data, data everywhere, but where is data quality?

Finding Data Quality

The Data Quality Wager

Beyond a “Single Version of the Truth”

Poor Data Quality is a Virus

DQ-Tip: “Don't pass bad data on to the next person...”

Retroactive Data Quality

Hyperactive Data Quality (Second Edition)

A Brave New Data World

Big Data and Big Analytics

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Jill Dyché is the Vice President of Thought Leadership and Education at DataFlux.  Jill’s role at DataFlux is a combination of best-practice expert, key client advisor and all-around thought leader.  She is responsible for industry education, key client strategies and market analysis in the areas of data governance, business intelligence, master data management and customer relationship management.  Jill is a regularly featured speaker and the author of several books.

Jill’s latest book, Customer Data Integration: Reaching a Single Version of the Truth (Wiley & Sons, 2006), was co-authored with Evan Levy and shows the business breakthroughs achieved with integrated customer data.

Dan Soceanu is the Director of Product Marketing and Sales Enablement at DataFlux.  Dan manages global field sales enablement and product marketing, including product messaging and marketing analysis.  Prior to joining DataFlux in 2008, Dan held marketing, partnership, and market research positions with Teradata, General Electric, and FormScape, as well as data management positions in the Financial Services sector.

Dan received his Bachelor of Science in Business Administration from Kutztown University of Pennsylvania, as well as earning his Master of Business Administration from Bloomsburg University of Pennsylvania.

On this episode of OCDQ Radio, Jill Dyché, Dan Soceanu, and I discuss the recent Pacific Northwest BI Summit, where the three core conference topics were Cloud, Collaboration, and Big Data, the last of which led to a discussion about Big Analytics.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Are you turning Ugly Data into Cute Information?

The ways of the data force are sometimes difficult to understand precisely because they are difficult to see.

Daragh O Brien and I were discussing this recently on Twitter, where tweets about data quality and information quality form the midi-chlorians of the data force.  Share disturbances you’ve felt in the data force using the #UglyData and #CuteInfo hashtags.

 

Presentation Quality

Perhaps one of the most common examples of the difference between data and information is the presentation layer created for business users.  In her fantastic book Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information, Danette McGilvray defines Presentation Quality as “a measure of how information is presented to, and collected from, those who utilize it.  Format and appearance support appropriate use of the information.”

Tom Redman emphasizes that the two most important points in the data lifecycle are when data is created and when data is used.

I describe the connection between those two points as the Data-Information Bridge.  By passing over this bridge, data becomes the information used to make the business decisions that drive the tactical and strategic initiatives of the organization.  Some of the most important activities of enterprise data management actually occur on the Data-Information Bridge, where preventing critical disconnects between data creation and data usage is essential to the success of the organization’s business activities.

Defect prevention and data cleansing are two of the required disciplines of an enterprise-wide data quality program.  Defect prevention is focused on the moment of data creation, attempting to enforce better controls to prevent poor data quality at the source.  Data cleansing can either be used to compensate for a lack of defect prevention, or it can be included in the processing that prepares data for a specific use (i.e., transforms data into information fit for the purpose of a specific business use).

 

The Dark Side of Data Cleansing

In a previous post, I explained that although most organizations acknowledge the importance of data quality, they don’t believe that data quality issues occur very often because the information made available to end users in dashboards and reports often passes through many processes that cleanse or otherwise sanitize the data before it reaches them.

ETL processes that extract source data for a data warehouse load will often perform basic data quality checks.  However, a fairly standard practice for “resolving” a data quality issue is to substitute either a missing or a default value (e.g., a date stored in a text field in the source, which cannot be converted into a valid date value, is loaded as either a NULL value or the processing date).
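
Here is a minimal Python sketch of that kind of silent substitution; the order-date field, the expected format, and the fallback to the processing date are all hypothetical rather than taken from any particular ETL tool:

```python
from datetime import date, datetime

def load_order_date(raw_value, processing_date):
    """A typical "resolution": if the text field won't parse as a date,
    silently substitute the processing date instead of flagging the row."""
    try:
        return datetime.strptime(raw_value.strip(), "%Y-%m-%d").date()
    except (AttributeError, ValueError):
        return processing_date  # the ugly data quietly becomes cute information

source_rows = ["2011-07-15", "15 Jul 2011", "N/A", None]   # hypothetical source field
loaded = [load_order_date(value, date(2011, 8, 1)) for value in source_rows]
print(loaded)
# All four rows now look like valid dates, but three of them were fabricated by
# the load, including one whose real value simply didn't match the expected format.
```

The report built on this table will never reveal which dates were real, which is exactly how ugly data gets presented as cute information.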

When postal address validation software generates a valid mailing address, it often does so by removing what it considers to be “extraneous” information from the input address fields, which may include valid data that was accidentally entered in the wrong field or that lacked its own input field (e.g., an e-mail address entered in an input address field is deleted from the output valid mailing address).

And some reporting processes intentionally filter out “bad records” or eliminate “outlier values.”  This happens most frequently when preparing highly summarized reports, especially those intended for executive management.

These are just a few examples of the Dark Side of Data Cleansing, which can turn Ugly Data into Cute Information.

 

Has your Data Quality turned to the Dark Side?

Like truth, beauty, and singing ability, data quality is in the eyes of the beholder, or since data quality is most commonly defined as fitness for the purpose of use, we could say that data quality is in the eyes of the user.  But how do users know if data is truly fit for their purpose, or if they are simply being presented with information that is aesthetically pleasing for their purpose?

Has your data quality turned to the dark side by turning ugly data into cute information?

 

Related Posts

Data, Information, and Knowledge Management

Beyond a “Single Version of the Truth”

The Data-Information Continuum

The Circle of Quality

Data Quality and the Cupertino Effect

The Idea of Order in Data

Hell is other people’s data

OCDQ Radio - Organizing for Data Quality

The Reptilian Anti-Data Brain

Amazon’s Data Management Brain

Holistic Data Management (Part 3)

Holistic Data Management (Part 2)

Holistic Data Management (Part 1)

OCDQ Radio - Data Governance Star Wars

Data Governance Star Wars: Bureaucracy versus Agility

Organizing for Data Quality

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Dr. Thomas C. Redman (the “Data Doc”) is an innovator, advisor, and teacher.  He was the first to extend quality principles to data and information, in the late 80s.  Since then he has crystallized a body of tools, techniques, roadmaps, and organizational insights that help organizations make order-of-magnitude improvements.

More recently Tom has developed keen insights into the nature of data and formulated the first comprehensive approach to “putting data to work.”  Taken together, these enable organizations to treat data as assets of virtually unlimited potential.

Tom has personally helped dozens of leaders and organizations better understand data and data quality and start their data programs.  He is a sought-after lecturer and the author of dozens of papers and four books.  The most recent, Data Driven: Profiting from Your Most Important Business Asset (Harvard Business Press, 2008) was a Library Journal best buy of 2008.

Prior to forming Navesink Consulting Group in 1996, Tom conceived the Data Quality Lab at AT&T Bell Laboratories in 1987 and led it until 1995.  Tom holds a Ph.D. in statistics from Florida State University.  He holds two patents.

On this episode of OCDQ Radio, Tom Redman and I discuss concepts from his Data Governance and Information Quality 2011 post-conference tutorial about organizing for data quality, which includes his call to action for your role in the data revolution.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

The Age of the Platform

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

Phil Simon is the author of three books: The New Small (Motion, 2010), Why New Systems Fail (Cengage, 2010) and The Next Wave of Technologies (John Wiley & Sons, 2010).

A recognized technology expert, he consults companies on how to optimize their use of technology.  His contributions have been featured on The Globe and Mail, the American Express Open Forum, ComputerWorld, ZDNet, abcnews.com, forbes.com, The New York Times, ReadWriteWeb, and many other sites.

When not fiddling with computers, hosting podcasts, putting himself in comics, and writing, Phil enjoys English Bulldogs, tennis, golf, movies that hurt the brain, fantasy football, and progressive rock—which is also the subject of this episode’s book contest (see below).

On this episode of OCDQ Radio, Phil and I discuss his fourth book, The Age of the Platform, which will be published later this year thanks to the help of the generous contributions of people like you who are backing the book’s Kickstarter project.

Popular OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

  • Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, and correlation.
  • Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
  • Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
  • Data Governance Star Wars — Special Guests Rob Karel and Gwen Thomas joined this extended, and Star Wars themed, discussion about how to balance bureaucracy and business agility during the execution of data governance programs.
  • The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
  • Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Data Quality Mischief Managed

Even if you are not a fan of Harry Potter (i.e., you’re a Muggle who hasn’t either read the books or at least seen the movies), you’re probably aware the film franchise concludes this summer.

As I have discussed in my blog post Data Quality Magic, data quality tools are not magic in and of themselves, but like the wands in the wizarding world of Harry Potter, they channel the personal magic force of the wizards or witches who wield them.  In other words, the magic in the wizarding world of data quality comes from the people working on data quality initiatives.

Extending the analogy, data quality methodology is like the books of spells and potions in Harry Potter, which are also not magic in and of themselves, but again require people through whom to channel their magical potential.  And the importance of having people who are united by trust, cooperation, and collaboration is the data quality version of the Order of the Phoenix, with the Data Geeks battling against the Data Eaters (i.e., the dark wizards, witches, spells, and potions that are perpetuating the plague of poor data quality throughout the organization).

And although data quality doesn’t have a Marauder’s Map (nor does it usually require you to recite the oath: “I solemnly swear that I am up to no good”), sometimes the journey toward getting your organization’s data quality mischief managed feels like you’re on a magical quest.

 

Related Posts

Data Quality Magic

Data Quality is not a Magic Trick

Do you believe in Magic (Quadrants)?

There are no Magic Beans for Data Quality

The Tooth Fairy of Data Quality

Video: Oh, the Data You’ll Show!

Data Quality and #FollowFriday the 13th

Dilbert, Data Quality, Rabbits, and #FollowFriday

Spartan Data Quality

Pirates of the Computer: The Curse of the Poor Data Quality

The Tell-Tale Data

Data Quality is People!