OCDQ Blog

Will Big Data be Blinded by Data Science?

Talking Business about the Weather

The Symbiotic Relationship of Cloud and Mobile

Cloud Benefits for Midsize Businesses

Barriers to Cloud Adoption

Leveraging the Cloud for Application Development

Cloud Computing for Midsize Businesses

Cloud Computing is the New Nimbyism

Devising a Mobile Device Strategy

The Age of the Mobile Device

Social Business is more than Social Marketing

Social Media Marketing: From Monologues to Dialogues

Social Media for Midsize Businesses

Information Asymmetry versus Empowered Customers

August 06, 2013

Big Data is Just Another Brick in the Wall

August 06, 2013/ Jim Harris

The title of my recent blog post Chaos in the Big Data Brickyard made Mike Wheeler think it was a reference to the Indianapolis Motor Speedway, which is known as “The Brickyard” because it was paved entirely with bricks way back in 1909 (today, three feet of the original bricks remain at the start/finish line). This was a reasonable assumption since Wheeler is a NASCAR fan (which makes his last name a great example of an aptronym), and it prompted his blog post Yeah, But Who Won The Race?

“The term brickyard taken without any context,” Wheeler explained, “turned out to be another random brick of fact laid on an already crowded foundation. Context is what provides relevance to facts. Without a frame of reference into which a fact can be inserted it can easily become meaningless or, even worse, detrimental to the decision-making process.”

As usual, I agree with Wheeler (except about being a NASCAR fan — my apologies to Mike and his fellow auto racing fans).

In my post Big Data, Sporks, and Decision Frames, I blogged about how having the right decision frame (i.e., understanding the business context of a decision) is essential to whether big data and data science can provide meaningful business insight.

Additional context often missing from discussions about big data and data science is that they are not the only bricks in the yard.

Data modeling is still important and data quality still matters. So do metadata, data management and business intelligence, data monitoring, communication, collaboration, change management, and the many other aspects of data governance.

“A successful man,” David Brinkley once said, “is one who can lay a firm foundation with the bricks others have thrown at him.” A successful big data initiative is one that can lay a firm foundation with the bricks of best practices that the data management industry has been rightfully throwing at us for a long time now. Big data does not obviate the need for those best practices — even though it does occasionally require adapting our best practices as well as adopting new practices.

Big data is not the be-all and end-all that it is sometimes hyped to be. Instead, to paraphrase the great philosophers Pink Floyd:

All in all, big data is just another brick in the wall.

Related Podcasts

Clicking on the link will take you to the episode’s blog post:

Defining Big Data — This episode of the Open MIKE Podcast, with assistance from Robert Hillard, discusses how big data refers to big complexity, not big volume, even though complex datasets tend to grow rapidly, thus making them voluminous.

Too Big to Ignore — Guest Phil Simon, author of the book Too Big to Ignore: The Business Case for Big Data, offers advice on getting started with big data and remembering that big data is just another means toward solving business problems.

Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.

Demystifying Data Science — Guest Melinda Thielbar, a Ph.D. Statistician, discusses what a data scientist does and provides a straightforward explanation of key concepts such as signal-to-noise ratio, uncertainty, experimentation, and correlation.

i blog of Data glad and big

The Laugh-In Effect of Big Data

The Need for Data Philosophers

The Wisdom of Crowds, Friends, and Experts

Magic Elephants, Data Psychics, and Invisible Gorillas

Information Overload Revisited

Our Increasingly Data-Constructed World

Big Data and the Infinite Inbox

Little Ooches prevent Big Data Ouches

It’s Not about being Data-Driven

Data Separates Science from Superstition

Through a PRISM, Darkly

Predictive Analytics, the Data Effect, and Jed Clampett

Rage against the Machines Learning

Darth Vader, Big Data, and Predictive Analytics

The Flying Monkeys of Big Data

Big Data, Sporks, and Decision Frames

Big Data, Predictive Analytics, and the Ideal Chronicler

July 09, 2013

The Assumption of Quality

July 09, 2013/ Jim Harris

In my post The Wisdom of Crowds, Friends, and Experts, I used Amazon, Facebook, and Pandora respectively as examples of three techniques used by the recommendation engines increasingly provided by websites, social networks, and mobile apps.

Richard Jarvis commented that my assessment of the data quality associated with these techniques actually needed to look at metadata, data, and information, as well as knowledge management. “For crowd-sourced data, we’re assessing quality based on the first-order value rather than the immediate downstream usability. We’re not questioning the accuracy of Amazon’s assertion that customers who purchased X also purchased Y. Rather, we’re interested in the relevance of that information. In terms of knowledge management, I would describe this as broadening data quality to embrace information and knowledge quality.”

As usual, I agreed with Richard. Amazon is not providing access to the operational data underlying their recommendations, which is, of course, understandable, but instead Amazon is providing some aggregated information (e.g., sales rank) along with some detailed information (e.g., consumer reviews) and numerous metadata attributes (e.g., product category).

We have no way of knowing whether the underlying operational data is accurate (or whether it satisfies other aspects of data quality), nor do we have any way of verifying any aspect of the information quality. Some of the metadata, however, could be verified by cross-referencing other sources (e.g., for books, we could verify the metadata with the publishers and other sellers such as Barnes & Noble).
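As a rough illustration of what cross-referencing metadata might look like, here is a minimal Python sketch. The function name, the field names, and both records are hypothetical and invented for this example; real seller and publisher feeds would need far more careful matching and normalization.

```python
def metadata_mismatches(primary, reference, fields):
    """Return {field: (primary_value, reference_value)} for fields that disagree."""
    mismatches = {}
    for field in fields:
        # Compare case-insensitively and ignore surrounding whitespace.
        a = str(primary.get(field, "")).strip().lower()
        b = str(reference.get(field, "")).strip().lower()
        if a != b:
            mismatches[field] = (primary.get(field), reference.get(field))
    return mismatches


# Hypothetical records describing the same book from two sources.
seller_record = {"title": "Example Book", "publisher": "Example Press", "pages": 320}
publisher_record = {"title": "Example Book ", "publisher": "Example Press", "pages": 336}

mismatches = metadata_mismatches(
    seller_record, publisher_record, ["title", "publisher", "pages"]
)
print(mismatches)
```

A check like this only tells us that two sources disagree, not which one is right, which is part of why the assumption of quality is so hard to escape.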

Making use of Amazon’s information has to be done on the assumption of quality — something that data and information quality professionals would never endorse in other contexts (e.g., within Amazon’s internal financial accounting systems).

While this situation has always existed, the Internet and the era of big data are exacerbating it. Although this example focused on recommendation engines, many of the sources involved in big data analytics face this same challenge, such as sentiment analysis and other analysis that is dependent upon self-reported data.

Furthermore, I would argue that many of what are, for lack of a better term, traditional data and information management applications have functioned on the same assumption of quality, even though data and information quality best practices are implemented.

By extension, although there is an assumption that quality business decisions can only be made based on quality metadata, data, and information, if that were true in all cases, then every business would be bankrupt.

None of this is meant to imply that quality is not important.

On the contrary, my point is that in almost every application of metadata, data, and information, there is an assumption of quality. Obviously, this assumption should be tested whenever it can be, but we have to accept the fact that there will be many times when we will not be able to, thus forcing us to leverage metadata, data, and information on the assumption of their quality.

May 30, 2013

Business Analytics for Midsize Businesses

May 30, 2013/ Jim Harris

As this growing list of definitions for big data attests, big data evangelist and IBM thought leader James Kobielus rightfully warns that big data is in danger of definitional overkill. But most midsize business owners are less concerned about defining big data than they are about, as Laurie McCabe recently blogged, determining whether big data is relevant for their business.

“The fact of the matter is, big is a relative term,” McCabe explained, “relative to the amount of information that your organization needs to sift through to find the insights you need to operate the business more proactively and profitably.”

McCabe also noted that this is not just a problem for big businesses, since getting better insights from the data you already have is a challenge for businesses of all sizes. Midsize businesses “may not be dealing with terabytes of data,” McCabe explained, “but many are finding that tools that used to suffice—such as Excel spreadsheets—fall short even when it comes to analyzing internal transactional databases.” McCabe also provided recommendations for how midsize businesses can put big data to work.

The recent IBM study The Case for Business Analytics in Midsize Firms lists big data as one of the trends making a compelling case for the growing importance of business analytics for midsize businesses. The study also noted that important functional data continues to live in departmental spreadsheets, and state-of-the-art business analytics solutions are needed to make it easy to pull all that data, along with data from other sources, together in a meaningful way. Despite the common misconception that such solutions are too expensive for midsize businesses, solutions are now available that can deliver analytics capabilities to help overcome big data challenges without requiring a big upfront investment in hardware or software.

Phil Simon, author of Too Big to Ignore: The Business Case for Big Data, recently blogged about reporting versus analytics, explaining that the essence of analytics is that it goes beyond the what and where provided by reporting and tries to explain the why.

Big data isn’t the only reason why analytics is becoming more of a necessity. But with the barriers to what it costs and where it can be deployed becoming easier to overcome, business analytics is becoming more commonplace in midsize businesses.

To participate in the 2013 IBM Big Data Study launching in June, register via the following link: http://goo.gl/dkf0H

The Big Datastillery

Smart Big Data Adoption for Midsize Businesses

Big Data is not just for Big Businesses

Big Data Lessons from Orbitz

Will Big Data be Blinded by Data Science?

The Laugh-In Effect of Big Data

Social Business is more than Social Marketing

Social Media Marketing: From Monologues to Dialogues

Social Media for Midsize Businesses

Business Intelligence for Midsize Businesses

Barriers to Cloud Adoption

Leveraging the Cloud for Application Development

Cloud Computing for Midsize Businesses

Cloud Computing is the New Nimbyism

Devising a Mobile Device Strategy

The Age of the Mobile Device

Information Asymmetry versus Empowered Customers

Talking Business about the Weather

April 30, 2013

Business Intelligence for Midsize Businesses

April 30, 2013/ Jim Harris

Business intelligence is one of those phrases that everyone agrees is something all organizations, regardless of their size, should be doing. After all, no organization would admit to doing business stupidity. Nor, I presume, would any vendor admit to selling it.

But not everyone seems to agree on what the phrase means. Personally, I have always defined business intelligence as the data analytics performed in support of making informed business decisions (i.e., for me, business intelligence = decision support).

Oftentimes, this analytics is performed on data integrated, cleansed, and consolidated into a repository (e.g., a data warehouse). Other times, it’s performed on a single data set (e.g., a customer information file). Either way, business decision makers interact with the analytical results via static reports, data visualizations, dynamic dashboards, and ad hoc querying and reporting tools.

But robust business intelligence and analytics solutions used to be perceived as something only implemented by big businesses, as evinced by the big price tags usually associated with them. However, free and open source software, cloud computing, mobile, social, and a variety of as-a-service technologies have driven the consumerization of IT, lowering the costs of solutions and enabling small and midsize businesses to afford them. Additionally, the open data movement has led to a wealth of free public data sets that can be incorporated into business intelligence and analytics solutions (examples can be found at kdnuggets.com/datasets).

Lyndsay Wise, author of the insightful book Using Open Source Platforms for Business Intelligence (to listen to a podcast about the book, click here: OSBI on OCDQ Radio), recently blogged about business intelligence for small and midsize businesses.

Wise advised that “recent market changes have shifted the market in favor of small and midsize businesses. Before this, most were limited by requirements for large infrastructures, high-cost licensing, and limited solution availability. With this newly added flexibility and access to lower price points, business intelligence and analytics solutions are no longer out of reach.”

The Big Datastillery

Smart Big Data Adoption for Midsize Businesses

Big Data is not just for Big Businesses

Big Data Lessons from Orbitz

Will Big Data be Blinded by Data Science?

Social Business is more than Social Marketing

Social Media Marketing: From Monologues to Dialogues

Social Media for Midsize Businesses

Barriers to Cloud Adoption

Leveraging the Cloud for Application Development

Cloud Computing for Midsize Businesses

Cloud Computing is the New Nimbyism

Devising a Mobile Device Strategy

The Age of the Mobile Device

Information Asymmetry versus Empowered Customers

Talking Business about the Weather

January 10, 2013

Open Source Business Intelligence

January 10, 2013/ Jim Harris

OCDQ Radio is a vendor-neutral podcast about data quality and its related disciplines, produced and hosted by Jim Harris.

During this episode, I discuss open source business intelligence (OSBI) with Lyndsay Wise, author of the insightful new book Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize ROI.

Lyndsay Wise is the President and Founder of WiseAnalytics, an independent analyst firm and consultancy specializing in business intelligence for small and mid-sized organizations. For more than ten years, Lyndsay Wise has assisted clients in business systems analysis, software selection, and implementation of enterprise applications.

Lyndsay Wise conducts regular research studies, consults, writes articles, and speaks about how to implement a successful business intelligence approach and improve the value of business intelligence within organizations.

Related OCDQ Radio Episodes

Clicking on the link will take you to the episode’s blog post:

Decision Management Systems — Guest James Taylor discusses data-driven decision making and analytical concepts from his book Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics.

Data Driven — Guest Tom Redman (aka the “Data Doc”) discusses concepts from one of my favorite data quality books, which is his most recent book Data Driven: Profiting from Your Most Important Business Asset.

Making EIM Work for Business — Guest John Ladley discusses his book Making EIM Work for Business, exploring what makes information management, not just useful, but valuable to the enterprise.

The Data Governance Imperative — Guest Steve Sarsfield discusses his book The Data Governance Imperative, exploring the business value of data quality, the characteristics of a data champion, and creating effective data quality scorecards.

Master Data Management in Practice — Guests Dalton Cervo and Mark Allen discuss their book MDM in Practice, demystifying the theories surrounding MDM, and recommending how to properly prepare for a new MDM program.

Studying Data Quality — Guest Gordon Hamilton discusses the key concepts from recommended data quality books, including those which he has implemented in his career as a data quality practitioner.

Good-Enough Data for Fast-Enough Decisions — Guest Julie Hunt discusses the intersection of data quality and business intelligence, especially good-enough data for fast-enough decisions, a necessity in the constantly changing business world.

Big Data and Big Analytics — Special Guests Jill Dyché and Dan Soceanu, following the 2011 Pacific Northwest BI Summit, discuss big trends in business intelligence, including cloud computing, collaboration, and big data analytics.

December 04, 2012

The Wisdom of Crowds, Friends, and Experts

December 04, 2012/ Jim Harris

I recently finished reading the TED Book by Jim Hornthal, A Haystack Full of Needles, which included an overview of the different predictive approaches taken by one of the most common forms of data-driven decision making in the era of big data, namely, the recommendation engines increasingly provided by websites, social networks, and mobile apps.

These recommendation engines primarily employ one of three techniques, choosing to base their data-driven recommendations on the “wisdom” provided by either crowds, friends, or experts.

The Wisdom of Crowds

In his book The Wisdom of Crowds, James Surowiecki explained that the four conditions characterizing wise crowds are diversity of opinion, independent thinking, decentralization, and aggregation. Amazon is a great example of a recommendation engine using this approach by assuming that a sufficiently large population of buyers is a good proxy for your purchasing decisions.

For example, Amazon tells you that people who bought James Surowiecki’s bestselling book also bought Thinking, Fast and Slow by Daniel Kahneman, Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business by Jeff Howe, and Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott. However, Amazon neither provides nor possesses knowledge of why people bought all four of these books or qualification of the subject matter expertise of these readers.

These concerns, which we could think of as potential data quality issues, would be exacerbated within a small amount of transaction data, where the eclectic tastes and idiosyncrasies of individual readers would not help us decide what books to buy. Within a large amount of transaction data, however, we achieve the Wisdom of Crowds effect: taken in aggregate, the data gives us a general sense of what books we might like to read based on what a diverse group of readers collectively makes popular.
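To make the aggregation idea concrete, here is a minimal Python sketch of a "people who bought this also bought" count. It is not Amazon's actual method, which is not public; it simply counts how often titles appear in the same order. The function name `also_bought` and the order history are hypothetical, made up for illustration.

```python
from collections import Counter
from itertools import combinations


def also_bought(orders, title, top_n=3):
    """Return the titles most often bought together with `title`."""
    pair_counts = Counter()
    for basket in orders:
        # Count each unordered pair of distinct titles once per order.
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1

    related = Counter()
    for (a, b), n in pair_counts.items():
        if a == title:
            related[b] += n
        elif b == title:
            related[a] += n
    return related.most_common(top_n)


# Hypothetical order history, invented for illustration only.
orders = [
    ["The Wisdom of Crowds", "Thinking, Fast and Slow"],
    ["The Wisdom of Crowds", "Thinking, Fast and Slow", "Wikinomics"],
    ["The Wisdom of Crowds", "Crowdsourcing"],
    ["Wikinomics", "Crowdsourcing"],
    ["The Wisdom of Crowds", "Thinking, Fast and Slow"],
]

recommendations = also_bought(orders, "The Wisdom of Crowds")
print(recommendations)
```

Notice that nothing in the counts records why any reader bought any book. With only five orders, one eclectic shopper can change the ranking; with millions, individual idiosyncrasies tend to wash out.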

As I blogged about in my post Sometimes it’s Okay to be Shallow, sometimes the aggregated, general sentiment of a large group of unknown, unqualified strangers will be sufficient to effectively make certain decisions.

The Wisdom of Friends

Although the influence of our friends and family is the oldest form of data-driven decision making, historically this influence was delivered by word of mouth, which required you to either be there to hear those influential words when they were spoken, or have a large enough network of people you knew that would be able to eventually pass along those words to you.

But the rise of social networking services, such as Twitter and Facebook, has transformed word of mouth into word of data by transcribing our words into short bursts of social data, such as status updates, online reviews, and blog posts.

Facebook “Likes” are a great example of a recommendation engine that uses the Wisdom of Friends, where our decision to buy a book, see a movie, or listen to a song might be based on whether or not our friends like it. Of course, “friends” is used in a very loose sense in a social network, and not just on Facebook, since it combines strong connections such as actual friends and family, with weak connections such as acquaintances, friends of friends, and total strangers from the periphery of our social network.

Social influence has never ended with the people we know well, as Nicholas Christakis and James Fowler explained in their book Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives. But the hyper-connected world enabled by the Internet, and further facilitated by mobile devices, has strengthened the social influence of weak connections, and these friends form a smaller crowd whose wisdom is involved in more of our decisions than we may even be aware of.

The Wisdom of Experts

Since it’s more common to associate wisdom with expertise, Pandora is a great example of a recommendation engine that uses the Wisdom of Experts. Pandora used a team of musicologists (professional musicians and scholars with advanced degrees in music theory) to deconstruct more than 800,000 songs into 450 musical elements that make up each performance, including qualities of melody, harmony, rhythm, form, composition, and lyrics, as part of what Pandora calls the Music Genome Project.

As Pandora explains, their methodology uses precisely defined terminology, a consistent frame of reference, redundant analysis, and ongoing quality control to ensure that data integrity remains reliably high, believing that delivering a great radio experience to each and every listener requires an incredibly broad and deep understanding of music.

Essentially, experts form the smallest crowd of wisdom. Of course, experts are not always right. At the very least, experts are not right about every one of their predictions. Nor do experts always agree with each other, which is why I imagine that one of the most challenging aspects of the Music Genome Project is getting music experts to consistently apply precisely the same methodology.

Pandora also acknowledges that each individual has a unique relationship with music (i.e., no one else has tastes exactly like yours), and allows you to “Thumbs Up” or “Thumbs Down” songs without affecting other users, producing more personalized results than either the popularity predicted by the Wisdom of Crowds or the similarity predicted by the Wisdom of Friends.
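As a rough sketch of how expert-scored attributes plus personal feedback could drive a recommendation, the Python below compares songs by the cosine similarity of their attribute scores and skips anything the listener has given a thumbs down. This is an assumption-laden illustration, not Pandora's actual algorithm: the four attributes, the scores, the song names, and the function names are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(seed, catalog, disliked=()):
    """Pick the song closest to `seed`, ignoring songs the listener disliked."""
    candidates = {
        name: cosine_similarity(catalog[seed], scores)
        for name, scores in catalog.items()
        if name != seed and name not in disliked
    }
    return max(candidates, key=candidates.get)


# Hypothetical expert scores (0 to 5) for four made-up attributes:
# melody, harmony, rhythm, lyrics.
songs = {
    "Song A": [5, 4, 1, 2],
    "Song B": [5, 5, 1, 1],
    "Song C": [1, 1, 5, 4],
}

print(most_similar("Song A", songs))
print(most_similar("Song A", songs, disliked={"Song B"}))
```

The `disliked` set belongs to one listener only, so a thumbs down changes that listener's results without touching the expert scores that everyone else shares.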

The Future of Wisdom

It’s interesting to note that the Wisdom of Experts is the only one of these approaches that relies on what data management and business intelligence professionals would consider a rigorous approach to data quality and decision quality best practices. But this is also why the Wisdom of Experts is the most time-consuming and expensive approach to data-driven decision making.

In the past, the Wisdom of Crowds and Friends was ignored in data-driven decision making for the simple reason that this potential wisdom wasn’t digitized. But now, in the era of big data, not only are crowds and friends digitized, but technological advancements combined with cost-effective options via open source (data and software) and cloud computing make these approaches quicker and cheaper than the Wisdom of Experts. And despite the potential data quality and decision quality issues, the Wisdom of Crowds and/or Friends is proving itself a viable option for more categories of data-driven decision making.

I predict that the future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three potential sources of wisdom often acknowledged as contributors to data-driven decision making.

Sometimes it’s Okay to be Shallow

The Wisdom of the Social Media Crowd

Data Management: The Next Generation

Exercise Better Data Management

Darth Vader, Big Data, and Predictive Analytics

Data-Driven Intuition

The Big Data Theory

Finding a Needle in a Needle Stack

Big Data, Predictive Analytics, and the Ideal Chronicler

The Limitations of Historical Analysis

Magic Elephants, Data Psychics, and Invisible Gorillas

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

The Data-Decision Symphony

OCDQ Radio - Decision Management Systems

A Tale of Two Datas

October 04, 2012

A Tale of Two Datas

October 04, 2012/ Jim Harris

Is big data more than just lots and lots of data? Is big data unstructured and not-so-big data structured? Malcolm Chisholm explored these questions in his recent Information Management column, where he posited that there are, in fact, two datas.

“One type of data,” Chisholm explained, “represents non-material entities in vast computerized ecosystems that humans create and manage. The other data consists of observations of events, which may concern material or non-material entities.”

Providing an example of the first type, Chisholm explained, “my bank account is not a physical thing at all; it is essentially an agreed upon idea between myself, the bank, the legal system, and the regulatory authorities. It only exists insofar as it is represented, and it is represented in data. The balance in my bank account is not some estimate with a positive and negative tolerance; it is exact. The non-material entities of the financial sector are orderly human constructs. Because they are orderly, we can more easily manage them in computerized environments.”

The orderly human constructs that are represented in data, in the stories told by data (including the stories data tell about us and the stories we tell data), are one of my favorite topics. In our increasingly data-constructed world, it’s important to occasionally remind ourselves that data and the real world are not the same thing, especially when data represents non-material entities since, with the possible exception of Makers using 3-D printers, data-represented entities do not re-materialize into the real world.

Describing the second type, Chisholm explained, “a measurement is usually a comparison of a characteristic using some criteria, a count of certain instances, or the comparison of two characteristics. A measurement can generally be quantified, although sometimes it’s expressed in a qualitative manner. I think that big data goes beyond mere measurement, to observations.”

Chisholm called the first type the Data of Representation, and the second type the Data of Observation.

The data of representation tends to be structured, in the relational sense, but doesn’t need to be (e.g., graph databases) and the data of observation tends to be unstructured, but it can also be structured (e.g., the structured observations generated by either a data profiling tool analyzing structured relational tables or flat files, or a word-counting algorithm analyzing unstructured text).
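The word-counting case is easy to sketch in Python. The snippet below is only an illustration of unstructured text producing structured observations; the function name `word_counts` and the simple tokenizing rule are my own assumptions, not anything Chisholm proposed.

```python
import re
from collections import Counter


def word_counts(text):
    """Turn unstructured text into structured (word, count) observations."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


text = "It was the best of times, it was the worst of times."
observations = word_counts(text)

for word, count in observations.most_common(3):
    print(word, count)
```

The sentence itself has no schema, but the resulting counts could be loaded into a two-column table, which is the sense in which observation data can be structured.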

“Structured and unstructured,” Chisholm concluded, “describe form, not essence, and I suggest that representation and observation describe the essences of the two datas. I would also submit that both datas need different data management approaches. We have a good idea what these are for the data of representation, but much less so for the data of observation.”

I agree that there are two types of data (i.e., representation and observation, not big and not-so-big) and that different data uses will require different data management approaches. Although data modeling is still important and data quality still matters, how much data modeling and data quality is needed before data can be effectively used for specific business purposes will vary.

In order to move our discussions forward regarding “big data” and its data management and business intelligence challenges, we have to stop fiercely defending our traditional perspectives about structure and quality in order to effectively manage both the form and essence of the two datas. We also have to stop fiercely defending our traditional perspectives about data analytics, since there will be some data use cases where depth and detailed analysis may not be necessary to provide business insight.

A Tale of Two Datas

In conclusion, and with apologies to Charles Dickens and his A Tale of Two Cities, I offer the following A Tale of Two Datas:

It was the best of times, it was the worst of times.
It was the age of Structured Data, it was the age of Unstructured Data.
It was the epoch of SQL, it was the epoch of NoSQL.
It was the season of Representation, it was the season of Observation.
It was the spring of Big Data Myth, it was the winter of Big Data Reality.
We had everything before us, we had nothing before us,
We were all going direct to hoarding data, we were all going direct the other way.
In short, the period was so far like the present period, that some of its noisiest authorities insisted on its being signaled, for Big Data or for not-so-big data, in the superlative degree of comparison only.

The Idea of Order in Data

The Most August Imagination

Song of My Data

The Lies We Tell Data

Our Increasingly Data-Constructed World

Plato’s Data

OCDQ Radio - Demystifying Master Data Management

OCDQ Radio - Data Quality and Big Data

Big Data: Structure and Quality

Swimming in Big Data

Sometimes it’s Okay to be Shallow

Darth Vader, Big Data, and Predictive Analytics

The Big Data Theory

Finding a Needle in a Needle Stack

Exercise Better Data Management

Magic Elephants, Data Psychics, and Invisible Gorillas

Why Can’t We Predict the Weather?

Data and its Relationships with Quality

A Tale of Two Q’s

A Tale of Two G’s

August 01, 2012

Exercise Better Data Management

August 01, 2012/ Jim Harris

Recently on Twitter, Daragh O Brien and I discussed his proposed concept. “After Big Data,” Daragh tweeted, “we will inevitably begin to see the rise of MOData as organizations seek to grab larger chunks of data and digest it. What is MOData? It’s MO’Data, as in MOre Data. Or Morbidly Obese Data. Only good data quality and data governance will determine which.”

Daragh asked if MO’Data will be the Big Data Killer. I said only if MO’Data doesn’t include MO’BusinessInsight, MO’DataQuality, and MO’DataPrivacy (i.e., more business insight, more data quality, and more data privacy).

“But MO’Data is about more than just More Data,” Daragh replied. “It’s about avoiding Morbidly Obese Data that clogs data insight and data quality, etc.”

I responded that More Data becomes Morbidly Obese Data only if we don’t exercise better data management practices.

Agreeing with that point, Daragh replied, “Bring on MOData and the Pilates of Data Quality and Data Governance.”

To slightly paraphrase lines from one of my favorite movies — Airplane! — the Cloud is getting thicker and the Data is getting laaaaarrrrrger. Surely I know well that growing data volumes is a serious issue — but don’t call me Shirley.

Whether you choose to measure it in terabytes, petabytes, exabytes, HoardaBytes, or how much reality bites, the truth is we were consuming way more than our recommended daily allowance of data long before the data management industry took a tip from McDonald’s and put the word “big” in front of its signature sandwich. (Oh great . . . now I’m actually hungry for a Big Mac.)

But nowadays with silos replicating data, as well as new data, and new types of data, being created and stored on a daily basis, our data is resembling the size of Bob Parr in retirement, making it seem like not even Mr. Incredible in his prime possessed the super strength needed to manage all of our data. Those were references to the movie The Incredibles, in which Mr. Incredible is a superhero who, after retiring into civilian life under the alias of Bob Parr, elicits this observation from his superhero costume tailor: “My God, you’ve gotten fat.” Yes, I admit not even Helen Parr (aka Elastigirl) could stretch that far for a big data joke.

A Healthier Approach to Big Data

Although Daragh’s concerns about morbidly obese data are valid, no superpowers (or other miracle exceptions) are needed to manage all of our data. In fact, it’s precisely when we are so busy trying to manage all of our data that we hoard countless bytes of data without evaluating data usage, gathering data requirements, or planning for data archival. It’s like we are trying to lose weight by eating more and exercising less, i.e., consuming more data and exercising less data quality and data governance. As Daragh said, only good data quality and data governance will determine whether we get more data or morbidly obese data.

Losing weight requires a healthy approach to both diet and exercise. A healthy approach to diet includes carefully choosing the food you consume and carefully controlling your portion size. A healthy approach to exercise includes a commitment to exercise on a regular basis at a sufficient intensity level without going overboard by spending several hours a day, every day, at the gym.

Swimming is a great form of exercise, but swimming in big data without having a clear business objective before you jump into the pool is like telling your boss that you didn’t get any work done because you decided to spend all day working out at the gym.

Carefully choosing the data you consume and carefully controlling your data portion size is becoming increasingly important since big data is forcing us to revisit information overload. However, the main reason that traditional data management practices often become overwhelmed by big data is that traditional data management practices are not always the right approach.

We need to acknowledge that some big data use cases differ considerably from traditional ones. Data modeling is still important and data quality still matters, but how much data modeling and data quality is needed before big data can be effectively used for business purposes will vary. In order to move the big data discussion forward, we have to stop fiercely defending our traditional perspectives about structure and quality. We also have to stop fiercely defending our traditional perspectives about analytics, since there will be some big data use cases where depth and detailed analysis may not be necessary to provide business insight.

Better than Big or More

Jim Ericson explained that your data is big enough. Rich Murnane explained that bigger isn’t better, better is better. Although big data may indeed be followed by more data, that doesn’t necessarily mean we require more data management in order to prevent more data from becoming morbidly obese data. I think that we just need to exercise better data management.

Big Data and the Infinite Inbox

The Laugh-In Effect of Big Data

The Need for Data Philosophers

OCDQ Radio - Demystifying Data Science

OCDQ Radio - Data Quality and Big Data

A Tale of Two Datas

i blog of Data glad and big

Big Data is Just Another Brick in the Wall

The Wisdom of Crowds, Friends, and Experts