My selections were based on a pseudo-scientific, quasi-statistical combination of page views, comments, and re-tweets, as well as choosing a few of my personal favorites, and which I have organized into four sections of ten best posts by topic or type.
Ten Best Posts on Big Data
- Dot Collectors and Dot Connectors — The multifaceted challenges of big data require the dot collectors of data management and the dot connectors of business intelligence to overcome their attention blindness and work together more collaboratively.
- HoardaBytes and the Big Data Lebowski — Don’t hoard Data, dude. The Data must abide. The Data must abide both the Business, by proving useful to our business activities, and the Individual, by protecting the privacy of our personal activities.
- Magic Elephants, Data Psychics, and Invisible Gorillas — As technological advancements improve our data analytical tools, we must not lose sight of the fact that tools and data remain only as effective and beneficent as the humans who wield them.
- Our Increasingly Data-Constructed World — What we now call Big Data is in fact a long-running macro trend underlying the many recent trends and innovations making our world, not just more data-driven, but increasingly data-constructed.
- Will Big Data be Blinded by Data Science? — With apologies to Thomas Dolby, will the business leaders being told to hire data scientists to derive business value from big data analytics be blind to what data science tries to show them?
- The Graystone Effects of Big Data — Using a metaphor based on the science fiction television show Caprica, I refer to the positive aspects of Big Data as the Zoe Graystone Effect, and the negative aspects of Big Data as the Daniel Graystone Effect.
- Exercise Better Data Management — Big Data may be followed by MOData (i.e., MOre Data or Morbidly Obese Data), but that doesn’t necessarily mean we require more data management, instead we just need to exercise better data management.
- A Tale of Two Datas — Inspired by Malcolm Chisholm and Charles Dickens, there are two types of data (i.e., representation and observation, not big and not-so-big) with different data uses that will require different data management approaches.
- Data Silence — Not only do we need to adopt a mindset that embraces the principles of data science, but we also have to acknowledge that the biases and preconceptions in our minds could silence the signal and amplify the noise in big data.
- The Wisdom of Crowds, Friends, and Experts — The future of wisdom will increasingly become an amalgamation of experts, friends, and crowds, with the data and techniques from all three sources often contributing to data-driven decision making.
Ten Best Posts on Data Governance and Data Quality
- Data Governance Frameworks are like Jigsaw Puzzles — Inspired by Jill Dyché and Scott Berkun, this post explains how the usefulness of data governance frameworks comes from realizing data governance frameworks are like jigsaw puzzles.
- Data Quality: Quo Vadimus? — With lots of help from Henrik Liliendahl Sørensen, Garry Ure, Bryan Larkin, and many others via the comments, I ponder where data quality is going, and whether data quality is a journey or a destination.
- Data Quality and Miracle Exceptions — Battling the dark forces of poor data quality doesn’t require any superpowers, and data quality doesn’t have any miracle exceptions, so for the love of high-quality data everywhere, stop trying to sell us one.
- Data Myopia and Business Relativity — Examines the two most prevalent definitions for data quality, real-world alignment and fitness for the purpose of use, otherwise known as the danger of data myopia and the challenge of business relativity.
- How Data Cleansing Saves Lives — Although proactive defect prevention is far superior to reactive data cleansing, the history of the Hubble Space Telescope proves that data cleansing can be not just a necessary evil, but also a necessary good.
- Data Quality and the Bystander Effect — The most common reason data quality issues are neither reported nor corrected is the Bystander Effect making people less likely to interpret bad data as a problem or, at the very least, not their responsibility.
- Data Quality and Chicken Little Syndrome — A chicken-metaphor-based post about the far-too-common and fowl folly of, instead of trying to sell the business benefits of data quality, emphasizing the negative aspects of not investing in data quality.
- Data and its Relationships with Quality — The metadata linking the data management industry to what it manages suffers from the one-to-many relationships created by never agreeing on how data, information, and quality should be defined.
- Cooks, Chefs, and Data Governance — Implementing policies requires cooks who are adept at carrying out a recipe, as well as chefs who are trusted to figure out how to best combine policies with the organizational ingredients available to them.
- Availability Bias and Data Quality Improvement — The availability heuristic explains why a reactive data cleansing project is easily approved, and availability bias explains why initiating a proactive data quality program is usually resisted.
Ten Best Podcasts
- Data Quality and Big Data — Guest Tom Redman (aka the “Data Doc”) discusses Data Quality and Big Data, including if data quality matters less in larger data sets, and if statistical outliers represent business insights or data quality issues.
- Saving Private Data — Recorded in December 2011, guest Daragh O Brien discusses the data privacy and data protection implications of social media, cloud computing, and big data.
- Decision Management Systems — Guest James Taylor discusses data-driven decision making and analytical concepts from his book: Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics.
- Demystifying Master Data Management — Guest John Owens explains the three types of data (Transaction, Domain, Master), the four master data entities (Party, Product, Location, Asset), and the Party-Role Relationship, which is where we find many of the terms commonly used to describe the Party master data entity (e.g., Customer, Supplier, Employee).
- Social Media for Midsize Businesses — Sponsored by IBM Midsize Business Solutions, guest Paul Gillin, author of four books, the latest, co-authored with Greg Gianforte, is Attack of the Customers, discusses social media marketing concepts.
- Data Driven — Guest Tom Redman (aka the “Data Doc”) discusses concepts from one of my favorite data quality books, which is his most recent book: Data Driven: Profiting from Your Most Important Business Asset.
- The Johari Window of Data Quality — Guest Martin Doyle discusses helping people better understand their data and assess its business impacts, not just the negative impacts of bad data quality, but also the positive impacts of good data quality.
- The Evolution of Enterprise Security — Sponsored by the Enterprise CIO Forum, guest Bill Laberis discusses striking a balance between convenience and security, which is necessary in the era of cloud computing and mobile devices.
- Defining Big Data — This episode of the Open MIKE Podcast, with assistance from Robert Hillard, discusses how big data refers to big complexity, not big volume, even though complex datasets tend to grow rapidly, thus making them voluminous.
- Getting to Know NoSQL — This episode of the Open MIKE Podcast discusses how NoSQL does not mean AntiSQL (i.e., NoSQL is not a Relational replacement), and that business-driven big data needs will often require “Not Only SQL.”
Ten Best of the Rest
- DQ-View: Data Is as Data Does — In this short video, I explain that data’s value comes from data’s usefulness, exemplifying the potential value of unstructured data based on whether or not you put what you read in data management books to use.
- DQ-View: The Five Stages of Data Quality — In this short video, using my superb acting skills, I demonstrate how coming to terms with the daunting challenge of data quality is somewhat similar to experiencing the Five Stages of Grief.
- DQ-View: MetaData makes BettahMusic — In this short video, I demonstrate how better metadata makes data better using the metadata automatically and manually created after importing my CD collection into my iTunes library.
- Metadata, Data Quality, and the Stroop Test — In this colorful (and perhaps too colorful) post, I use the Stroop Test, where colors do not match their names, to discuss the relationship between metadata and data quality.
- Quality is the Higgs Field of Data — Using one of the biggest science stories of 2012, the potential discovery of the elusive Higgs Boson (which I also attempt to explain), I attempt an analogy for data quality based on the Higgs Field.
- The Family Circus and Data Quality — Thanks to The Family Circus comic strip created by cartoonist Bil Keane, I explain how Ida Know owns the data, Not Me is accountable for data governance, and Nobody takes responsibility for data quality.
- Data Love Song Mashup — Since your data needs love too, on Valentine’s Day I wrote this post providing a mashup of love songs for your data (and Rob DuMoulin added a few more in the comments) — Happy Data Quality to you and your data!
- The Algebra of Collaboration — The trick of algebra equates collaboration with data quality and data governance success when collaboration is viewed not just as a guiding principle, but also as a call to action in your daily practices.
- The Return of the Dumb Terminal — With help from author Kevin Kelly and my old green machine, I ponder how the mobile-app-portal-to-the-cloud computing model means mobile devices are bringing about the return of the dumb terminal.
- An Enterprise Carol — Jacob Marley raises the ghosts of a few ideas to consider about how to keep the Enterprise well in the new year via the Ghosts of Enterprise Past (Legacy Applications), Present (IT Consumerization), and Future (Big Data).
Thank You for Reading OCDQ Blog in 2012
In 2012, the Obsessive-Compulsive Data Quality (OCDQ) blog published 92 posts, which received 160,000 total page views, while averaging over 400 page views and 200 unique visitors a day.
Thank you for reading OCDQ Blog in 2012. Your readership was deeply appreciated.