Governing Big Data

I was a featured guest on the recent #BigDataMgmt Twitter chat about evolving governance for big data.  The discussion centered on how, as the world becomes more interconnected and more data is gathered, the opportunity for gaffes in handling and using that data increases dramatically.  Data governance can be thought of as the overall process for gaffe-proofing an organization’s data.

As James Kobielus blogged, “data governance is the price you pay to maintain the value of data as a precious business resource.  When big data is everywhere, the negative business impacts from poor governance are also ubiquitous.”  It seems like almost every week now we see media coverage providing poignant examples of the negative business impact of poor data governance.

However, in no way should these much-publicized failures slow down the adoption of big data, David Corrigan blogged.  “In fact, the ability to handle sensitive big data carefully can become a significant differentiator for an organization.”  As Tim Crawford blogged, “data is quickly becoming the new currency and leading businesses are looking for ways to capitalize on this change.  The potential information generated from the data presents major opportunities across industries from providing greater work efficiency to saving lives.  Business, as a whole, is becoming even more reliant on information and therefore data-driven.  Data ultimately provides greater insight, personalization, and accuracy for business decisions.”

According to Crawford, one of the complications of big data is that a lot of it “comes from new sources such as wearable devices, mobile devices, social media, and machine data.  In many cases, unlike traditional enterprise data, which is structured in nature, these new sources of data reside in many forms and are typically unstructured.  This presents a challenge to traditional data warehouses that are accustomed to consuming and managing structured data.”

Bridging the divide between unstructured and structured data is one of the biggest challenges in managing and governing big data.  Unstructured data is also a source of increased data privacy concerns, which range from organizations providing data to government agencies to people giving away their data in exchange for free email.  The latter is an example of how we need to take some personal responsibility for self-governing our data, while the former is an example of how we want regulatory protection that holds data users accountable for what they do with our data.

While organizations of all sizes are rightfully excited about the business potential of using big data, this excitement needs to be balanced by acknowledging the business risks associated with not governing the ways big data is used.

Technology can aid in governing big data by, as Richard Lee tweeted, “automating the policies and standards created by data governance bodies to ensure that data quality meets requirements.”  Technology for data discovery, profiling, and exploration increases data visibility, helping to identify data quality issues and prioritize data cleansing.  These efforts can increase an organization’s confidence that its data is well-governed.  However, “technology can’t work miracles,” James Kobielus tweeted.  “You need a strong, sustained data stewardship practice to maintain confidence” in your big data governance.
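To make the idea of automating policies a little more concrete, here is a minimal sketch in Python of what rule-based profiling might look like.  Everything in it is an assumption made for illustration: the field names, the two rules, the sample records, and the `profile` and `prioritize` helpers are all hypothetical, and a real governance program would draw its rules from the standards its own governance body has defined.

```python
from collections import Counter

# Hypothetical policy: each rule maps a field name to a validity check.
# These rules are illustrative assumptions, not a real standard.
RULES = {
    "email": lambda v: isinstance(v, str) and "@" in v,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}


def profile(records, rules=RULES):
    """Count missing and rule-violating values for each governed field."""
    missing = Counter()
    invalid = Counter()
    for record in records:
        for field, check in rules.items():
            value = record.get(field)
            if value is None or value == "":
                missing[field] += 1
            elif not check(value):
                invalid[field] += 1
    return {
        field: {"missing": missing[field], "invalid": invalid[field]}
        for field in rules
    }


def prioritize(report):
    """Order fields by total issue count, worst first, to guide cleansing."""
    return sorted(
        report,
        key=lambda f: report[f]["missing"] + report[f]["invalid"],
        reverse=True,
    )


# Made-up sample records, for illustration only.
records = [
    {"email": "ann@example.com", "age": 34},
    {"email": "bob-at-example.com", "age": 29},
    {"email": "", "age": 150},
    {"email": "cy@example.com", "age": None},
    {"email": "no-domain", "age": 41},
]

report = profile(records)
for field in prioritize(report):
    print(field, report[field])
```

The report gives stewards visibility into which fields have the most problems, but deciding what the rules should be, and what to do about the violations, remains a job for people.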