Big Data Island Hopping with Hadoop

Tamara Dull, the Director of Emerging Technologies for SAS Best Practices, recently published a series of blog posts that took readers on a journey through the Big Data Archipelago, examining how big data has affected traditional data best practices. Each post visited a fictional island (together forming an archipelago, hence the series name) representing a key big data topic. The target audience was marketers who have long relied on traditional data sources to gain valuable customer insight, but the entire series is worth reading for anyone. In this post, I want to focus on one of its technology aspects.

As Dull explained, the new battle cry for data is “take advantage of the processing power of big data technologies.”

The yellow elephant trumpeting much of this new battle cry is Hadoop. Hadoop began eight years ago as an open source project designed to meet the storage and processing requirements of data of all shapes and sizes, and it remains under active development. Its primary components are a distributed file system (HDFS) that provides a data storage framework, a programming model (MapReduce) that provides a data processing framework, and a resource management platform (YARN) that allows other data processing options to run on the same cluster. Vendors, including SAS, have used YARN to integrate Hadoop into their big data solutions.

Although Hadoop has proven to be a cost-effective, fast, and massively scalable big data technology, Dull was keen to point out that "the terms big data and Hadoop are not synonymous. Hadoop is just one of many big data solutions." She also noted that "contrary to popular belief, you don't need big data to take advantage of the power of Hadoop. For example, you can use Hadoop to offload some of the processing work you're currently asking your traditional data warehouse to do."

But arguably Hadoop's greatest strength is that it supports the activity through which organizations get the most business value from big data: gathering and analyzing data from a variety of sources in order to find the best data available for solving a particular business problem. With a punning nod to Dull's series, let's call this big data island hopping.

Some islands will hold data in a familiar structured format, such as a relational database containing customer names, postal addresses, and sales transactions. Other islands will hold more exotic varieties of data, such as the unstructured or semi-structured formats of social networking status updates, blog posts, online reviews, sensor readings, and mobile app activity.

Because Hadoop can process data in any format, whether unstructured, semi-structured, or structured, it helps organizations island hop across big data quickly and cost-effectively. Enterprises can use the processing power of this technology to collect an archipelago of analytic insights, in an environment well suited to the fast pattern recognition and discovery work required to compare information across large and variously structured data sets.
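To make island hopping a little more concrete, here is a minimal sketch in plain Python that imitates the MapReduce programming model on a laptop. It is not Hadoop's API and does not run on a cluster; the two tiny data sets, the customer IDs, and the function names (map_sales, map_statuses, shuffle, reduce_customer, run_job) are all hypothetical. One map function reads a structured CSV "island" of sales, another reads a semi-structured JSON "island" of status updates, and a single reduce step combines both into one per-customer view.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical structured island: sales transactions as CSV rows.
SALES_CSV = """customer_id,amount
c1,19.99
c2,5.00
c1,42.50
"""

# Hypothetical semi-structured island: social status updates as JSON lines.
STATUS_JSON = """{"customer_id": "c1", "text": "Love the new app!"}
{"customer_id": "c3", "text": "Checkout was slow today"}
{"customer_id": "c1", "text": "Second order this week"}
"""


def map_sales(text):
    """Map phase for the structured source: emit (customer_id, ("sale", amount))."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["customer_id"], ("sale", float(row["amount"]))


def map_statuses(text):
    """Map phase for the semi-structured source: emit (customer_id, ("post", 1))."""
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            yield record["customer_id"], ("post", 1)


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_customer(values):
    """Reduce phase: merge values from both islands into one per-customer summary."""
    total_spend = sum(v for kind, v in values if kind == "sale")
    post_count = sum(v for kind, v in values if kind == "post")
    return {"total_spend": round(total_spend, 2), "posts": post_count}


def run_job():
    """Run both map phases, shuffle, then reduce each customer's group."""
    pairs = list(map_sales(SALES_CSV)) + list(map_statuses(STATUS_JSON))
    return {key: reduce_customer(vals) for key, vals in sorted(shuffle(pairs).items())}


if __name__ == "__main__":
    for customer, summary in run_job().items():
        print(customer, summary)
```

In a real Hadoop job the map and reduce steps would run in parallel across many machines with the data stored in HDFS; the sketch only shows the shape of the computation, in which differently structured sources are mapped to a common key before being compared.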

This post is brought to you by SAS.