Master Data Management

Glyn Bowden (Global) - Big Data and the Consumption Model

I’ve been hearing about “Big Data” for a while now. It has been the term in favour recently, in much the same way “cloud” was previously, and like cloud it has never really been clearly defined. This is of course intentional, as the term was coined back in 2010 to be used primarily as a marketing term to describe large data sets of terabytes or petabytes. The vagueness of the definition meant that it could be applied to multiple groups, depending on the preferences and goals of whoever was deploying the term. In other words, it kept the market open and made everyone feel part of the Big Data community and groundswell. This particularly benefited those selling infrastructure to solve Big Data challenges. However, this openness brought some unexpected positives: Big Data turned out to be far more prevalent than initially thought.

For most people who work with Big Data as part of their day job, the term refers to a single type, and for them that is probably the only way to define Big Data. In fact, I tend to think of Big Data as falling into one of three categories. Within my organisation we think of these as the ABCs of Big Data. So what are they? To me, they are defined by the consumption model.

Analytics: this is where very large data pools are processed, reduced and reprocessed. The in-vogue method currently is MapReduce, typically run on the open source Hadoop platform.

Bandwidth: this is primarily concerned with taking huge amounts of data and moving it from one location to another. Examples include weather sensors and satellites gathering global data from millions of sources that must be centralised in a processing location, or crash test sensors streaming real-time data to the analysis farm.

Content: this is the requirement to store large amounts of data and pull from that data to deliver business value. It is one of the most common forms of Big Data, and the one that most mainstream organisations, which do not specialise in either of the other forms, will recognise. Unlike Analytics, there is no need to pull from parallel sources to access large portions of the data at any one time. The data sets required for consumption could be quite small, but they come from a huge pool of data. Unlike Bandwidth-based Big Data, there is no real need for a huge capability to ingest this data at speed either. A good example of this is medical records and scan images.
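
To make the "processed, reduced and reprocessed" pattern in the Analytics category a little more concrete, here is a minimal sketch of the MapReduce idea as a word count. It runs in a single Python process and is only an illustration of the three phases (map, shuffle, reduce); it is not Hadoop's API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of strings."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a very large data pool.
    sample = ["big data big pools", "data pools"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel across many machines, with the shuffle moving data between them; the sketch only shows the shape of the computation.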

Read more about the ABCs of Big Data next week.

By Glyn Bowden, enterprise infrastructure architect, NetApp

