Master Data Management

Paul Gustafson (US) - Big Data: Your Own Data is Not Enough Part I

This is the first of a three-part series examining the impact of big data on business.

Today, your own data is not enough. It's not enough for financial services to live with its own transactional data; it needs social data to help understand what products customers like and dislike and to help quantify risk (is the person I am insuring a pilot on weekends?). Similarly, the energy world needs data from NOAA and climate science to better understand energy exploration. Even inside your own corporate walls, one division's data often needs another's to yield the necessary insights and answers.

Data on Its Own has Limits
Your own data is a valuable asset, but by itself it often has limits. Isolated, proprietary data paints a narrow picture. Answers are incomplete because the data is incomplete.

Toyota learned this when it connected six years' worth of data that was never intended to be connected, which let it troubleshoot quality problems significantly faster than if the data had remained separate. Progressive Insurance realized it needed to go beyond its existing data, so it began offering an optional sensor for the cars it insures in order to base insurance rates on actual driver behavior, not statistical assumptions. So too, NOAA is putting sensors along the ocean floor to get better ocean data. This notion of instrumenting the environment, be it cars or oceans, illustrates the demand for more accurate data to make better decisions.

Data Wants to be Shared
Further, this data wants to be shared. CERN, with its Large Hadron Collider project, built a data platform right from the start to share information among scientists globally. Sharing has become a first principle rather than an afterthought; it gives purpose to the data and ties it all together.

With the advent of mashups, people began seeing the power of combining disparate data sets to glean new insights. Linked data offers another take on mashups, providing a standard format for linking massive data sets on the Web. (Semantics is at the core.) Important public databases in government, life sciences and media are finding their way into the linked data movement. Participants include Freebase, The New York Times, the U.S. government, the U.K. government, Linked Sensor Data (Kno.e.sis) and some 200 others. Very quickly your data becomes part of a broader community of connections by simply adhering to the linked data construct.
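To make the linked data idea concrete, here is a minimal sketch in Python. It assumes the common way of writing linked data as subject-predicate-object statements ("triples") whose subjects are Web addresses. Every address, data set and property name below is a made-up placeholder, not taken from any of the participants named above.

```python
# Hypothetical identifier: example.org addresses are placeholders, not real data.
ASPIRIN = "http://example.org/drug/aspirin"

# Data set A: a made-up drug catalogue.
drug_catalogue = [
    (ASPIRIN, "label", "Aspirin"),
    (ASPIRIN, "treats", "http://example.org/condition/headache"),
]

# Data set B: a made-up trials register, published independently of A.
trial_register = [
    ("http://example.org/trial/001", "studies", ASPIRIN),
    ("http://example.org/trial/001", "status", "completed"),
]


def merge(*datasets):
    """Combine triples from several sources into one de-duplicated set."""
    merged = set()
    for triples in datasets:
        merged.update(triples)
    return merged


def describe(graph, uri):
    """Collect every statement that mentions `uri` as subject or as object."""
    outgoing = sorted((p, o) for s, p, o in graph if s == uri)
    incoming = sorted((s, p) for s, p, o in graph if o == uri)
    return outgoing, incoming


graph = merge(drug_catalogue, trial_register)
outgoing, incoming = describe(graph, ASPIRIN)
print("Says about aspirin:", outgoing)
print("Points at aspirin:", incoming)
```

Because both publishers used the same address for the same thing, combining the two sets is enough to connect them; neither had to know about the other in advance.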

The U.S. government's initiative, Data.gov, has 400,000 data sets for areas including health, law, energy, science and education. The initiative is building communities, such as Health.Data.gov, Law.Data.gov, and Energy.Data.gov, where people can discuss the data, point to other related data sources, share applications and visualizations, issue challenges for new applications and ignite innovation. For example, Health.Data.gov helped spawn a new company, Asthmapolis.com, which aids asthma sufferers. By giving people GPS-equipped inhalers, Asthmapolis.com records the time and location of inhaler use, aggregates this data, and provides a new source of data for physicians, scientists and public health officials to use to improve asthma management and identify asthma-related triggers in the environment. Patients can use the results to avoid an area, and city governments can use the results to change zoning laws, enforce regulations, or put pressure on a facility to reduce its asthma-triggering emissions (e.g., dust, pollutants). The idea is to understand how certain places can affect health and then take action to promote better health.
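The aggregation step in the inhaler example can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not Asthmapolis.com's actual method: the events are invented, and the grid size and the threshold of two uses are arbitrary choices.

```python
import math
from collections import Counter
from datetime import datetime

# Invented inhaler events: (time of use, latitude, longitude).
events = [
    (datetime(2011, 9, 1, 8, 15), 43.0731, -89.4012),
    (datetime(2011, 9, 1, 17, 40), 43.0734, -89.4018),
    (datetime(2011, 9, 2, 9, 5), 43.0912, -89.3507),
]


def grid_cell(lat, lon, size=0.01):
    """Map a coordinate to an integer grid cell roughly `size` degrees wide."""
    return (math.floor(lat / size), math.floor(lon / size))


# Count how many inhaler uses fall into each cell.
counts = Counter(grid_cell(lat, lon) for _, lat, lon in events)

# Flag cells with repeated use (threshold of 2 is an assumption).
hotspots = sorted(cell for cell, n in counts.items() if n >= 2)
print(hotspots)
```

Grouping by grid cell keeps the output about places rather than about individual patients, which is the level at which the results are meant to be used.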

Another powerful linked data example is the Linked Clinical Trials (LinkedCT) project, which aims to publish the first open Semantic Web data source for clinical trials data. LinkedCT focuses on discovering links between clinical trials data and several other data sources, including DBpedia and YAGO (linked data sources, based on Wikipedia articles, about diseases and drugs), DailyMed (information about marketed drugs published by the U.S. National Library of Medicine) and Diseasome (information about disorders and disease genes). The project links clinical trials and outcomes to yield new research opportunities.

Stay tuned for our next post, which will discuss how wider data sets can help solve some of our most difficult problems, and how technology is answering the call for data analytics.

September 13th: Big Data: Your Own Data is Not Enough Part 2

September 17th: Big Data: Your Own Data is Not Enough Part 3

By Paul Gustafson, Director of CSC's Leading Edge Forum, Technology Programs. This article is based on a new LEF report, Data rEvolution - www.csc.com/lefreports

 

