This is the first of a three-part series examining the impact of big data on business.
Today, your own data is not enough. It's not enough for financial services to live with its own transactional data; it needs social data to help understand what products customers like and dislike and to help quantify risk (is the person I am insuring a pilot on weekends?). Similarly, the energy world needs data from NOAA and climate science to better understand energy exploration. Even inside your own corporate walls, one division's data often needs another's to yield the necessary insights and answers.
Data on Its Own Has Limits
Your own data is a valuable asset, but by itself it often has limits. Isolated, proprietary data paints a narrow picture. Answers are incomplete because the data is incomplete.
Toyota learned this when it connected six years' worth of data that was never intended to be connected, which let it troubleshoot quality problems significantly faster than it could have with the data kept separate. Progressive Insurance realized it needed to go beyond its existing data, so it began offering an optional sensor for the cars it insures, basing insurance rates on actual driver behavior rather than statistical assumptions. NOAA, likewise, is putting sensors along the ocean floor to get better ocean data. This notion of instrumenting the environment, be it cars or oceans, illustrates the demand for more accurate data to make better decisions.
Data Wants to Be Shared
Further, this data wants to be shared. CERN, with its Large Hadron Collider project, built a data platform right from the start to share information among scientists globally. Sharing has become a first principle, not an afterthought, that gives purpose to the data and ties it all together.
With the advent of mashups, people began seeing the power of combining disparate data sets to glean new insights. Linked data offers another take on mashups, providing a standard format for linking massive data sets on the Web. (Semantics is at the core.) Important public databases in government, life sciences and media are finding their way into the linked data movement. Participants include Freebase, The New York Times, the U.S. government, the U.K. government, Linked Sensor Data (Kno.e.sis) and some 200 others. Very quickly your data becomes part of a broader community of connections by simply adhering to the linked data construct.
The U.S. government's initiative, Data.gov, has 400,000 data sets for areas including health, law, energy, science and education. The initiative is building communities, such as Health.Data.gov, Law.Data.gov, and Energy.Data.gov, where people can discuss the data, point to other related data sources, share applications and visualizations, issue challenges for new applications and ignite innovation. For example, Health.Data.gov helped spawn a new company, Asthmapolis.com, which aids asthma sufferers. By giving people GPS-equipped inhalers, Asthmapolis.com records the time and location of inhaler use, aggregates this data, and provides a new source of data for physicians, scientists and public health officials to use to improve asthma management and identify asthma-related triggers in the environment. Patients can use the results to avoid an area, and city governments can use the results to change zoning laws, enforce regulations, or put pressure on a facility to reduce its asthma-triggering emissions (e.g., dust, pollutants). The idea is to understand how certain places can affect health and then take action to promote better health.
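The aggregation step described above can be sketched in a few lines of Python. This is only an illustration, not Asthmapolis's actual method: the event records, the rounding-based grid cells, and the function names (`grid_cell`, `count_by_cell`, `hotspots`) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical inhaler-use events: (ISO timestamp, latitude, longitude).
# These values are invented for illustration only.
EVENTS = [
    ("2011-08-01T08:15:00", 43.0731, -89.4012),
    ("2011-08-01T17:40:00", 43.0738, -89.4019),
    ("2011-08-02T09:05:00", 43.0912, -89.3561),
    ("2011-08-03T18:20:00", 43.0734, -89.4015),
]


def grid_cell(lat, lon, places=2):
    """Snap a coordinate to a coarse grid cell by rounding (an assumed cell size)."""
    return (round(lat, places), round(lon, places))


def count_by_cell(events, places=2):
    """Count recorded inhaler uses per grid cell."""
    return Counter(grid_cell(lat, lon, places) for _, lat, lon in events)


def hotspots(events, min_uses=2, places=2):
    """Return cells with at least min_uses events, most frequent first."""
    counts = count_by_cell(events, places)
    return [(cell, n) for cell, n in counts.most_common() if n >= min_uses]


if __name__ == "__main__":
    for cell, uses in hotspots(EVENTS):
        print(cell, uses)
```

Rounding coordinates is a deliberately crude way to group nearby events. A real system would likely use proper geospatial clustering and would need to protect patient privacy before sharing results.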
Another powerful linked data example is the Linked Clinical Trials project, which aims to publish the first open Semantic Web data source for medical clinical trials data. LinkedCT focuses on discovering links between clinical trials data and several other data sources, including DBpedia and YAGO (linked data sources, based on Wikipedia articles, about diseases and drugs), DailyMed (information about marketed drugs published by the U.S. National Library of Medicine) and Diseasome (information about disorders and disease genes). The project links clinical trials and outcomes to yield new research opportunities.
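To make the linking idea concrete, here is a minimal sketch of how two independently published data sets can be joined once they share identifiers. It uses plain Python tuples as a stand-in for linked data triples. The identifiers, predicates, and labels below are invented for illustration and are not taken from LinkedCT, DailyMed, or any real source; actual linked data uses RDF with URIs.

```python
# Hypothetical (subject, predicate, object) triples from a clinical-trials source.
TRIALS = [
    ("trial:T1", "studiesDrug", "drug:D1"),
    ("trial:T1", "studiesCondition", "disease:asthma"),
    ("trial:T2", "studiesDrug", "drug:D2"),
]

# Hypothetical triples from a separate drug-information source.
DRUGS = [
    ("drug:D1", "label", "Example Drug One"),
    ("drug:D2", "label", "Example Drug Two"),
]


def objects(triples, subject, predicate):
    """Return every object linked to subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def drug_labels_for_trial(trial_id):
    """Follow trial -> drug -> label across both data sets."""
    # Because both sources use the same drug identifiers,
    # merging them is just concatenation.
    graph = TRIALS + DRUGS
    labels = []
    for drug in objects(graph, trial_id, "studiesDrug"):
        labels.extend(objects(graph, drug, "label"))
    return labels


if __name__ == "__main__":
    print(drug_labels_for_trial("trial:T1"))
```

The point of the sketch is that neither data set had to be designed with the other in mind. Agreeing on shared identifiers is what makes the connection possible.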
Stay tuned for our next post, which will discuss how wider data sets can help solve some of our most difficult problems, and how technology is answering the call for data analytics.
September 13th: Big Data: Your Own Data is Not Enough Part 2
September 17th: Big Data: Your Own Data is Not Enough Part 3
By Paul Gustafson, Director of CSC's Leading Edge Forum, Technology Programs. This article is based on a new LEF report, Data rEvolution - www.csc.com/lefreports
Looking outside your data
As I recently learned from a colleague, environmental data has applications in healthcare, especially data on local air and water quality near where a patient lives and works. For instance, air quality data can be considered when developing treatment plans for patients with asthma. This is not currently done systematically in the US, but there are physicians who would like to do so. Environmental data could be useful in epidemiological studies to identify causal factors in disease clusters, and perhaps could eventually be used in predictive models to aid in prevention. For this movement to really take off, we need to find a way to break down the barriers to public data sharing. Just like the Internet, bits want to run free. I am hopeful that the Linked Data movement will inspire organizations not only to consume others' public bits, but also to find ways to publish their own. Let's use this blog to share some of the exciting possibilities when "all data is connected," as well as to tackle the significant challenges involved in letting organizational data bits run free. We explore some of this in our new report, "Data rEvolution," which you can find at www.csc.com/lefreports
Posted by: Paul Gustafson
12 Sep 2011 | 11:27