Modern enterprises are deriving new value from insights drawn from the deluge of data generated every day. A common Big Data use case is modeling individual consumers with data that includes transactions, web traffic, social data, and more. A new generation of technologies makes acting on consumer-specific insights profitable.
Although out-of-the-box software can run over large amounts of data at the click of a button, technology alone is not enough. Out-of-the-box solutions with predefined analytical techniques constrain the approaches available, the types of data used, and the nature of the insights derived. And in Big Data, few of them address the growing problem of messy data. Because of this, firms have increasingly sought to extract the value of their data by hiring data scientists.
Early in its history, data science sat at the epicenter of new products and services. Big Data storage technologies were new, and the business cases driven by the data were unfamiliar. Many firms managed this unfamiliarity by seeking data scientists who could lead the effort to extract value from data by acting as a blend of four roles:
Business Analyst: Understanding the organization's challenges and opportunities, and the factors, internal and external, that could affect them
Statistician: Taking a question, translating it into a hypothesis, and testing it against the data
Technologist: Wrangling large data sets, often with non-tabular structures and high frequencies of noise and error, to produce a viable dataset
Artist: Casting aside the status quo to spot patterns that lead to novel products and strategies
Increasingly, this overloaded role is being replaced by organizational process and team collaboration. Turning data science from a single job role into an organizational process lets enterprises take on more complex problems and increases the sophistication of the insights they glean from data.
Modern enterprises approach deriving value from data as a collaboration among the business, data science, and technology teams, following these six steps:
Business Hypothesis: Business teams lead the identification of questions relevant to the organization's objectives
Analytic Agenda: Data science leads the effort to translate a business question into a hypothesis and supports the research agenda
Data Acquisition: Data science leads the effort to identify relevant data sources, both internal and external to the organization
Tool Acquisition: Technology leads the acquisition of the required tools
Modeling: Data science leads the exploration and understanding of signals within the data; novel insights and augmentations are then shared with the business
Application Development: Technology leads the application and integration of insights into products
The data scientist leads the process, promoting analytical best practices and ensuring that immediate business questions are addressed, while also watching for unexpected new insights.
As practices mature, the data scientist's role will narrow in definition.
Some data scientists will act as pure practitioners; others will have a defined business or technology focus. Amongst practitioners, there will be specialization by area of expertise, such as structured versus unstructured data.
Successful data scientists require technical proficiency, strong intuition, and an ability to communicate and collaborate with business leaders. Operationalizing a mature data science practice requires skills development across organizational units:
Modern analytic techniques, with a focus on high-dimensional models
Big Data Wrangling (manipulation and extraction) tools and paradigms
Big Data ingestion, storage, and processing
New analytics tools
Modern Big Data use cases
Building a data science practice
Thought leaders in the industry are sharing their knowledge and providing hands-on opportunities to learn the skills required to make the leap to data science. Whether it is a webinar, a two-day course, or a full academic program at a university, expanding one's knowledge is always a good investment.
Dan Mallinger, Data Science Team Lead, Think Big Analytics
Field Cady, Data Scientist, Think Big Analytics