Data Mining

A quest for big data discovery

According to analyst firm IDC, the volume of digital information is expected to nearly double every two years between now and 2020, reaching 40 trillion gigabytes in just seven years.

To stay ahead of this storm of data, business leaders, entrepreneurs and even start-ups are scrambling for ways to take advantage of the insights everyone agrees are hidden in this Big Data.

The big data question: How?

The term Big Data is now commonplace in tech publication headlines. However, even as CIOs think about the impact of Big Data, business leaders outside IT management have been slow to engage. The business world, in general, is still focused simply on how to deal with an overwhelming volume of information. Researchers at the Gartner Group expect that by 2015, some 85% of Fortune 500 companies will still not be effectively exploiting Big Data's value for competitive advantage in their markets.

While some companies ponder how to turn a growing volume of data into an asset, others sit on corporate data without a plan to tap its hidden value. Some CIOs may think they have already done what they can to create business insights: they gather, extract and query the data. Still, many CIOs are struggling to unlock the insights and business opportunities buried in their corporate data.

What are these CIOs missing? In my opinion, they have overlooked a critical part of the success equation -- how to make the most of the data resource their company owns. Is the promised land of business insights beyond our reach? Absolutely not.

Often, organizations simply don't go far enough in giving users the tools needed to make discoveries happen. This is unfortunate. Today, however, it is easily remedied with access to user-driven business intelligence (BI) systems, which enable employees across an enterprise to draw insights from data, make the business more agile, answer questions previously considered beyond reach, and ultimately make faster, better-informed business decisions.

Hive into Hadoop

Data collection and integration are the first steps to making data discovery possible. However, Hadoop queries are usually slow, and traditional BI systems were never designed for user-driven analytics, and certainly not for consumers.

Today's computer users, however, have skills and demands far beyond those of their predecessors of even five or six years ago. BI users today are not IT managers or data scientists with advanced analytic skills. Business users today are keen to "mash up" and find insights in new and emerging types of big data from all sorts of data sources.

The data gathered can be structured or semi-structured, such as data from CRM systems and spreadsheets, or real-time unstructured data from sources such as Facebook updates, tweets and forum posts, to name a few.

To put this into perspective, consider the case of King.com, best known as the creator of Candy Crush Saga. King.com is a global leader in casual social games and offers over 150 exclusive games in 14 languages to over 40 million monthly players, with more than 3 billion games played per month worldwide.

King.com uses a Hadoop-based big data solution to store massive amounts of gaming activity and customer data. Hive is used as a data warehouse system for Hadoop to run ad-hoc queries and analyze large datasets. Each user 'event' is first logged locally on the game servers, then copied hourly to a centralized log server, and subsequently loaded into Hadoop before the magic of business analytics even begins.
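The flow described above (local logging, an hourly copy, then a load into the warehouse) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server names, event fields and function names are hypothetical, and plain dictionaries stand in for the log server and for Hadoop. It is not King.com's actual implementation.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts):
    """Return the hourly batch key (in UTC) that a timestamp falls into."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d-%H")


def collect_hourly_batches(server_logs):
    """Group events from each game server's local log into hourly batches.

    server_logs maps a (hypothetical) server name to a list of
    (timestamp, event_dict) tuples standing in for local log files.
    """
    batches = defaultdict(list)
    for server, entries in server_logs.items():
        for ts, event in entries:
            row = {"server": server, "ts": ts.isoformat(), **event}
            batches[hour_bucket(ts)].append(row)
    return dict(batches)


def load_into_warehouse(warehouse, batches):
    """Append each hourly batch to a partition keyed by hour.

    The warehouse dict is a stand-in for the Hadoop store.
    """
    for hour, rows in sorted(batches.items()):
        warehouse.setdefault(hour, []).extend(rows)
    return warehouse


# Hypothetical sample data: two servers, events spread over two hours.
utc = timezone.utc
server_logs = {
    "game-01": [
        (datetime(2013, 5, 1, 10, 5, tzinfo=utc), {"player": "p1", "event": "level_start"}),
        (datetime(2013, 5, 1, 11, 2, tzinfo=utc), {"player": "p1", "event": "level_end"}),
    ],
    "game-02": [
        (datetime(2013, 5, 1, 10, 40, tzinfo=utc), {"player": "p2", "event": "level_start"}),
    ],
}

batches = collect_hourly_batches(server_logs)
warehouse = load_into_warehouse({}, batches)
```

Partitioning by hour keeps each load small and lets analysts query a time range without touching the game servers themselves.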

The social gaming giant wanted to extract data from a variety of sources and to customize metadata that could be applied as external tables and used alongside the big data extracted from the Hadoop system. Built-in associative search data models, and the capability to extract and merge data from different sources, help King.com's BI team become a metadata-driven business intelligence unit.
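As a rough illustration of merging metadata from one source with event data from another, the sketch below performs a simple lookup join in Python. The field names, sample values and the `enrich` helper are all hypothetical, chosen only to show the idea; the article does not describe how King.com's tables are actually defined.

```python
def enrich(events, metadata, key):
    """Left-join each event with the metadata row that shares its key value.

    Events with no matching metadata are kept unchanged.
    """
    enriched = []
    for event in events:
        extra = metadata.get(event[key], {})
        enriched.append({**event, **extra})
    return enriched


# Hypothetical metadata, as it might arrive from a spreadsheet or CRM export.
game_metadata = {
    "game_a": {"genre": "puzzle", "team": "north"},
    "game_b": {"genre": "cards", "team": "south"},
}

# Hypothetical event rows, as they might be extracted from the big data store.
raw_events = [
    {"game": "game_a", "player": "p1"},
    {"game": "game_b", "player": "p2"},
    {"game": "game_x", "player": "p3"},  # no metadata available for this game
]

rows = enrich(raw_events, game_metadata, "game")
```

Keeping unmatched events rather than dropping them mirrors a left join, so gaps in the metadata remain visible during analysis.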

Making data discovery possible

King.com's (http://www.king.com) gaming system generates 2 billion rows of data per day, and this volume continues to grow. Analyzing data without disturbing the game load was a key performance requirement for the company. Another requirement was a simple analytics and reporting system that allows game development teams to be geographically separate from the platform development team. Finally, as the business grew, its analytics requirements became more sophisticated than ever. Having the data available for complex queries and analytics with fast performance was a necessity.

King.com's IT department was challenged to empower business users with self-service analytics so that the company could, in turn, give players a gaming experience that keeps them coming back for more. Business users in different job functions wanted to explore the data relevant to their jobs, and to be able to slice and dice that data by many permutations of the hundreds of dimensions available in the big data.
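To make "slicing and dicing by permutations of dimensions" concrete, here is a minimal Python sketch that counts events for any chosen combination of dimensions. The dimension names and sample rows are invented for illustration, and a real system would do this over billions of rows rather than a short list.

```python
from collections import Counter
from itertools import combinations


def slice_and_dice(rows, dimensions):
    """Count rows grouped by the given tuple of dimension names."""
    counts = Counter()
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        counts[key] += 1
    return counts


def all_cuts(rows, dimensions, max_size=2):
    """Compute counts for every combination of up to max_size dimensions."""
    return {
        dims: slice_and_dice(rows, dims)
        for size in range(1, max_size + 1)
        for dims in combinations(dimensions, size)
    }


# Hypothetical event rows with three dimensions.
sample_rows = [
    {"game": "game_a", "country": "SE", "platform": "mobile"},
    {"game": "game_a", "country": "UK", "platform": "web"},
    {"game": "game_a", "country": "SE", "platform": "web"},
    {"game": "game_b", "country": "SE", "platform": "mobile"},
]

by_game_country = slice_and_dice(sample_rows, ("game", "country"))
cuts = all_cuts(sample_rows, ("game", "country", "platform"))
```

With three dimensions and `max_size=2` there are only six cuts, but the number of combinations grows quickly as dimensions are added, which is why fast, interactive exploration matters once hundreds of dimensions are available.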

Given more control, business users have more freedom to navigate and analyze their data, rather than the pre-determined data set that traditional BI delivers. Gartner researchers call this new category of BI 'Data Discovery' -- we call it 'Business Discovery'.

The solution also provided 'speed of thought' analysis of King.com's big data. Since all the data needed for analysis is in memory, business users can explore a relevant piece of big data immediately. There is no wait time while user-driven BI performs the calculations needed to deliver the analysis users request. More importantly, they can literally see relationships in the big data with the associative search capability and use all of the dimensions of the big data, in any combination, during analysis. With a user-driven BI environment, King.com can now analyze the gaming behavior of 40 million customers to target new games and customers.

User-driven BI platforms enable all users across an organization to make discoveries in the data, on their own or in teams and groups. It's not just for business analysts or data scientists; it truly is for everyone.

In the new world of business discovery, no one should be treated as a passive "end user" of canned reports and pre-built dashboards. With intuitive, exploratory technology, there's no reason why organizations cannot empower savvy employees, even without special skills, to spot anomalies and hidden relationships never before seen.

Data discovery should not be seen merely as technical analysis that only a data scientist can do. Instead, the technology should have intuitive and exploratory capabilities that can illuminate the stories contained in the data. The power of data discovery lies as much in its ability to "liberate" data through visualization as in its ability to build a compelling narrative for business users.


IDG News Service

The IDG News Service is the world's leading daily source of global IT news, commentary and editorial resources. The News Service distributes content to IDG's more than 300 IT publications in more than 60 countries.
