The Unvarnished Truth about Big Data

Ramnath Iyer knows the business of information like few others. As the CTO of CRISIL India, the global ratings agency, he watches over financial and management data collected from over 15,000 medium and large enterprises. "Our database has more than 120 million data points," says Iyer, who started his career at CRISIL as an analyst.

Two years ago, CRISIL decided to help its analysts create more accurate ratings, faster. It did this by enabling analysts to tap into large pools of data--located outside CRISIL's walls--in a more organized fashion. If, for example, its analysts wanted to rate the performance of an auto manufacturer in Gurgaon, CRISIL's big data systems would help by crawling social media sites and blog posts for opinions about the company. It would also take data from sources as varied as government reports on the production of steel, online announcements of a new excise duty, weather reports affecting the production of rubber (to gauge the cost of tires) and news from Bloomberg or Reuters regarding a strike at one of the company's ancillary providers. Combining all that information and more gave analysts a more accurate picture of the company they were rating and the environment it operated in.

In a way, Iyer's team has built a system that's as smart as its human counterpart and has become a member of the team. That's allowed analysts to rate thrice the number of companies they could before. "The toughest challenge was constructing a system with in-built intelligence; a system that can think like an analyst, that can find and co-relate various pieces of information and throw results at analysts," says Iyer.

CRISIL's story is just the sort that big data vendors and analysts love to push. And you're their target.

But under scrutiny, the glossy veneer of big data's brochure-ware gives way to a whole set of challenges, doubts and questions that are typically associated with technologies in their hype cycle. The question is: Will big data stand the test of enterprise adoption? Or will it go up in a big cloud of smoke?

The Big Identity Crisis

When big data first started creating buzz, it was perceived as a technology that could manage large volumes of data. Since then, however, newer definitions have evolved to include two new parameters: the variety of disparate sources from which companies gather this data, and the speed with which data is collected, stored and processed.

"We define big data as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insights and decision making," says Sid Deshpande, senior research analyst at Gartner.

Gartner's definition focuses more on the overall management of data. Forrester's, on the other hand, lays more emphasis on the ability to extract actionable intelligence from large data sets. Forrester's VP and Country Manager, India, Manish Bahl defines big data as the "techniques and technologies that make capturing value from data at an extreme scale economical." Bahl is quick to add that big data is a set of different technologies; it does not equal Hadoop, and certainly not in-memory computing, as some vendors suggest.

More recently, vendors like IBM have added another V to the equation: Veracity.

Most Indian CIOs we spoke to agree that big data is characterized by the 3Vs. When a company has to manage huge volumes of data, which are moving in and out of its systems at extremely high velocity, and these data sets originate from a large variety of disparate sources (social media, for instance), then a company has a big data problem and opportunity.

Deshpande says it's not necessary for an organization to be battling all the three Vs to have a big data problem. "Any company that's struggling with either one or two of the three Vs and can't solve the problem using their existing infrastructure and technology set can be classified as an organization with a big data problem," he says.

By that definition, MTS certainly has a big data challenge. MTS India, the mobile telecom service brand of SSTL, has over 10 million customers across nine circles and generates about 100 TB of data every day. For Rajeev Batra, CIO, MTS, just dealing with the volume and velocity of the data the company's systems produce is a gargantuan task. "Data is constantly being generated from the call records of our customers, their usage pattern, location-based services, billing and Internet usage," says Batra.

Batra has already found ways to use that data downpour to create offerings that give MTS competitive advantage. Using data streams from different sources, MTS started a program it calls m-bonus that offers customers freebies like discounted call rates--in real time--with the aim of increasing customer loyalty. "If, for example, a customer has made five calls to a specific number in the span of an hour, we could possibly offer them a 20 percent off on the next call, thereby creating that customer delight," says Batra.
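The rule Batra describes can be read as a sliding-window count over a stream of call records. What follows is only a minimal sketch of that idea, not a description of how MTS built m-bonus. The class name `RepeatCallOffer`, the method `record_call` and the choice to reset the count after an offer are assumptions made for illustration. The thresholds (five calls, one hour, 20 percent) come from Batra's example, and timestamps are assumed to arrive in order, in seconds.

```python
from collections import defaultdict, deque

# Thresholds taken from Batra's example; everything else is hypothetical.
WINDOW_SECONDS = 3600     # "in the span of an hour"
CALL_THRESHOLD = 5        # "five calls to a specific number"
DISCOUNT_PERCENT = 20     # "20 percent off on the next call"


class RepeatCallOffer:
    """Counts recent calls per (caller, callee) pair in a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS, threshold=CALL_THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._calls = defaultdict(deque)  # (caller, callee) -> call timestamps

    def record_call(self, caller, callee, timestamp):
        """Record one call; return the discount percent to offer next, or 0."""
        recent = self._calls[(caller, callee)]
        recent.append(timestamp)
        # Drop calls that have fallen out of the one-hour window.
        while recent and timestamp - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            recent.clear()  # assumption: start counting afresh after an offer
            return DISCOUNT_PERCENT
        return 0
```

Five calls from one subscriber to the same number at ten-minute intervals would trigger the offer on the fifth call. The same five calls spread over more than an hour would not, because the earliest call drops out of the window first.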

Batra also uses feeds from MTS' social media platforms to figure out how happy customers are with MTS' products. This information, he says, is used by marketing, sales and customer service to figure out how they can improve service levels and customer satisfaction. "The time taken to set the wheels rolling to improve or tweak products and services based on customer feedback has been reduced by 40-50 percent," says Batra.

Some of Batra's peers shrug off work similar to his as examples of "high-powered analytics"--which demonstrates big data's lack of a universally accepted definition.

CIOs on the big data trail like Batra, in the meanwhile, aren't waiting.

Big Data's Big Drivers

As easy as it is to take down big data--and the concept has its share of skeptics--there's plenty of proof that the technology is here to stay. For one, with the amount of data we produce, we've created an ideal breeding ground for it. Every day, we create 2.5 quintillion bytes of data. Ninety percent of the data in the world today has been created in the last two years alone. The potential of all of that data is making businesses salivate.

"With the rise of social media, mobility, M2M communication, RFID, GPS, etcetera, business can now tap into data they never knew existed earlier," says Srinivas Peddada, EVP and CIO at SKS Microfinance.

Part of big data's charm is the more immediate results it offers, compared to traditional analytical tools like business intelligence. Earlier, for example, a new product's success could only be measured after a company's marketing team ran a survey. But in the time it took marketing to gather feedback and decide what colors to use in its presentation, unsatisfied customers had already moved to the competition.

That won't happen at Target, the second-largest discount retailer in the US, if Natarajan 'Nat' Malupillai, director, digital analytics at Target's India operations, has his way. Malupillai is creating a system--it's under pilot--that leverages social media and other publicly available data to improve what Target shows customers shopping on its website. In addition, Malupillai says, "We would like to leverage the big data solution to compare item attributes and prices in the market place to provide the best offering for Target customers."

"In the past, analytics sat separate from transactions and could only do what we call passive analysis," says Malupillai. Any strategy a company wished to put into action based on a customer's behavior or browsing pattern could only be post facto. But big data's changing that. "From suggestions on what a customer could buy next or special offers, we should be able to get more interactive with customers on the go, and close the loop within one transaction cycle," says Malupillai.

For CXOs, immediate intelligence is an irresistible idea. So much so that businesses are unclenching their tightly-closed fists to fund big data projects. The average amount Indian companies spent on big data in 2012 was $9.5 million (about Rs 52 crore), according to a 2013 TCS study on big data. Indian enterprises aren't the only ones spending big on big data. Technology providers are investing large amounts to scale up their big data portfolios. In the last five years, IBM, for example, has invested over $16 billion (about Rs 88,000 crore) in 30 acquisitions to strengthen its family of big data products. At the other end of the size spectrum, millions of rupees in venture capital are being pumped into big data start-ups, creating much-needed innovation.

And when the likes of IBM decide that enterprises are going to buy a product, they back their decision with sizeable marketing budgets and create demand. It's what's gotten a technology concept as complex and niche as big data on the cover of the Harvard Business Review, Forbes, and Businessworld.

Michael Chui, principal, McKinsey Global Institute and co-author of one of the first comprehensive reports on big data, says that since he co-authored his report in 2011, a lot has changed in the way big data is perceived. "A couple of years ago, there were a number of people in the tech community who were talking about big data, but in the intervening time we have seen interest and awareness increase in business and executive communities," he says.

The Bigger They Are...

One of the challenges of being a legend is living up to your reputation. That's a problem big data is already beginning to face thanks to all the hype surrounding it. According to Gartner's 2012 hype cycle for emerging technology, big data has moved into the Peak of Inflated Expectations. "There are several use cases where big data has helped solve various challenges at various organizations, but the hype being built around those cases is too high at the moment. End users are beginning to expect those benefits automatically, which might not turn out to be true in all cases," says Deshpande at Gartner.

One downside to that, as Arun Gupta, CIO, Cipla, points out, is that IT leaders have begun to force-fit problems to the technology. But because big data solutions aren't solving organic problems, he says, it's probable that big data won't see mainstream enterprise acceptance.

"Most companies I speak to are curious about big data, what with all the hype being built around it. But they have no clue what it means to them," says Gupta. "Various models created by vendors or analyst firms are not based on empirical data but largely on hypothetical, potential use cases. Anecdotal references, too, are primarily from large Internet companies and a few global FMCG experiments," he says.

Even if big data does get past this hurdle, it's going to have to face a number of legacy issues--challenges that have sunk many a business intelligence initiative in the past.

First among these is the cultural change an analytical project like big data will introduce. According to Chui, big data will turn an organization into one giant laboratory and will force companies--that are used to listening to HIPPOs (Highest Paid Person's Opinion)--to change the way they make decisions. "Imagine having to move from experience and instinct to running an organization like a 24x7 laboratory. There is nothing harder than having an organization that is learning to make decisions in a different way," he says.

Gupta isn't convinced that data gathered from sources like social media can be trusted to base critical business decisions on. "The probability that businesses can derive value from this data is low, although it does lend itself to insights that have been unavailable thus far," says Gupta.

Then there is funding. "Let's agree, big data doesn't come cheap," says Malupillai. "During these initial stages, when the technology is not mature and projects are in beta phase, it is important that your business sees value in what you are doing," he says.

Gupta doubts businesses will see value in big data just yet. "It's extremely difficult right now to provide any meaningful model which can calculate the ROI of big data investments for a typical commercial enterprise," he says. "There is so much hype that CIOs may get initial investments cleared, but the second round of funding will be difficult to justify."

Malupillai says that CIOs who are doubtful about the ROI of big data should start small and use big-data-as-a-service models. "To begin with, CIOs can pick a small set of data (say 10-20 percent of marketing or sales data), be it structured or unstructured, and use the service of a data analysis firm to analyze that data," says Malupillai.

Embracing big-data-as-a-service is an approach Bahl from Forrester suggests, too. "CIOs can explore cloud-based models for faster business results from their big data investments," he says.

That, however, doesn't solve another issue: All the legacy data sitting in enterprise systems that hasn't been mined for intelligence yet. "People talk about big data and unstructured data without having a clue about data sitting in their data base. That's just blasphemy," says Peddada at SKS Microfinance.

According to Symantec's 2012 State of the Information Survey, 66 percent of Indian businesses say they haven't got to a "single version of the truth". Another 63 percent say it's too difficult to find the right information at the right time. That's followed by 43 percent who are unsure how old information is.

At SKS Microfinance, Peddada is currently dealing with large volumes of structured data. The micro-finance company has a presence in about 20 Indian states, with about 7 million customers. It has about 10,000 employees, and a large field force that goes into India's rural interiors to disburse loans and collect payments. "The amount of transactional data we deal with is huge. Customers make payments on a weekly basis and there are as many as 1,000 reports that need to be generated in a month," says Peddada.

Peddada, an MBA from the Lally School of Management & Technology, New York, has a two-year plan chalked out to adopt big data at SKS Microfinance. "We have a good MIS system. But I wanted to streamline it first and go to the next level of business intelligence, SQL services, analytics services, and finally get to the highest level of BI," he says. His aim is to use a number of information sources to be able to, say, lower the micro-finance company's risk. For example, he says, he'd like to use weather reports and news feeds to predict whether crop production in a certain region is good enough to ensure that customers from there would be able to pay their loans.

Another challenge is the cost of storing and archiving all that data. Bahl says it's likely that organizations will collect and store more data than ever before. The key challenge they will face is how to manage that data, what to retain, and how to eventually dispose of it.

At SKS Microfinance, Peddada plans to improve the company's storage capacities and invest in data warehousing and processing power over the next six to seven months to match his big data goals.

But what about data that's coming from outside organizations? That's a question Iyer at CRISIL had to ask himself. CRISIL generates 6TB of structured data and about 6GB of unstructured data every day. It also uses Web crawlers to pick information from social media sites and various forums and platforms available on the Web. That's then run through parsers, which conduct syntactic analysis and give the data some structure for use before it is subsequently discarded. "Since this unstructured data is already available on the Internet, my database will only store which location the data was downloaded from," says Iyer. "It helps lower my storage cost," he says.
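The storage pattern Iyer describes (parse the downloaded text, keep the structured result and the source location, discard the raw text) can be illustrated with a short sketch. It is not CRISIL's implementation. The names `ParsedMention` and `parse_page` are hypothetical, and simple keyword counting stands in for the syntactic analysis the article mentions.

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedMention:
    """Structured record kept after parsing; the raw page text is not stored."""
    source_url: str        # where the text was downloaded from
    company: str
    sentence_count: int    # sentences that mention the company
    keyword_hits: dict     # keyword -> occurrences in those sentences


def parse_page(source_url, raw_text, company, keywords):
    """Reduce a downloaded page to a small record plus its source location."""
    # Naive sentence split; a real parser would do proper syntactic analysis.
    sentences = [s for s in re.split(r"[.!?]+\s*", raw_text) if s.strip()]
    relevant = [s for s in sentences if company.lower() in s.lower()]
    hits = {
        keyword: sum(
            len(re.findall(r"\b" + re.escape(keyword) + r"\b", s, flags=re.IGNORECASE))
            for s in relevant
        )
        for keyword in keywords
    }
    # raw_text goes out of scope here; only the URL and the counts survive.
    return ParsedMention(source_url, company, len(relevant), hits)
```

Because the record carries the source location, an analyst who needs the original wording can fetch the page again. That holds only for as long as the page remains online, which is the trade-off behind the lower storage cost.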

Big Data's Building Blocks

Some of big data's challenges, like ROI and storage, can be countered if CIOs chip away at them long enough. However, there are others, say skeptics, that are out of a CIO's hands.

One of these is the uneven speed at which the different technologies that make up big data are maturing. According to Gartner's big data hype cycle, published July 2012, technologies like Web analytics and social media monitors will take less than two years to enter mainstream adoption, while technologies like in-memory data grids and MapReduce will take between two and five years to mature. Other technology subsets of big data like the semantic Web and information valuation could take as long as 10 years to make in-roads into the enterprise.

In this shifting landscape of technologies, CIOs will be hard-pressed to find a scalable, flexible platform for their big data initiatives.

In answer to that problem, Batra, who has been on the big data road since mid-2009, has tried to use open-standards platforms as much as possible. This, he says, makes it easier to swap out obsolete and outdated components and add new ones, move data over as necessary, and keep working with as little disruption as possible.

"The technology space has also evolved. Most new technologies are more of plug-and-play and are built on open standards specifications," says Batra. He believes that vendors will keep integration and flexibility issues in mind as they keep developing new products.

Bahl's suggestion that CIOs adopt an incremental, open-source approach to big data resonates with Batra's decision. "The evolving open source big data ecosystem around technologies such as Hadoop, Cassandra and Solr and platforms like Cloudera, Hortonworks, etcetera, is an increasingly attractive option for CIOs," says Bahl.

But what about big data skill sets, ask cynics? According to Gartner, by 2015, nearly 4.4 million new jobs will be created globally by the big data demand--and only a third of them will be filled. In India, the skill-set crunch has started seeping into the list of a CIO's biggest challenges, even among those who have just started planning their big data debuts.

"Skill sets was and will remain a major pain point, be it BI or analysts or data scientists," says Peddada.

In response, a number of training academies have mushroomed to fill the demand. Jigsaw Academy, a start-up based in Bangalore, can train up to 4,000 students in a year, says Sarita Digumarti, founder, Jigsaw Academy. Courses include basic statistics (required to summarize, visualize and analyze data), analytic techniques (for descriptive and predictive modeling), and the use of analytic tools like SAS, Excel and others. Another Bangalore-based academy, Analytics Training Institute, can train up to 1,200 students annually. Course fees at Jigsaw range between Rs 27,000 and Rs 36,000 per student.

But Gupta, among others, is not convinced. "Indian engineering schools produce a million engineers every year. Having a degree doesn't mean one is employable," he says.

This lack of experienced and skilled data scientists is what drove Malupillai to look within Target for people, from both business and IT, who could be cross-trained. Iyer, on the other hand, has been lucky: Out of the 3,800 employees at CRISIL, almost 3,200 are analysts. For CIOs who suspect they will face a staffing challenge, Iyer suggests enlisting the services of an analyst firm.

Looking Forward

Clearly, a large majority believe that big data is the next frontier of business-IT innovation, competition, and productivity. But the fact is that most analysts and Indian CIOs believe that mainstream adoption of big data is going to take another two to five years. Chui from McKinsey Global Institute puts that number at a conservative 10 years. So why all the carpe diem? Because no one wants to get left behind.

Batra, Malupillai and Iyer, for example, have had their heads in the big data game since pre-2010.

Indeed, there will be challenges for those chasing the big data dream. But which new technology doesn't face them? The cloud had plenty of skeptics before it gained acceptance.

"Leaders who are currently using big data solutions are not shying away from the need to create solutions, and, as the technology matures, put in the effort required to integrate various solutions and make them work. I mean, that's always been the work IT leaders have had to do," says Chui.


IDG News Service

The IDG News Service is the world's leading daily source of global IT news, commentary and editorial resources. The News Service distributes content to IDG's more than 300 IT publications in more than 60 countries.
