At 10 years old, Hadoop continues to evolve at speed

In a tower block in central London, Mike Olson, chief strategy officer of Cloudera, cuts a cake decorated with a cartoon yellow elephant to general applause. To outsiders this might seem some sort of weird cult performance and even for those well versed in IT the whole thing takes some explaining. The cake is there to celebrate the fact that 10 years ago, Yahoo stood up its first implementation of Hadoop, the open source software framework that is now synonymous with the ability to manage massive data sets on commodity servers. The elephant is called Hadoop (pronounced Ha-DOOP) and is named after Hadoop creator Doug Cutting’s son’s much adored stuffed toy. As for Cloudera, the Californian company is arguably the largest in the segment known to some as Big Data, and it is betting big that Hadoop is around to stay.

Olson, an American who is suited but wears an earring and cowboy boots, is as close to a veteran as you will see in this youthful but fast-maturing sector. In 2006 he sold the open source database company Sleepycat Software to Oracle, and in 2008 he co-founded Cloudera as a startup dedicated to providing a business-class version of Hadoop in much the same way, you might say, that Red Hat gives enterprises support for Linux and sands down its rough edges. (For the record, Olson prefers to compare his company to the likes of Splunk, VMware and Sun Microsystems as firms that have sparked the passion of devotees.)

For Olson, the short version of the story of the 10-year adventure is that Hadoop has moved from software to ecosystem. “We call it Hadoop but it looks very different today,” he says at the customer and media event in London. Evidence of that is the fact that a few metres to his left are Tom White and Marcel Kornacker, experts respectively in Whirr for running cloud-neutral services and Impala for SQL querying. These are just two development projects that are giving Hadoop greater breadth and heft.

But Hadoop has advanced in other ways too. Look, for example, at the financial picture. Cloudera landed a $740m windfall from Intel in 2014 for just 18 per cent of the company. And even though the company joined the ranks of those having their pre-IPO valuations marked down recently, most watchers see Cloudera as being worth billions of dollars. That’s partly because when it revealed last year that it had $100m in annual revenue and over 100 per cent growth year on year, that trajectory stood comparison with the fastest growing companies in enterprise software history. Then there is Cloudera’s perennial competitor Hortonworks, which at time of writing has a valuation north of $650m despite a rocky road since its IPO last year and a share price that is currently less than half its 52-week high.


Hadoop and away  

Why is Hadoop generating so much heat? It’s mostly about being able to throw incredible resources at answering complex questions.

Harry Powell, head of advanced data analytics at Barclays, uses Hadoop to support the bank’s strategy “to establish our position as a bank that’s adding value to our 10 million active customers”.

“There are ways we can use information to give all our customers the same level of support and engagement as if they were hugely rich people,” he says.

By automating the probing of large data sets, he can help “mark us out as a bank that’s about customers rather than profit and loss”, he adds. One example: an app for small and medium-sized businesses that could let owners of a hairdressing shop in Croydon figure out how much they are spending on electricity compared to local rivals, based on anonymised data from multiple sources.

Powell talks about processes being speeded up by 1,000 times.

“That’s the amazing thing about distributed computation,” he says. “You can do computation you couldn’t have dreamed of before. We think you can do pretty much everything with Spark [cluster computing framework] and the Scala [programming language] API.”

For Mark Pybus, head of Big Data engineering at online bookmaker Skybet, Hadoop is about analysing data not just to target the bonuses and promotions that get people betting but also to identify potential problem gamblers. By spotting patterns in betting behaviour, Skybet might be able to lower spending limits, for example.

“We can generally tell an irresponsible gambler before they can tell us,” he says. “Before, it was the case that you didn’t know someone was a problem gambler until they told you.”

Like Olson, Pybus says he “finds it hard to refer to Hadoop anymore… it’s an ecosystem of tools” that has evolved greatly in recent years.

Phil Bradley, BT chief data architect, says the telecoms giant has spent five years on Hadoop and has been “full on for two”, generating “dozens” of use cases. One example: using 25 years of phone line testing records and other data such as age of line, or distance from cabinet, to predict the broadband experience of a customer.

“It’s about trying to make BT a more data-driven company,” he says, so that when big decisions are taken, such as spending £6bn on superfast broadband or buying EE to re-enter cellular communications, those choices are made with as much evidential support as possible.

Yet another example comes from Nick Turner, a consultant at Markerstudy, who is using Hadoop to improve motor insurance products with enriched quotes that absorb multidimensional data in real time. “Before, we relied on a data sample that represented three to five per cent of the overall data; now all of it with every rating factor [is available].”


Entering maturity

What’s still missing? BT’s Bradley nominates data governance and access controls as areas that still need improvement. “The open source software community does great things but governance is probably not one of them,” he says.

For Barclays’ Powell there remains a “degree of tension between governance and agility” and he earmarks Spark streaming and interactivity as areas meriting development.

But finding people with data science skills to get the most out of the data gusher might be a tougher task. Barclays’ Powell says that the people needed today probably aren’t in the mould of traditional business analysts.

“Data scientists tend to want to work in cool places and our attraction is the data set we have, but the fact that we’re a big old business and not as agile is something we need to change to attract [the best candidates],” he says.

BT’s Bradley concurs that developing skills is “one of the limiting factors” for Hadoop today but he is optimistic nonetheless, arguing that self-service business intelligence is advancing. That in turn means historic challenges such as understanding what’s behind customer churn rates and getting rid of nuisance callers can be solved.

Cloudera’s Olson adds that platforms and competition might be changing as more Hadoop queries move into the cloud, up from about 15 per cent of Cloudera workloads today, and as large IT companies make their plays.

But there’s general consensus that, whatever near-term challenges exist, the alternatives are far worse.

“With relational you’d get stuck and be stuck for three or four years,” says Skybet’s Pybus.

And even if the pace of change is high then at least there is a compensating improvement in quality and a smoother delivery process. BT’s Bradley says:

“We had the equivalent of a plane in flight with passengers and someone said we’re going to change the engines… but it wasn’t that painful.”


Martin Veitch

Martin Veitch is Contributing Editor for IDG Connect