Hadoop Latecomer Hortonworks No Longer Hears a 'Who?'

Most people close to the action in modern data management know that Apache Hadoop is generating lots of attention because of its ability to process and store huge volumes of data that is often unstructured: clickstream, sensor and location data and so on. What is less clear is which of the companies providing open source distributions will assume the dominance that Red Hat did in Linux. The two companies best positioned, seemingly, are Cloudera and the company I recently interviewed, Hortonworks. Cloudera had first-mover advantage but Hortonworks is winning attention. Or, to risk a dreadful pun based on the Dr Seuss elephant character that gives the company its name, Hortonworks no longer hears a ‘who?’

The main reason for the broad business interest is that Big Data, for want of a better term for analytics applied to very large data sets, lets firms do useful things such as better profiling of customers to sharpen marketing and sell more things. Hadoop answers a pressing need to make use of the sheer amount of information and its variety of forms. Traditional relational database management systems and data warehouse approaches were created for a different challenge at an earlier time.

That said, Hortonworks president Herb Cunitz is no blowhard predicting the demise of Oracle or the old business intelligence players when we meet for coffee and a chat in London’s One Aldwych.

“There’s not going to be one new database [or data management system] to rule them all,” he says. “It’s early in the journey. Web 2.0 companies - the eBays, Yahoos, Facebooks - started this journey early because they didn’t have a legacy infrastructure.”

That made them able to develop scale-out systems based on commodity hardware that in turn allowed them to perform tasks such as storing vast amounts of information to index the web and serving ads based on customer preferences. But Cunitz sees a world where RDBMS, BI and the new wave of tools based on Hadoop, Cassandra et al will sit side by side in peaceful coexistence at large organisations.

Hortonworks came into being as a standalone concern in summer 2011 when Yahoo was shedding assets not core to its web portal business, but it is “still very central to Yahoo”, according to Cunitz. Yahoo acts as a 40,000-node test-bed and remains a large customer, alongside other big deployments at JP Morgan, Home Depot and, as we shall come to, Spotify.

Hortonworks’ big message is to push its adherence to open-source and grow its community.

“We’re 100% open source and it’s not to be an open source purist but because we fundamentally believe you can drive innovation faster across the community than as a single company.”

Cunitz, a wiry, affable and soft-spoken former SpringSource and VMware executive, says Hortonworks and peers are reaping the rewards of the early days of open source in the enterprise won by Red Hat and peers. “Open source has matured to the point that most companies don’t have concerns over licensing or think it’s this weird thing.”

That’s helping to drive hyper-growth (500% revenue growth in the last year and headcount sprouting from 50 to 250 in the same period) and investors - Hortonworks recently pulled in $50m in VC for a total of $98m in backing. Apart from general expansion, there’s not a great need for cash reserves, Cunitz says, quipping: “The best time to raise money is when you don’t need it.”

Cunitz doesn’t demur when I suggest that his company is attempting to be the Red Hat of Hadoop, providing enterprise-class cladding in the form of support, training and consulting. As with Red Hat, enterprises don’t need bells and whistles but they do want security, monitoring, provisioning and other characteristics of blue-chip IT.

Cunitz’s view is that by being the most open Hadoop player out there, Hortonworks will help customers stay clear of reliving the proprietary world of lock-in to a single vendor and an inability to switch suppliers. He also believes that, as with the famous shoot-out that saw VHS beat Betamax (despite, arguably, technical inferiority), the weapons to win the contest will be ecosystem support and content. To that end, Hortonworks is building its partner roster. Microsoft, Teradata and SAP are all on board, giving Hortonworks a strong hand of data infrastructure giants, while it is also working closely with systems integrators and other channel players. Making the most of complementary tools like ElasticSearch, Platfora for analytics, the Storm streaming framework and Apache Hive for SQL querying will also help.

Hortonworks’ progress won’t be harmed by a big customer win it just announced - Spotify. The music site believes it has the largest Hadoop cluster in Europe and, a bonus this, it has switched from being a Cloudera shop.

“We needed quite a lot of data to provide to record labels and partners we work with and we have to do analytics with user behaviour, how features in a product work and to provide data-driven recommendations, radio playlists and so on,” says Wouter de Bie, Spotify infrastructure team lead, when we speak by phone.

Spotify has used Hadoop for five years, pretty well since the former started out. “We were Cloudera for a long time because they were the first out there. We have a lot of Hadoop competence but we need some sort of backing. Hortonworks’ company culture fits us a bit better [than Cloudera]: it’s informal and there’s a slight price difference.”

The Hortonworks roadmap appealed, with the commitment to make Hive 100% faster standing out, although de Bie believes that Cloudera is also a genuine open source champion and he disputes the risk of lock-in.

“I think that’s a bit simplistic [but] Cloudera seems from afar more corporate while Hortonworks is a bit more flexible and a bit more laid back.”

The outlook for Hortonworks is fine. With 120 customers, it has weaned itself off initial dependence on Yahoo, the growth outlined above is rapid and it foresees an IPO in “five to seven quarters”, that is, most likely, at some point in 2015.

Branching out to meet the needs of telcos, retailers, banks, healthcare institutions, government and so on will be the next challenge and even though Hadoop might for now be seen as a treasure chest for techies in the web and early-adopter world, there’s nothing to stop it being applicable across verticals.

Some interesting trends are emerging. Firms tend to adopt on-premise or off-premise but don’t change tack after early deployments as a “data gravity” law appears to prevail, says Cunitz. “Where it’s born it tends to stay.”

And, whereas in traditional enterprise software many geographic regions tended to lag years behind the US, the spread of Hadoop seems relatively rapid and universal.

Who will win the shoot-out? Cloudera had a start on Hortonworks but, in just over two years, the latter has become highly competitive and reckons that it is at parity with its rival in terms of revenues. Cunitz contends that there will be no tolerance of Unix-style fragmentation. “The market will standardise on one version of Hadoop because testing costs are incredibly large.”

The question, sitting there like Horton the elephant on the table between us, is which will prevail.


Martin Veitch is Editorial Director at IDG Connect

