Hortonworks and HP Labs join forces to boost Spark

Hadoop distribution specialist Hortonworks has joined forces with HP Labs, HP Enterprise's central research organization, in an effort to dramatically improve the performance of Spark workloads.


At a press conference in San Francisco on Tuesday, the two companies announced a collaboration that has already borne some fruit:

  • Enhanced shuffle engine technologies. Faster sorting and in-memory computations, which have the potential to dramatically improve Spark performance.
  • Better memory utilization. Improved performance and usage for broader scalability, which will help enable new large-scale use cases.

"We're hoping to enable the Spark community to derive insight more rapidly from much larger data sets without having to change a single line of code," says Martin Fink, executive vice president and CTO of HP Enterprise and a member of Hortonworks' board. "We're very pleased to be able to work with Hortonworks to broaden the range of challenges that Spark can address."

Fink explains that HP Labs had been researching the efficiency and scalability of enterprise memory, as well as ways to improve how enterprises utilize that memory.

"Part of that research activity is we rewrote the shuffle engine from Java to C++," he says. "We saw that we had rewritten a bunch of algorithms to make much more efficient use of memory and enabled ways that you could scale memory even more."


In fact, some customers that adopted HP Labs' work found that it improved the performance of certain workloads by 5X to 15X.

"I've been around for a long time," Fink says. "It's not often that you come out with 15X performance increases on certain workloads. We knew we needed this to be part of a greater whole."

Fink notes that HP Enterprise chose to open source its research with the help of Hortonworks due to Hortonworks' dedication to openness and collaboration.

"This collaboration indicates our mutual support of and commitment to the growing Spark community and its solutions," adds Scott Gnau, CTO of Hortonworks. "We will continue to focus on the integration of Spark into broad data architectures supported by Apache YARN as well as enhancements for performance and functionality and better access points for applications like Apache Zeppelin."

Zeppelin is an incubating Apache project that provides a Web-based notebook that enables interactive data analytics.

IDG Insider


IDG News Service

The IDG News Service is the world's leading daily source of global IT news, commentary and editorial resources. The News Service distributes content to IDG's more than 300 IT publications in more than 60 countries.
