LinkedIn open-sources a tool to run TensorFlow on Hadoop

LinkedIn has open-sourced a project for scaling and managing deep learning jobs in TensorFlow, using the YARN (Yet Another Resource Negotiator) job scheduling system in Hadoop.

The Tony project came about after LinkedIn tried two existing open source solutions for running scheduled TensorFlow jobs on Hadoop and found both wanting. One, TensorFlow on Spark, runs TensorFlow via Apache Spark’s job engine, but it couples too tightly with Spark. The other, TensorFlowOnYARN, provides the same basic functionality as Tony, but it is unmaintained and doesn’t provide fault tolerance.

Deep learning models in TensorFlow need some form of job management. Training models can take hours or days, and the training process needs some guarantee it can complete correctly.

Tony uses YARN’s resource and task scheduling system to set up TensorFlow jobs across a Hadoop cluster, according to LinkedIn’s press notes. Tony can also schedule GPU-based TensorFlow jobs through Hadoop, request different kinds of resources (GPUs vs. CPUs), and allocate memory differently for TensorFlow nodes. It also ensures that job outputs are saved periodically to HDFS, so that jobs that crash or are interrupted can resume from where they left off.
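The save-and-resume behavior described above can be illustrated with a small, self-contained Python sketch. This is not Tony's code or API: the function names, the JSON checkpoint format, and the simulated crash are all assumptions made for illustration, and a local file stands in for HDFS.

```python
import json
import os


def train(checkpoint_path, total_steps, save_every, crash_at=None):
    """Toy training loop that resumes from the last saved checkpoint.

    Hypothetical illustration only: a local JSON file stands in for a
    checkpoint on HDFS, and "training" just shrinks a loss value.
    """
    step, loss = 0, 100.0
    if os.path.exists(checkpoint_path):
        # Resume from where the previous attempt left off.
        with open(checkpoint_path) as f:
            state = json.load(f)
        step, loss = state["step"], state["loss"]

    while step < total_steps:
        if crash_at is not None and step == crash_at:
            raise RuntimeError(f"simulated crash at step {step}")
        step += 1
        loss *= 0.9  # stand-in for one real training step
        if step % save_every == 0:
            # Write to a temporary file, then rename, so a crash during
            # the write never leaves a half-written checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump({"step": step, "loss": loss}, f)
            os.replace(tmp_path, checkpoint_path)
    return step, loss


def run_with_restart(checkpoint_path, total_steps, save_every, crash_at):
    """Run the job once; if it fails, relaunch it from the checkpoint."""
    try:
        return train(checkpoint_path, total_steps, save_every, crash_at)
    except RuntimeError:
        return train(checkpoint_path, total_steps, save_every)
```

In this sketch, a job that crashes at step 7 with a checkpoint every 5 steps repeats only steps 6 and 7 on relaunch rather than starting over, which is the practical benefit of periodic saves for training runs that last hours or days.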

Tony splits its work among three internal components: a client, an application master, and a task executor. The client accepts incoming TensorFlow jobs; the application master negotiates with YARN’s resource manager to provision the job on YARN; and the task executor is what’s actually launched on the YARN cluster to run the TensorFlow job.
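To make the division of labor among the three components concrete, here is a toy Python model of the flow. Every class, function, and field name below is hypothetical and chosen for illustration; none of it is Tony's or YARN's real API, and the "resource manager" is just an in-memory counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """What a job asks for, per role (hypothetical structure)."""
    role: str          # e.g. "worker" or "ps" (parameter server)
    instances: int
    memory_mb: int
    gpus: int = 0


@dataclass(frozen=True)
class Container:
    """A granted slice of cluster resources for one task."""
    role: str
    index: int
    memory_mb: int
    gpus: int


class ToyResourceManager:
    """In-memory stand-in for YARN's resource manager."""

    def __init__(self, total_memory_mb, total_gpus):
        self.free_memory_mb = total_memory_mb
        self.free_gpus = total_gpus

    def allocate(self, role, index, memory_mb, gpus):
        if memory_mb > self.free_memory_mb or gpus > self.free_gpus:
            raise RuntimeError("insufficient cluster resources")
        self.free_memory_mb -= memory_mb
        self.free_gpus -= gpus
        return Container(role, index, memory_mb, gpus)


class ToyApplicationMaster:
    """Negotiates with the resource manager on behalf of one job."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager

    def provision(self, specs):
        containers = []
        for spec in specs:
            for index in range(spec.instances):
                containers.append(self.resource_manager.allocate(
                    spec.role, index, spec.memory_mb, spec.gpus))
        return containers


def task_executor(container):
    """Stand-in for the process launched in each container."""
    return (f"{container.role}:{container.index} running with "
            f"{container.memory_mb} MB, {container.gpus} GPU(s)")


class ToyClient:
    """Accepts a job and hands it to the application master."""

    def __init__(self, application_master):
        self.application_master = application_master

    def submit(self, specs):
        containers = self.application_master.provision(specs)
        return [task_executor(c) for c in containers]
```

Submitting one parameter-server task and two GPU workers through `ToyClient.submit` yields three executor results, with the workers receiving different memory and GPU allocations from the parameter server. This mirrors, in miniature, the per-role resource requests the article describes.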

LinkedIn claims there is no discernible overhead for TensorFlow jobs when using Tony, because Tony “is in the layer [that] orchestrates distributed TensorFlow and does not interfere with the actual execution of the TensorFlow job.”

Tony also works with the TensorBoard application for visualizing, optimizing, and debugging TensorFlow apps.
