LinkedIn open-sources a tool to run TensorFlow on Hadoop

Credit: Ryan McGuire


LinkedIn has open-sourced a project for scaling and managing deep learning jobs in TensorFlow, using the YARN (Yet Another Resource Negotiator) job scheduling system in Hadoop.

The Tony project came about after LinkedIn tried two existing open source solutions for running scheduled TensorFlow jobs on Hadoop and found them both wanting. One, TensorFlow on Spark, runs TensorFlow via Apache Spark’s job engine, but it is too tightly coupled to Spark. The other, TensorFlowOnYARN, provides the same basic functionality as Tony, but it is unmaintained and doesn’t provide fault tolerance.

Deep learning models in TensorFlow need some form of job management. Training models can take hours or days, and the training process needs some guarantee it can complete correctly.

Tony uses YARN’s resource and task scheduling system to set up TensorFlow jobs across a Hadoop cluster, according to LinkedIn’s press notes. Tony can also schedule GPU-based TensorFlow jobs through Hadoop, request different kinds of resources (GPUs vs. CPUs), and allocate memory differently for TensorFlow nodes. It also ensures that job outputs are saved periodically to HDFS, so that jobs that crash or are interrupted can resume from where they left off.
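The save-and-resume behavior described above can be illustrated with a minimal Python sketch. This is not Tony's code: the function names (`save_checkpoint`, `load_checkpoint`, `train`), the JSON checkpoint format, and the use of a local file in place of HDFS are all assumptions made for illustration, and the "training step" is a stand-in that just accumulates a number.

```python
import json
import os


def save_checkpoint(path, step, state):
    """Write the checkpoint via a temp file so a crash never leaves a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp_path, path)  # atomic rename on the local filesystem


def load_checkpoint(path):
    """Return (step, state) from the last checkpoint, or a fresh start if none exists."""
    if not os.path.exists(path):
        return 0, {"total": 0}
    with open(path) as f:
        data = json.load(f)
    return data["step"], data["state"]


def train(path, total_steps, checkpoint_every, fail_at=None):
    """Toy training loop: resume from the last checkpoint, save periodically."""
    step, state = load_checkpoint(path)
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated crash")  # hypothetical failure injection
        state["total"] += step  # stand-in for one real training step
        step += 1
        if step % checkpoint_every == 0 or step == total_steps:
            save_checkpoint(path, step, state)
    return state
```

If a run crashes between checkpoints, the work done since the last save is lost, but a second call to `train` with the same path picks up from the saved step rather than from zero. The checkpoint interval is therefore a trade-off between write overhead and the amount of work that may need to be repeated.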

Tony splits its work among three internal components: a client, an application master, and a task executor. The client accepts incoming TensorFlow jobs; the application master negotiates with YARN’s resource manager to provision the job on YARN; and the task executor is what’s actually launched on the YARN cluster to run the TensorFlow job.
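The division of labor among these three components might be pictured roughly as follows. This is a simplified sketch, not Tony's actual classes or API: every name here (`TaskSpec`, `FakeResourceManager`, `ApplicationMaster`, `TaskExecutor`, `Client`) is hypothetical, the resource manager is an in-memory stand-in for YARN, and the client calls the application master directly instead of launching it on the cluster.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical description of one kind of TensorFlow task in a job."""
    role: str          # e.g. "worker" or "ps" (parameter server)
    instances: int
    memory_mb: int
    gpus: int = 0


@dataclass
class FakeResourceManager:
    """In-memory stand-in for YARN's resource manager."""
    free_memory_mb: int
    free_gpus: int
    next_id: int = 0

    def allocate(self, memory_mb, gpus):
        if memory_mb > self.free_memory_mb or gpus > self.free_gpus:
            raise RuntimeError("insufficient cluster resources")
        self.free_memory_mb -= memory_mb
        self.free_gpus -= gpus
        self.next_id += 1
        return f"container_{self.next_id}"


class TaskExecutor:
    """Runs inside an allocated container and starts one TensorFlow task."""

    def __init__(self, container, role, index):
        self.container = container
        self.role = role
        self.index = index

    def run(self):
        return f"{self.container}: started {self.role}:{self.index}"


class ApplicationMaster:
    """Negotiates containers with the resource manager, one per task instance."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager

    def provision(self, specs):
        executors = []
        for spec in specs:
            for index in range(spec.instances):
                container = self.resource_manager.allocate(spec.memory_mb, spec.gpus)
                executors.append(TaskExecutor(container, spec.role, index))
        return executors


class Client:
    """Accepts a job description and hands it to the application master."""

    def __init__(self, app_master):
        self.app_master = app_master

    def submit(self, specs):
        return [executor.run() for executor in self.app_master.provision(specs)]
```

For example, submitting `[TaskSpec("ps", 1, 2048), TaskSpec("worker", 2, 4096, gpus=1)]` through a `Client` would request three containers, with GPUs going only to the two workers. The point of the sketch is the separation of concerns: the client knows nothing about containers, and the executors know nothing about resource negotiation.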

LinkedIn claims there is no discernible overhead for TensorFlow jobs when using Tony, because Tony “is in the layer [that] orchestrates distributed TensorFlow and does not interfere with the actual execution of the TensorFlow job.”

Tony also works with the TensorBoard application for visualizing, optimizing, and debugging TensorFlow apps.
