LinkedIn open-sources a tool to run TensorFlow on Hadoop Ryan McGuire

LinkedIn has open-sourced a project for scaling and managing deep learning jobs in TensorFlow, using the YARN (Yet Another Resource Negotiator) job scheduling system in Hadoop.

The Tony project came about after LinkedIn tried two existing open source solutions for running scheduled TensorFlow jobs on Hadoop and found both wanting. One, TensorFlow on Spark, runs TensorFlow via Apache Spark’s job engine, but it couples too tightly with Spark. The other, TensorFlowOnYARN, provides the same basic functionality as Tony, but it is unmaintained and does not provide fault tolerance.

Deep learning models in TensorFlow need some form of job management. Training models can take hours or days, and the training process needs some guarantee it can complete correctly.

Tony uses YARN’s resource and task scheduling system to set up TensorFlow jobs across a Hadoop cluster, according to LinkedIn’s press notes. Tony can also schedule GPU-based TensorFlow jobs through Hadoop, request different kinds of resources (GPUs vs. CPUs), and allocate memory differently for different TensorFlow nodes. It also ensures that job outputs are saved periodically to HDFS, so that jobs that crash or are interrupted can resume from where they left off.
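The save-periodically-and-resume behavior described above can be illustrated with a minimal sketch. This is not Tony's code or API: the `train` function, the `checkpoint.json` file name, and the local directory standing in for HDFS are all hypothetical, and the "training update" is a placeholder for a real TensorFlow step.

```python
import json
import os
import tempfile


def train(checkpoint_dir, total_steps, save_every, crash_at=None):
    """Toy training loop that checkpoints periodically and resumes on restart.

    checkpoint_dir is a local directory standing in for an HDFS path
    (an assumption made for illustration only).
    """
    path = os.path.join(checkpoint_dir, "checkpoint.json")

    # Resume from the last saved state if one exists; otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "weight": 0.0}

    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")

        state["weight"] += 0.5  # placeholder for one real training update
        state["step"] += 1

        if state["step"] % save_every == 0 or state["step"] == total_steps:
            # Write to a temporary file and rename it, so a crash in the
            # middle of a write cannot leave a corrupt checkpoint behind.
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, path)

    return state


def demo():
    """Crash partway through, then restart and finish from the checkpoint."""
    ckpt_dir = tempfile.mkdtemp()
    try:
        train(ckpt_dir, total_steps=10, save_every=3, crash_at=7)
    except RuntimeError:
        pass  # the job "died"; the last checkpoint was written at step 6
    return train(ckpt_dir, total_steps=10, save_every=3)
```

In this sketch the restarted job repeats only the work done after the last checkpoint (step 7 here) rather than starting over, which is the property that matters for training runs lasting hours or days.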

Tony splits its work among three internal components: a client, an application master, and a task executor. The client accepts incoming TensorFlow jobs; the application master negotiates with YARN’s resource manager to provision the job on YARN; and the task executor is what’s actually launched on the YARN cluster to run the TensorFlow job.
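The division of labor among the three components can be sketched as a toy simulation. Every name here (`TaskRequest`, `FakeResourceManager`, `Client`, `ApplicationMaster`, `TaskExecutor`, and the `train.py` script) is hypothetical and chosen for illustration; none of it is Tony's or YARN's actual API. As a further simplification, the client hands the job straight to the application master, and the executor returns a string instead of starting a TensorFlow process.

```python
from dataclasses import dataclass


@dataclass
class TaskRequest:
    """Resources wanted for one kind of TensorFlow task (hypothetical shape)."""
    task_type: str   # e.g. "worker" or "ps" (parameter server)
    instances: int
    memory_mb: int
    gpus: int = 0


@dataclass
class FakeResourceManager:
    """Stand-in for YARN's resource manager, with a fixed pool of resources."""
    free_memory_mb: int
    free_gpus: int

    def allocate(self, memory_mb, gpus):
        # Grant the request only if the cluster still has enough capacity.
        if memory_mb > self.free_memory_mb or gpus > self.free_gpus:
            return False
        self.free_memory_mb -= memory_mb
        self.free_gpus -= gpus
        return True


class TaskExecutor:
    """What runs inside an allocated container."""

    def __init__(self, task_type, index):
        self.name = f"{task_type}:{index}"

    def run(self, script):
        # A real executor would launch the TensorFlow process here.
        return f"{self.name} ran {script}"


class ApplicationMaster:
    """Negotiates resources, then starts one executor per granted container."""

    def __init__(self, resource_manager):
        self.rm = resource_manager

    def launch(self, script, requests):
        executors = []
        for req in requests:
            for i in range(req.instances):
                if not self.rm.allocate(req.memory_mb, req.gpus):
                    raise RuntimeError(
                        f"insufficient resources for {req.task_type}:{i}")
                executors.append(TaskExecutor(req.task_type, i))
        return [e.run(script) for e in executors]


class Client:
    """Accepts a job and passes it on for provisioning."""

    def __init__(self, app_master):
        self.app_master = app_master

    def submit(self, script, requests):
        return self.app_master.launch(script, requests)
```

A job that wants two GPU workers and one CPU-only parameter server could then be submitted like this:

```python
rm = FakeResourceManager(free_memory_mb=16384, free_gpus=2)
client = Client(ApplicationMaster(rm))
results = client.submit("train.py", [
    TaskRequest("worker", instances=2, memory_mb=4096, gpus=1),
    TaskRequest("ps", instances=1, memory_mb=2048),
])
```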

LinkedIn claims there is no discernible overhead for TensorFlow jobs when using Tony, because Tony “is in the layer [that] orchestrates distributed TensorFlow and does not interfere with the actual execution of the TensorFlow job.”

Tony also works with the TensorBoard application for visualizing, optimizing, and debugging TensorFlow apps.
