Spotlight: Optimizing Device Placement for Training Deep Neural Networks

Training deep neural networks (DNNs) requires an increasing amount of computational resources, and it has become typical to use a mixture of GPU and CPU devices. Due to the heterogeneity of these devices, a recent challenge is how to place each operation in a neural network on these devices optimally, so that training takes the shortest possible amount of time. The current state-of-the-art solution uses reinforcement learning based on the policy gradient method, but it suffers from suboptimal training times. In this paper, we propose Spotlight, a new reinforcement learning algorithm based on proximal policy optimization, designed specifically for finding an optimal device placement for training DNNs. The design of our new algorithm relies on a new model of the device placement problem: by modeling it as a multi-stage Markov decision process, we are able to prove that Spotlight achieves a theoretical guarantee on performance improvements. We have implemented Spotlight, evaluated it on the CIFAR-10 image classification benchmark, and deployed it on the Google Cloud platform. Extensive experiments demonstrate that the training time with placements recommended by Spotlight is 60.9% of that with placements recommended by the policy gradient method.
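
At the core of the approach the abstract describes is the clipped surrogate objective of proximal policy optimization (PPO), applied to per-operation placement decisions. The sketch below is a minimal illustration of that objective, not the paper's implementation: it assumes a plain-NumPy setting in which new_probs and old_probs hold the probabilities that the current and previous policies assign to the sampled placements, and advantages holds baseline-adjusted rewards such as reductions in measured training time; all function and variable names are illustrative assumptions.

```python
import numpy as np

def ppo_clip_objective(new_probs, old_probs, advantages, epsilon=0.2):
    """Clipped PPO surrogate over a batch of placement decisions.

    new_probs / old_probs: probability of each sampled device placement
    under the current and previous policies (1-D arrays, one entry per
    operation). advantages: baseline-adjusted reward, e.g. reduction in
    measured per-step training time. Names are illustrative assumptions.
    """
    ratio = new_probs / old_probs                    # importance ratio r_t
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    # Take the elementwise minimum so the objective never rewards moving
    # the policy further than the clipping range in a single update.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Example: three operations, a modest policy shift, mixed advantages.
new_p = np.array([0.5, 0.3, 0.9])
old_p = np.array([0.4, 0.4, 0.8])
adv = np.array([1.2, -0.5, 0.3])
print(ppo_clip_objective(new_p, old_p, adv))
```

Clipping the importance ratio to [1 - epsilon, 1 + epsilon] bounds how far a single update can move the placement policy, which is the general mechanism PPO-style methods use to obtain the kind of per-update improvement guarantees the abstract refers to.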
