User-transparent Distributed TensorFlow

Deep Learning (DL) algorithms have become the de facto choice for data analysis. Several DL implementations, such as Caffe, TensorFlow, Theano, and Torch, are readily available, but they are primarily limited to a single compute node. Distributed DL implementations capable of executing on large-scale systems are becoming important for addressing the computational needs of the large datasets produced by scientific simulations and experiments. Yet the adoption of distributed DL faces two significant impediments: 1) most implementations require DL analysts to modify their code substantially, which is a show-stopper for many users, and 2) several distributed DL implementations are geared toward cloud computing systems, which are inadequate for execution on massively parallel systems such as supercomputers. This work addresses both problems. We provide a distributed-memory DL implementation by incorporating the required changes into the TensorFlow runtime itself, which dramatically lowers the entry barrier to using distributed TensorFlow. We use the Message Passing Interface (MPI), which provides performance portability, especially since all MPI-specific changes are abstracted away from users. Lastly, and arguably most importantly, we make our implementation available for broader use under the umbrella of the Machine Learning Toolkit for Extreme Scale (MaTEx) at this http URL. We refer to our implementation as MaTEx-TensorFlow.
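To make the approach concrete: MPI-based distributed training of the kind described above is commonly realized as synchronous data parallelism, in which every MPI rank holds a replica of the model, computes gradients on its own shard of the data, and averages those gradients across ranks before each update. The sketch below, assuming mpi4py and NumPy are available, illustrates that communication pattern. It is a hand-written illustration, not MaTEx's actual interface (MaTEx-TensorFlow hides this logic inside the TensorFlow runtime), and names such as compute_local_gradient are hypothetical placeholders.

    # A minimal sketch, assuming mpi4py and NumPy, of the synchronous
    # data-parallel pattern that an MPI-based distributed DL runtime
    # performs on the user's behalf. This is NOT the MaTEx API; every
    # name below (e.g., compute_local_gradient) is hypothetical.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()   # this process's id
    size = comm.Get_size()   # total number of MPI ranks

    def compute_local_gradient(params, step):
        # Placeholder for a gradient computed on this rank's data shard.
        rng = np.random.default_rng(seed=rank * 10000 + step)
        return rng.standard_normal(params.shape)

    params = np.zeros(100)   # model parameters, replicated on every rank
    lr = 0.01                # learning rate

    for step in range(10):
        local_grad = compute_local_gradient(params, step)
        global_grad = np.empty_like(local_grad)
        # Sum gradients across all ranks with a single Allreduce, then
        # average; every rank applies the identical update, so the model
        # replicas stay in sync without any user-visible MPI code.
        comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
        params -= lr * (global_grad / size)

Such a script would be launched with the usual MPI tooling, e.g. "mpirun -np 4 python sketch.py"; the point of a user-transparent runtime is that the Allreduce step happens inside the framework, so the user's training script looks like ordinary single-node TensorFlow code.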
