Distributed TensorFlow with MPI

Machine Learning and Data Mining (MLDM) algorithms are becoming increasingly important in analyzing large volume of data generated by simulations, experiments and mobile devices. With increasing data volume, distributed memory systems (such as tightly connected supercomputers or cloud computing systems) are becoming important in designing in-memory and massively parallel MLDM algorithms. Yet, the majority of open source MLDM software is limited to sequential execution with a few supporting multi-core/many-core execution. In this paper, we extend recently proposed Google TensorFlow for execution on large scale clusters using Message Passing Interface (MPI). Our approach requires minimal changes to the TensorFlow runtime -- making the proposed implementation generic and readily usable to increasingly large users of TensorFlow. We evaluate our implementation using an InfiniBand cluster and several well knowndatasets. Our evaluation indicates the efficiency of our proposed implementation.

[1]  William Gropp,et al.  MPI-2: Extending the Message-Passing Interface , 1996, Euro-Par, Vol. I.

[2]  Anthony Skjellum,et al.  A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..

[3]  Sorin Draghici,et al.  Machine Learning and Its Applications to Biology , 2007, PLoS Comput. Biol..

[4]  Anselm Vossen Support Vector Machines in High Energy Physics , 2008 .

[5]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[6]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[7]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[8]  Kesheng Wu,et al.  Scientific Discovery at the Exascale , 2011 .

[9]  Dr.S.Selvarajan S.Deepajothi A Comparative Study of Classification Techniques On Adult Data Set , 2012 .

[10]  Alok Choudhary,et al.  Synergistic Challenges in Data-Intensive Science and Exascale Computing: DOE ASCAC Data Subcommittee Report , 2013 .

[11]  Pierre Baldi,et al.  Searching for Higgs Boson Decay Modes with Deep Learning , 2014, NIPS.

[12]  P. Baldi,et al.  Searching for exotic particles in high-energy physics with deep learning , 2014, Nature Communications.

[13]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[14]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).