Analyzing the distributed training of deep-learning models via data locality

In recent years, deep-learning models have become crucial for numerous scientific and industrial applications. As deep neural networks grow in size and complexity, researchers have been investigating techniques to train them more efficiently. Many efforts have been made to optimize deep-learning models by parallelizing or distributing their training computation across multiple devices. Current state-of-the-art techniques, such as Horovod, have been shown to maximize the performance of both the training computation and the inter-node communication for models built with different deep-learning frameworks. However, some applications cannot take advantage of these techniques because of an I/O bottleneck caused by the input data, which limits the scalability of training. In this paper, we study an approach based on data locality, which has not yet been fully explored, for neural networks that cannot benefit from scaling their computation due to a significant data I/O bottleneck.
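
To illustrate the data-locality idea discussed above (this is a minimal sketch, not the implementation evaluated in the paper), the example below shows a Horovod/TensorFlow data-parallel training loop in which each worker reads only the TFRecord shards staged on its own node-local storage instead of streaming the full dataset from a shared parallel file system. The shard path layout, record schema, model, and hyperparameters are hypothetical assumptions made for the sake of the example.

```python
# Sketch: Horovod data-parallel training with node-local input shards.
# Paths, record schema, and hyperparameters are hypothetical.
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each Horovod process to one local GPU, if any are present.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank() % len(gpus)], "GPU")

def parse_example(record):
    # Hypothetical record schema: a JPEG-encoded image and an integer label.
    feats = tf.io.parse_single_example(record, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.image.resize(tf.io.decode_jpeg(feats["image"], channels=3),
                            [224, 224])
    return tf.cast(image, tf.float32) / 255.0, feats["label"]

# Data locality: each worker opens only the shards staged on its own node
# (hypothetical path layout, one set of shard files per rank).
local_shards = tf.data.Dataset.list_files(
    f"/local_scratch/train/shard-{hvd.rank():04d}-*.tfrecord", shuffle=True)
dataset = (local_shards
           .interleave(tf.data.TFRecordDataset, cycle_length=4)
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(10_000)
           .batch(128)
           .prefetch(tf.data.AUTOTUNE))

model = tf.keras.applications.ResNet50(weights=None, classes=1000)

# Scale the learning rate by the number of workers and wrap the optimizer so
# that gradients are averaged across workers via allreduce.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.SGD(0.01 * hvd.size()))
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)

callbacks = [
    # Ensure all workers start from the same initial weights.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]
model.fit(dataset, epochs=10, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

The key point of the sketch is the input pipeline: because every rank reads only from node-local storage, the number of processes contending for the shared file system does not grow with the number of workers, which is precisely the bottleneck that limits scaling when all workers stream the same dataset over the network.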
