Performance Implications of Big Data in Scalable Deep Learning: On the Importance of Bandwidth and Caching

Deep learning techniques have revolutionized many areas, including computer vision and speech recognition. While such networks require tremendous amounts of data, their dependence on Big Data storage systems is often undervalued and not well understood. In this paper, we explore the relationship between Big Data storage, networking, and Deep Learning workloads to understand the key factors in designing integrated Big Data/Deep Learning solutions. We find that storage and networking bandwidths are the main parameters determining Deep Learning training performance. Caching data in local memory can boost performance and eliminate repeated network transfers, but it is mainly limited to smaller datasets that fit into memory. Local disk caching, on the other hand, is an intriguing option that current state-of-the-art systems overlook. Finally, we distill our findings into guidelines for designing Big Data/Deep Learning solutions.
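To make the caching trade-off concrete, the sketch below shows how a training input pipeline might spill a remote dataset to a local disk cache using TensorFlow's tf.data API. This is a minimal illustration under assumed conditions, not the paper's experimental setup; the HDFS path, cache location, feature spec, and tuning parameters are hypothetical placeholders.

```python
import tensorflow as tf

# Hypothetical remote dataset location; in the setting the paper describes,
# this would be a Big Data store (e.g., HDFS) reached over the network.
REMOTE_FILES = ["hdfs://namenode:9000/imagenet/train-00000-of-01024.tfrecord"]

def decode(serialized_example):
    # Placeholder parse function; the real feature spec depends on the dataset.
    features = tf.io.parse_single_example(
        serialized_example,
        {"image": tf.io.FixedLenFeature([], tf.string),
         "label": tf.io.FixedLenFeature([], tf.int64)})
    image = tf.io.decode_jpeg(features["image"], channels=3)
    return image, features["label"]

dataset = (
    tf.data.TFRecordDataset(REMOTE_FILES)
    # The first epoch streams records over the network and writes a copy to
    # local disk; later epochs read from the local cache instead of
    # re-transferring the data.
    .cache("/local_scratch/train.cache")
    .shuffle(buffer_size=10_000)
    .map(decode, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)
```

Calling .cache() with no filename keeps elements in memory, which corresponds to the memory-limited caching discussed above; passing a filename spills the cache to local disk, the overlooked option the paper highlights. Caching is placed before shuffling and decoding so that the cached copy holds raw records and each epoch is still independently shuffled.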
