Accelerating Deep Learning Training Through Transparent Storage Tiering

We present Monarch, a framework-agnostic storage middleware that transparently employs storage tiering to accelerate Deep Learning (DL) training. It leverages the existing storage tiers of modern supercomputers (i.e., the compute nodes' local storage and the shared parallel file system (PFS)), while considering the I/O patterns of DL frameworks to improve data placement across tiers. Monarch aims to accelerate DL training and decrease the I/O pressure imposed on the PFS. We apply Monarch to TensorFlow and PyTorch, validating its performance and applicability under different models and dataset sizes. Results show that, even when the training dataset can only be partially stored in local storage, Monarch reduces TensorFlow's and PyTorch's training time by up to 28% and 37%, respectively, for I/O-intensive models. Furthermore, Monarch decreases the number of I/O operations submitted to the PFS by up to 56%.