Understanding and optimizing packed neural network training for hyper-parameter tuning

As neural networks are increasingly employed in machine learning practice, organizations will have to determine how to share limited training resources among a diverse set of model training tasks. This paper studies jointly training multiple neural network models on a single GPU. We present an empirical study of this operation, called pack, along with end-to-end experiments that suggest significant improvements for hyperparameter search systems. Our research prototype is implemented in TensorFlow, and we evaluate performance across different models (ResNet, MobileNet, DenseNet, and MLP) and training scenarios. The results suggest: (1) packing two models can bring up to a 40% performance improvement over unpacked setups for a single training step, and the improvement increases when packing more models; (2) the benefit of a pack primitive largely depends on a number of factors, including memory capacity, chip architecture, neural network structure, and batch size; (3) there exists a trade-off between packing and unpacking when training multiple neural network models on limited resources; (4) a pack-based Hyperband is up to 2.7x faster than the original Hyperband in our experimental setting, and this improvement grows as memory size increases and, with it, the number of models that can be packed.
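To make the pack operation concrete, the following is a minimal sketch (not the paper's implementation) of how two models might share a GPU within a single fused training step in TensorFlow 2. The model architectures, optimizers, batch size, and the shared input batch are illustrative assumptions.

```python
import tensorflow as tf

def make_mlp(width):
    # Hypothetical small MLP used only to illustrate packing.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(width, activation="relu"),
        tf.keras.layers.Dense(10),
    ])

model_a, model_b = make_mlp(256), make_mlp(512)
opt_a = tf.keras.optimizers.SGD(1e-2)
opt_b = tf.keras.optimizers.SGD(1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def packed_step(x, y):
    # Both models run inside one compiled graph launch, so they share the
    # GPU within a single training step rather than across separate jobs.
    with tf.GradientTape(persistent=True) as tape:
        loss_a = loss_fn(y, model_a(x, training=True))
        loss_b = loss_fn(y, model_b(x, training=True))
    opt_a.apply_gradients(zip(tape.gradient(loss_a, model_a.trainable_variables),
                              model_a.trainable_variables))
    opt_b.apply_gradients(zip(tape.gradient(loss_b, model_b.trainable_variables),
                              model_b.trainable_variables))
    del tape  # persistent tape must be released explicitly
    return loss_a, loss_b

# Example: one packed step on a shared random batch.
x = tf.random.normal([64, 784])
y = tf.random.uniform([64], maxval=10, dtype=tf.int32)
packed_step(x, y)
```

In a hyperparameter-search setting such as Hyperband, each packed model would typically receive its own hyperparameters (here, only the learning rates and layer widths differ), while the input pipeline and GPU are shared.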
