Heterogeneous model parallelism for deep neural networks