James Demmel | Yang You | Ruobing Han
[1] Jeff Johnson, et al. Rethinking floating point for deep learning, 2018, ArXiv.
[2] Yuanzhou Yang, et al. Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes, 2018, ArXiv.
[3] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[4] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[5] Sam Ade Jacobs, et al. Communication Quantization for Data-Parallel Training of Deep Neural Networks, 2016, 2nd Workshop on Machine Learning in HPC Environments (MLHPC).
[6] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX Annual Technical Conference.
[7] James Demmel, et al. ImageNet Training in Minutes, 2017, ICPP.
[8] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[9] Takuya Akiba, et al. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes, 2017, ArXiv.
[10] Yang You, et al. Scaling SGD Batch Size to 32K for ImageNet Training, 2017, ArXiv.
[11] Kunle Olukotun, et al. DAWNBench: An End-to-End Deep Learning Benchmark and Competition, 2017.
[12] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[13] Pradeep Dubey, et al. A Study of BFLOAT16 for Deep Learning Training, 2019, ArXiv.
[14] Alexander J. Smola, et al. Communication Efficient Distributed Machine Learning with the Parameter Server, 2014, NIPS.
[15] Hao Wu, et al. Mixed Precision Training, 2017, ICLR.
[16] Dong Yu, et al. 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs, 2014, INTERSPEECH.
[17] Xin Yuan, et al. Bandwidth optimal all-reduce algorithms for clusters of workstations, 2009, J. Parallel Distributed Comput.
[18] Nicholas J. Higham, et al. INVERSE PROBLEMS NEWSLETTER, 1991.
[19] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.
[20] Shengen Yan, et al. GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training, 2019, IEEE Transactions on Big Data.
[21] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[22] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[23] Sebastian Ramos, et al. The Cityscapes Dataset for Semantic Urban Scene Understanding, 2016, CVPR.
[24] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[25] Nikko Strom, et al. Scalable distributed DNN training using commodity GPU cloud computing, 2015, INTERSPEECH.
[26] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[27] Tao Wang, et al. Image Classification at Supercomputer Scale, 2018, ArXiv.
[28] Natalia Gimelshein, et al. Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design, 2016, ArXiv.
[29] Yoshua Bengio, et al. Training deep neural networks with low precision multiplications, 2014.
[30] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, ArXiv.
[31] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[32] Cong Xu, et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning, 2017, NIPS.
[33] Daniel Brand, et al. Training Deep Neural Networks with 8-bit Floating Point Numbers, 2018, NeurIPS.
[34] Christopher De Sa, et al. QPyTorch: A Low-Precision Arithmetic Simulation Framework, 2019, Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS).
[35] Xin Wang, et al. Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks, 2017, NIPS.