Pradeep Dubey, Karthikeyan Vaidyanathan, Dheevatsa Mudigere, Dipankar Das, Sasikanth Avancha, Mikhail Shiryaev, Naveen Mellempudi, Bharat Kaul, Dhiraj D. Kalamkar, Srinivas Sridharan, Mikhail E. Smorkalov
[1] Pradeep Dubey, et al. Distributed Deep Learning Using Synchronous Stochastic Gradient Descent, 2016, ArXiv.
[2] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[3] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[4] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[5] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[6] David A. Patterson, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[7] Dong Yu, et al. 1-Bit Stochastic Gradient Descent and Its Application to Data-Parallel Distributed Training of Speech DNNs, 2014, INTERSPEECH.
[8] Vikram A. Saletore, et al. Scale Out for Large Minibatch SGD: Residual Network Training on ImageNet-1K with Improved Accuracy and Reduced Time to Train, 2017, ArXiv.
[9] Anthony Skjellum, et al. Planning for Performance: Persistent Collective Operations for MPI, 2017, EuroMPI/USA.
[10] Tim Dettmers, et al. 8-Bit Approximations for Parallelism in Deep Learning, 2015, ICLR.
[11] Ioannis Mitliagkas, et al. Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data, 2017, SC17: International Conference for High Performance Computing, Networking, Storage and Analysis.