Torsten Hoefler | Elad Hoffer | Daniel Soudry | Tal Ben-Nun | Itay Hubara | Niv Giladi