Taiji Suzuki | Mitsuhiro Kimura | Takeshi Toda | Ryuji Sakai | Masahiro Ozawa | Kosuke Haruki | Yohei Hamakawa