Andrew Gordon Wilson | Dmitry P. Vetrov | Pavel Izmailov | Timur Garipov | Dmitrii Podoprikhin
[1] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.
[2] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[3] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[4] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[5] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[6] Oriol Vinyals, et al. Qualitatively characterizing neural network optimization problems, 2014, ICLR.
[7] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[8] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[9] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[10] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Restarts, 2016, ArXiv.
[11] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[12] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Xavier Gastaldi, et al. Shake-Shake regularization, 2017, ArXiv.
[14] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[16] Junmo Kim, et al. Deep Pyramidal Residual Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Nicholay Topin, et al. Exploring loss function topology with cyclical learning rates, 2017, ArXiv.
[18] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[19] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[20] Stefano Soatto, et al. Entropy-SGD: biasing gradient descent into wide valleys, 2016, ICLR.
[21] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res.
[22] Fred A. Hamprecht, et al. Essentially No Barriers in Neural Network Energy Landscape, 2018, ICML.
[23] Andrew Gordon Wilson, et al. Improving Consistency-Based Semi-Supervised Learning with Weight Averaging, 2018, ArXiv.
[24] Andrew Gordon Wilson, et al. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs, 2018, NeurIPS.
[25] Dmitry Vetrov, et al. Variance Networks: When Expectation Does Not Meet Your Expectations, 2018, ICLR.