On the interplay between noise and curvature and its effect on optimization and generalization
Nicolas Le Roux | Yoshua Bengio | Fabian Pedregosa | Bart van Merrienboer | Valentin Thomas | Pierre-Antoine Manzagol