Nicolas Le Roux | Yoshua Bengio | Fabian Pedregosa | Bart van Merrienboer | Valentin Thomas | Pierre-Antoine Manzagol
[1] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[2] Behnam Neyshabur, et al. Implicit Regularization in Deep Learning, 2017, ArXiv.
[3] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[4] Pascal Vincent, et al. Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis, 2018, NeurIPS.
[5] Hilbert J. Kappen, et al. On-line learning processes in artificial neural networks, 1993.
[6] James Martens, et al. New Insights and Perspectives on the Natural Gradient Method, 2014, J. Mach. Learn. Res.
[7] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[8] M. Stone. Cross-Validatory Choice and Assessment of Statistical Predictions, 1974.
[9] David R. Anderson, et al. Multimodel Inference, 2004.
[10] Lei Wu, et al. The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent, 2018, ArXiv.
[11] Mark W. Schmidt. Convergence rate of stochastic gradient with constant step size, 2014.
[12] Laurent Boué. Real numbers, data science and chaos: How to fit any dataset with a single parameter, 2019, ArXiv.
[13] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[14] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.
[15] Jascha Sohl-Dickstein, et al. PCA of high dimensional random walks with comparison to neural network training, 2018, NeurIPS.
[16] André Elisseeff, et al. Stability and Generalization, 2002, J. Mach. Learn. Res.
[17] Vahab S. Mirrokni, et al. Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions, 2018, ICML.
[18] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res.
[19] G. Schwarz. Estimating the Dimension of a Model, 1978.
[20] Jascha Sohl-Dickstein, et al. Sensitivity and Generalization in Neural Networks: an Empirical Study, 2018, ICLR.
[21] Yann LeCun, et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, 2016, ArXiv.
[22] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[23] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[24] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, 2018 Information Theory and Applications Workshop (ITA).
[25] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[26] Trac D. Tran, et al. A Scale Invariant Flatness Measure for Deep Network Minima, 2019, ArXiv.
[27] F. Bach, et al. Non-parametric Stochastic Approximation with Large Step sizes, 2014, ArXiv.
[28] Sho Yaida. Fluctuation-dissipation relations for stochastic gradient descent, 2018, ICLR.
[29] H. Akaike. A new look at the statistical model identification, 1974.
[30] Vahid Tarokh, et al. On Optimal Generalizability in Parametric Learning, 2017, NIPS.
[31] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.
[32] James Martens, et al. New perspectives on the natural gradient method, 2014, ArXiv.
[33] Jorge Nocedal, et al. Numerical Optimization, 1999, Springer.
[34] Frederik Kunstner, et al. Limitations of the Empirical Fisher Approximation, 2019, NeurIPS.
[35] Jürgen Schmidhuber, et al. Flat Minima, 1997, Neural Computation.
[36] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.
[37] Shun-ichi Amari, et al. Network information criterion-determining the number of hidden units for an artificial neural network model, 1994, IEEE Trans. Neural Networks.
[38] Francis R. Bach, et al. From Averaging to Acceleration, There is Only a Step-size, 2015, COLT.
[39] Trevor Hastie, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2009, Springer.
[40] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[41] Nicolas Le Roux, et al. Improving First and Second-Order Methods by Modeling Uncertainty, 2010.