Guodong Zhang | Chaoqi Wang | Bowen Xu | Roger B. Grosse