Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
Thomas George | César Laurent | Xavier Bouthillier | Nicolas Ballas | Pascal Vincent