Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition
Mounir Haddou | Abdoulaye Koroko | Ani Anciaux-Sedrakian | Ibtihel Gharbia | Valérie Garès | Quang Huy Tran
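
The title refers to approximating (blocks of) the Fisher information matrix of a neural network by a Kronecker product obtained from a singular value decomposition. Not taken from the paper itself: the sketch below only illustrates the standard nearest-Kronecker-product computation (Van Loan–Pitsianis rearrangement followed by a rank-one SVD) that this kind of approximation builds on; the function names, factor sizes, and the random stand-in for a Fisher block are illustrative assumptions, not the authors' algorithm.

import numpy as np

def rearrange(F, m, n, p, q):
    # Map F of shape (m*p, n*q) into R(F) of shape (m*n, p*q): row i*n + j
    # holds the flattened (i, j) block of F, so that
    # ||F - kron(A, B)||_F equals ||R(F) - vec(A) vec(B)^T||_F.
    rows = []
    for i in range(m):
        for j in range(n):
            rows.append(F[i*p:(i+1)*p, j*q:(j+1)*q].reshape(-1))
    return np.stack(rows, axis=0)

def nearest_kron(F, m, n, p, q):
    # Best Frobenius-norm fit F ~ kron(A, B): take the leading singular
    # triplet of the rearranged matrix and fold it back into A and B.
    R = rearrange(F, m, n, p, q)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = (np.sqrt(s[0]) * U[:, 0]).reshape(m, n)
    B = (np.sqrt(s[0]) * Vt[0, :]).reshape(p, q)
    return A, B

# Toy check: a random symmetric positive semi-definite matrix stands in for
# a per-layer Fisher block (the 3x3 / 4x4 factor sizes are made up).
rng = np.random.default_rng(0)
m = n = 3
p = q = 4
G = rng.standard_normal((m * p, 2 * m * p))
F = G @ G.T / G.shape[1]
A, B = nearest_kron(F, m, n, p, q)
rel_err = np.linalg.norm(F - np.kron(A, B)) / np.linalg.norm(F)
print(f"relative Frobenius error of the Kronecker fit: {rel_err:.3f}")

Keeping additional singular triplets of the rearranged matrix yields a sum of Kronecker products, the natural refinement of a single-factor, K-FAC-style approximation.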