Optimizing Neural Networks with Kronecker-factored Approximate Curvature
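The paper indexed here is K-FAC (Martens & Grosse, 2015), which approximates each layer's block of the Fisher information matrix by a Kronecker product of two small matrices: the second moment of the layer's input activations and the second moment of the loss gradients with respect to the layer's outputs. The NumPy sketch below only illustrates that core idea for a single fully connected layer; the names (`kfac_layer_update`, `damping`) are mine rather than the paper's, and the paper's factored Tikhonov damping, momentum, and amortized inverse updates are all omitted.

```python
import numpy as np

def kfac_layer_update(a_in, g_out, grad_W, damping=1e-2):
    """Illustrative K-FAC-style preconditioning of one dense layer's gradient.

    a_in   : (batch, n_in)  inputs to the layer (previous-layer activations)
    g_out  : (batch, n_out) loss gradients w.r.t. the layer's pre-activations
    grad_W : (n_in, n_out)  ordinary gradient of the loss w.r.t. the weights
    """
    batch = a_in.shape[0]

    # Kronecker factors: input second moment A and output-gradient second moment G.
    A = a_in.T @ a_in / batch      # (n_in,  n_in)
    G = g_out.T @ g_out / batch    # (n_out, n_out)

    # Simple isotropic damping stands in for the paper's factored Tikhonov damping.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))

    # Because the Fisher block is approximated by a Kronecker product of A and G,
    # applying its inverse reduces to two small inverses and two matrix products.
    return A_inv @ grad_W @ G_inv

# Tiny smoke test with random data (shapes only; not a training loop).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((32, 10))    # batch of 32, 10 input units
    g = rng.standard_normal((32, 5))     # gradients for 5 output units
    grad = a.T @ g / 32                  # ordinary minibatch gradient
    print(kfac_layer_update(a, g, grad).shape)   # -> (10, 5)
```

In the actual algorithm the factors are accumulated as running averages over minibatches and their inverses are recomputed only periodically, so the per-call inversion above should be read as a conceptual sketch rather than an efficient implementation.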
[1] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[2] R. A. Smith. Matrix Equation $XA + BX = C$, 1968.
[3] Jorge J. Moré, et al. The Levenberg-Marquardt algorithm: Implementation and theory, 1977.
[4] Geoffrey E. Hinton, et al. Experiments on Learning by Back Propagation, 1986.
[5] K. Chu. The solution of the matrix equations $AXB - CXD = E$ and $(YA - DZ, YC - BZ) = (E, F)$, 1987.
[6] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.
[7] John E. Moody, et al. Note on Learning Rate Schedules for Stochastic Optimization, 1990, NIPS.
[8] Victor Y. Pan, et al. An Improved Newton Iteration for the Generalized Inverse of a Matrix, with Applications, 1991, SIAM J. Sci. Comput.
[9] Alan J. Laub, et al. Solution of the Sylvester matrix equation $AXB^{T} + CXD^{T} = E$, 1992, ACM TOMS.
[10] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[11] M. Pourahmadi. Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation, 1999.
[12] Stephen J. Wright, et al. Numerical Optimization, 2006.
[13] M. Rattray, et al. Matrix Momentum for Practical Natural Gradient Learning, 1999.
[14] Kenji Fukumizu, et al. Adaptive natural gradient learning algorithms for various stochastic models, 2000, Neural Networks.
[15] Shun-ichi Amari, et al. Methods of Information Geometry, 2000.
[16] Tom Heskes, et al. On Natural Learning and Pruning in Multilayered Perceptrons, 2000, Neural Computation.
[17] C. Van Loan. The ubiquitous Kronecker product, 2000.
[18] Nicol N. Schraudolph. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[19] Ren-Cang Li. Sharpness in Rates of Convergence for CG and Symmetric Lanczos Methods, 2005.
[20] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[21] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.
[22] Nasser M. Nasrabadi, et al. Pattern Recognition and Machine Learning, 2006, Technometrics.
[23] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[24] Simon Günter, et al. A Stochastic Quasi-Newton Method for Online Convex Optimization, 2007, AISTATS.
[25] James Martens. Deep learning via Hessian-free optimization, 2010, ICML.
[26] Ren-Cang Li. Sharpness in rates of convergence for the symmetric Lanczos method, 2010, Math. Comput.
[27] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[28] Nando de Freitas, et al. A tutorial on stochastic approximation algorithms for training Restricted Boltzmann Machines and Deep Belief Nets, 2010, Information Theory and Applications Workshop (ITA).
[29] M. Pourahmadi. Covariance Estimation: The GLM and Regularization Perspectives, 2011, arXiv:1202.1661.
[30] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, arXiv.
[31] Nicol N. Schraudolph. Centering Neural Network Gradient Factors, 1996, Neural Networks: Tricks of the Trade.
[32] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.
[33] Daniel Povey, et al. Krylov Subspace Descent for Deep Learning, 2011, AISTATS.
[34] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[35] Tapani Raiko, et al. Deep Learning Made Easier by Linear Transformations in Perceptrons, 2012, AISTATS.
[36] Mark W. Schmidt, et al. Hybrid Deterministic-Stochastic Methods for Data Fitting, 2011, SIAM J. Sci. Comput.
[37] Jorge Nocedal, et al. Sample size selection in optimization methods for machine learning, 2012, Math. Program.
[38] Ilya Sutskever, et al. Estimating the Hessian by Back-propagating Curvature, 2012, ICML.
[39] Ilya Sutskever, et al. Training Deep and Recurrent Networks with Hessian-Free Optimization, 2012, Neural Networks: Tricks of the Trade.
[40] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[41] Yann Ollivier. Riemannian metrics for neural networks, 2013, arXiv.
[42] Yann Ollivier. Riemannian metrics for neural networks I: feedforward networks, 2013, arXiv:1303.0818.
[43] Ryan Kiros, et al. Training Neural Networks with Stochastic Hessian-Free Optimization, 2013, ICLR.
[44] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[45] Tapani Raiko, et al. Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities, 2013, ICLR.
[46] Hermann Ney, et al. Mean-normalized stochastic gradient for large-scale deep learning, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] James Martens, et al. On the Expressive Efficiency of Sum Product Networks, 2014, arXiv.
[48] Sanjeev Khudanpur, et al. Parallel training of DNNs with Natural Gradient and Parameter Averaging, 2014.
[49] James Martens. New perspectives on the natural gradient method, 2014, arXiv.
[50] W. Gibson. Solution of Matrix Equations, 2014, The Method of Moments in Electromagnetics.
[51] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[52] Ruslan Salakhutdinov, et al. Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix, 2015, ICML.
[53] Valeria Simoncini. Computational Methods for Linear Matrix Equations, 2016, SIAM Rev.
[54] Anne Auger, et al. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles, 2011, J. Mach. Learn. Res.