Practical Gauss-Newton Optimisation for Deep Learning
Aleksandar Botev | Hippolyt Ritter | David Barber
[1] James Martens. Deep learning via Hessian-free optimization, 2010, ICML.
[2] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[3] Ilya Sutskever, et al. Learning Recurrent Neural Networks with Hessian-Free Optimization, 2011, ICML.
[4] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[5] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[6] Colin Raffel, et al. Lasagne: First release, 2015.
[7] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[8] Robert Mansel Gower, et al. Higher-order reverse automatic differentiation with emphasis on the third-order, 2016, Math. Program.
[9] Benjamin Schrauwen, et al. Factoring Variations in Natural Images with Deep Gaussian Mixture Models, 2014, NIPS.
[10] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[11] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2), 1983.
[12] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[13] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[14] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[15] Nicol N. Schraudolph. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[16] Andy Harter, et al. Parameterisation of a stochastic model for human face identification, 1994, Proceedings of the 1994 IEEE Workshop on Applications of Computer Vision.
[17] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[18] John Salvatier, et al. Theano: A Python framework for fast computation of mathematical expressions, 2016, ArXiv.
[19] Marc'Aurelio Ranzato, et al. Learning Factored Representations in a Deep Mixture of Experts, 2013, ICLR.
[20] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[21] Heiga Zen, et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[23] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.