Training Deep and Recurrent Networks with Hessian-Free Optimization
[1] M. Hestenes, et al. Methods of conjugate gradients for solving linear systems, 1952.
[2] R. E. Wengert, et al. A simple automatic derivative evaluation program, 1964, Commun. ACM.
[3] Jorge J. Moré, et al. The Levenberg-Marquardt algorithm: Implementation and theory, 1977.
[4] Philippe L. Toint, et al. Towards an efficient sparsity exploiting Newton method for minimization, 1981.
[5] T. Steihaug. The Conjugate Gradient Method and Trust Regions in Large Scale Optimization, 1983.
[6] Jorge J. Moré, et al. Computing a Trust Region Step, 1983.
[7] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.
[8] S. Nash. Newton-Type Minimization via the Lanczos Method, 1984.
[9] Yann LeCun, et al. Improving the convergence of back-propagation learning with second-order methods, 1989.
[10] Geoffrey E. Hinton, et al. Proceedings of the 1988 Connectionist Models Summer School, 1989.
[11] Sepp Hochreiter, et al. Untersuchungen zu dynamischen neuronalen Netzen [Investigations on dynamic neural networks], 1991.
[12] Chris Bishop, et al. Exact Calculation of the Hessian Matrix for the Multilayer Perceptron, 1992, Neural Computation.
[13] Dianne P. O'Leary, et al. The Use of the L-Curve in the Regularization of Discrete Ill-Posed Problems, 1993, SIAM J. Sci. Comput.
[14] J. Shewchuk. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994.
[15] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[16] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[17] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[18] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[19] Nicholas I. M. Gould, et al. Solving the Trust-Region Subproblem using the Lanczos Method, 1999, SIAM J. Optim.
[20] Ya-Xiang Yuan, et al. On the truncated conjugate gradient method, 2000, Math. Program.
[21] S. Nash. A survey of truncated-Newton methods, 2000.
[22] D. K. Smith, et al. Numerical Optimization, 2001, J. Oper. Res. Soc.
[23] J. van Leeuwen, et al. Neural Networks: Tricks of the Trade, 2002, Lecture Notes in Computer Science.
[24] Nicol N. Schraudolph, et al. Fast Curvature Matrix-Vector Products for Second-Order Gradient Descent, 2002, Neural Computation.
[25] Harald Haas, et al. Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication, 2004, Science.
[26] Yoshua Bengio, et al. Greedy Layer-Wise Training of Deep Networks, 2006, NIPS.
[27] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[28] Yee Whye Teh, et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.
[29] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[30] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[31] Razvan Pascanu, et al. Theano: A CPU and GPU Math Compiler in Python, 2010, SciPy.
[32] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[33] Ilya Sutskever, et al. Learning Recurrent Neural Networks with Hessian-Free Optimization, 2011, ICML.
[34] Jorge Nocedal, et al. On the Use of Stochastic Hessian Information in Optimization Methods for Machine Learning, 2011, SIAM J. Optim.
[35] Farhan Feroz, et al. BAMBI: blind accelerated multimodal Bayesian inference, 2011, arXiv:1110.2997.
[36] O. Chapelle. Improved Preconditioner for Hessian Free Optimization, 2011.
[37] Geoffrey E. Hinton, et al. Generating Text with Recurrent Neural Networks, 2011, ICML.
[38] Yoshua Bengio, et al. Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription, 2012, ICML.
[39] Daniel Povey, et al. Krylov Subspace Descent for Deep Learning, 2011, AISTATS.
[40] Grégoire Montavon, et al. Neural Networks: Tricks of the Trade, 2012, Lecture Notes in Computer Science.
[41] Klaus-Robert Müller, et al. Efficient BackProp, 2012, Neural Networks: Tricks of the Trade.
[42] Tara N. Sainath, et al. Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization, 2012, INTERSPEECH.
[43] Jorge Nocedal, et al. Sample size selection in optimization methods for machine learning, 2012, Math. Program.
[44] Ilya Sutskever, et al. Estimating the Hessian by Back-propagating Curvature, 2012, ICML.
[45] Tara N. Sainath, et al. Accelerating Hessian-free optimization for Deep Neural Networks by implicit preconditioning and sampling, 2013, IEEE Workshop on Automatic Speech Recognition and Understanding.
[46] Tara N. Sainath, et al. Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks, 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[47] Tara N. Sainath, et al. Improvements to Deep Convolutional Neural Networks for LVCSR, 2013, IEEE Workshop on Automatic Speech Recognition and Understanding.
[48] Razvan Pascanu, et al. Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines, 2013, ICLR.
[49] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[50] Ryan Kiros, et al. Training Neural Networks with Stochastic Hessian-Free Optimization, 2013, ICLR.
[51] Farhan Feroz, et al. SKYNET: an efficient and robust neural network training tool for machine learning in astronomy, 2013, arXiv.
[52] Razvan Pascanu, et al. Advances in optimizing recurrent networks, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing.