[1] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[2] Geoffrey Zweig, et al. Recent advances in deep learning for speech research at Microsoft, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[3] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[4] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992.
[5] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.
[6] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[7] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[8] Sebastian Ruder. An overview of gradient descent optimization algorithms, 2016, ArXiv.
[9] Tom Schaul, et al. No more pesky learning rates, 2012, ICML.
[10] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[11] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[12] D. Ruppert, et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process, 1988.
[13] Surya Ganguli, et al. Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods, 2013, ICML.
[14] Eric Moulines, et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning, 2011, NIPS.
[15] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[16] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res..
[17] Geoffrey E. Hinton. Reducing the Dimensionality of Data with Neural Networks, 2008.
[18] Nitish Srivastava, et al. Improving neural networks by preventing co-adaptation of feature detectors, 2012, ArXiv.
[19] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[20] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[21] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[23] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[24] Christopher D. Manning, et al. Fast dropout training, 2013, ICML.
[25] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[26] Andrew W. Fitzgibbon, et al. A fast natural Newton method, 2010, ICML.