Online Second Order Methods for Non-Convex Stochastic Optimizations