Distributed Second-Order Optimization using Kronecker-Factored Approximations
