EA-CG: An Approximate Second-Order Method for Training Fully-Connected Neural Networks

For training fully-connected neural networks (FCNNs), we propose a practical approximate second-order method comprising: 1) an approximation of the Hessian matrix and 2) a conjugate gradient (CG) based method. Our proposed approximate Hessian matrix is memory-efficient and can be applied to any FCNN whose activation and criterion functions are twice differentiable. We devise a CG-based method that incorporates a rank-one approximation to derive Newton directions for training FCNNs, which significantly reduces both space and time complexity. This CG-based method can be employed to solve any linear system whose coefficient matrix is Kronecker-factored, symmetric, and positive definite. Empirical studies show the efficacy and efficiency of our proposed method.
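To make the last claim concrete, the sketch below shows the basic idea of running CG on a Kronecker-factored SPD system without ever materializing the full coefficient matrix: matrix-vector products with A ⊗ B use the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ). This is a minimal illustration, not the paper's EA-CG method (in particular, it omits the rank-one approximation described above); the helper names kron_matvec and cg_solve_kron are illustrative.

```python
import numpy as np

def kron_matvec(A, B, v):
    """Compute (A kron B) @ v without forming the Kronecker product,
    via the identity (A kron B) vec(X) = vec(B @ X @ A.T)."""
    m, n = B.shape[0], A.shape[0]
    X = v.reshape((m, n), order="F")        # undo column-stacking vec
    return (B @ X @ A.T).reshape(-1, order="F")

def cg_solve_kron(A, B, b, tol=1e-10, max_iter=500):
    """Plain conjugate gradient for (A kron B) x = b.
    Assumes A kron B is symmetric positive definite, which holds
    whenever A and B both are."""
    x = np.zeros_like(b)
    r = b.copy()                            # residual for x = 0
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = kron_matvec(A, B, p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Usage: solve a small Kronecker-factored SPD system and verify.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = A @ A.T + 4 * np.eye(4)  # SPD factor
B = rng.standard_normal((3, 3)); B = B @ B.T + 3 * np.eye(3)  # SPD factor
b = rng.standard_normal(12)
x = cg_solve_kron(A, B, b)
assert np.allclose(np.kron(A, B) @ x, b, atol=1e-6)
```

The payoff is in the memory footprint: CG only touches the factors A and B, so the n·m × n·m matrix A ⊗ B is never stored, which is what makes Kronecker-factored curvature approximations tractable at network scale.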
