A Hessian-Free Gradient Flow (HFGF) method for the optimisation of deep learning neural networks

Abstract This paper presents a novel optimisation method, termed Hessian-Free Gradient Flow (HFGF), for the optimisation of deep neural networks. The algorithm combines design characteristics of the Truncated Newton, Conjugate Gradient and Gradient Flow methods. It employs a finite-difference approximation scheme so that the Hessian never has to be formed explicitly, and uses the Armijo condition to verify sufficient descent at each step. The method is first tested on standard benchmark functions of high dimensionality, where its performance demonstrates the algorithm's potential for large-scale optimisation problems. The algorithm is then tested on classification and regression tasks using real-world datasets, achieving performance comparable to conventional optimisers in both cases.
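
The abstract names three generic ingredients: Hessian-vector products approximated by finite differences of the gradient (keeping the method Hessian-free), a truncated Newton/conjugate-gradient inner solve, and an Armijo sufficient-decrease test. The sketch below only illustrates these standard ingredients on the Rosenbrock benchmark; it is not the paper's HFGF algorithm (in particular, its gradient-flow component is omitted), and all names and parameter values (hvp_fd, truncated_cg, armijo_step, eps, c1) are illustrative assumptions.

```python
import numpy as np

def hvp_fd(grad_fn, x, v, eps=1e-6):
    """Forward-difference approximation of the Hessian-vector product:
    H(x) v ~ (grad(x + eps*v) - grad(x)) / eps, so no Hessian is formed."""
    return (grad_fn(x + eps * v) - grad_fn(x)) / eps

def truncated_cg(grad_fn, x, g, max_iter=10, tol=1e-8):
    """Truncated conjugate gradient: approximately solve H(x) d = -g
    using only Hessian-vector products (truncated Newton inner loop)."""
    d = np.zeros_like(g)
    r = -g.copy()              # residual of H d = -g at d = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Hp = hvp_fd(grad_fn, x, p)
        curv = p @ Hp
        if curv <= 0:          # non-positive curvature: stop early (safeguard)
            break
        alpha = rs_old / curv
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d if np.any(d) else -g   # fall back to steepest descent

def armijo_step(f, grad_fn, x, d, t0=1.0, c1=1e-4, shrink=0.5, max_backtracks=20):
    """Backtracking line search enforcing the Armijo sufficient-decrease condition."""
    fx, g = f(x), grad_fn(x)
    t = t0
    for _ in range(max_backtracks):
        if f(x + t * d) <= fx + c1 * t * (g @ d):
            return x + t * d
        t *= shrink
    return x  # no acceptable step found; keep current point

# Illustration on the Rosenbrock function, a common high-dimensional benchmark.
def f(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def grad_fn(x):
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g

x = np.full(10, -1.0)
for _ in range(100):
    d = truncated_cg(grad_fn, x, grad_fn(x))
    x = armijo_step(f, grad_fn, x, d)
print(f(x))
```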
