A Hessian-Free Gradient Flow (HFGF) method for the optimisation of deep learning neural networks

Abstract This paper presents a novel optimisation method, termed Hessian-Free Gradient Flow (HFGF), for the optimisation of deep neural networks. The algorithm combines design characteristics of the Truncated Newton, Conjugate Gradient and Gradient Flow methods. It employs a finite-difference approximation scheme so that the Hessian never has to be formed explicitly, and uses the Armijo condition to verify sufficient descent at each step. The method is first tested on standard benchmark functions of high dimensionality, where its performance demonstrates the algorithm's potential for large-scale optimisation problems. The algorithm is then tested on classification and regression tasks using real-world datasets, achieving performance comparable to conventional optimisers in both cases.
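
The abstract names three generic ingredients: Hessian-vector products approximated by finite differences of the gradient (keeping the method Hessian-free), a truncated Newton/conjugate-gradient inner solve, and an Armijo sufficient-decrease test. The sketch below only illustrates these standard ingredients on the Rosenbrock benchmark; it is not the paper's HFGF algorithm (in particular, its gradient-flow component is omitted), and all names and parameter values (hvp_fd, truncated_cg, armijo_step, eps, c1) are illustrative assumptions.

```python
import numpy as np

def hvp_fd(grad_fn, x, v, eps=1e-6):
    """Forward-difference approximation of the Hessian-vector product:
    H(x) v ~ (grad(x + eps*v) - grad(x)) / eps, so no Hessian is formed."""
    return (grad_fn(x + eps * v) - grad_fn(x)) / eps

def truncated_cg(grad_fn, x, g, max_iter=10, tol=1e-8):
    """Truncated conjugate gradient: approximately solve H(x) d = -g
    using only Hessian-vector products (truncated Newton inner loop)."""
    d = np.zeros_like(g)
    r = -g.copy()              # residual of H d = -g at d = 0
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Hp = hvp_fd(grad_fn, x, p)
        curv = p @ Hp
        if curv <= 0:          # non-positive curvature: stop early (safeguard)
            break
        alpha = rs_old / curv
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d if np.any(d) else -g   # fall back to steepest descent

def armijo_step(f, grad_fn, x, d, t0=1.0, c1=1e-4, shrink=0.5, max_backtracks=20):
    """Backtracking line search enforcing the Armijo sufficient-decrease condition."""
    fx, g = f(x), grad_fn(x)
    t = t0
    for _ in range(max_backtracks):
        if f(x + t * d) <= fx + c1 * t * (g @ d):
            return x + t * d
        t *= shrink
    return x  # no acceptable step found; keep current point

# Illustration on the Rosenbrock function, a common high-dimensional benchmark.
def f(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def grad_fn(x):
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g

x = np.full(10, -1.0)
for _ in range(100):
    d = truncated_cg(grad_fn, x, grad_fn(x))
    x = armijo_step(f, grad_fn, x, d)
print(f(x))
```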
