Nonmonotone BFGS-trained recurrent neural networks for temporal sequence processing

In this paper we propose a nonmonotone approach to training recurrent neural networks for temporal sequence processing applications. The approach allows the learning performance to deteriorate in some iterations, while the network’s performance nevertheless improves over time. A self-scaling BFGS method is equipped with an adaptive nonmonotone technique that employs approximations of the Lipschitz constant, and is tested on a set of sequence processing problems. Simulation results show that the proposed algorithm outperforms the standard BFGS method as well as other methods previously applied to these sequences, providing an effective modification that is capable of training recurrent networks of various architectures.
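To make the idea concrete, below is a minimal Python sketch of one way to combine a self-scaling BFGS update with a nonmonotone Armijo test whose memory length is adapted from a local Lipschitz-constant estimate. It is not the paper's exact algorithm: the function and parameter names, the Oren–Luenberger-style scaling factor, and the memory-adaptation rule are illustrative assumptions.

```python
import numpy as np

def nonmonotone_self_scaling_bfgs(f, grad, x0, max_iter=200, sigma=1e-4,
                                  tol=1e-6, m_max=10):
    """Sketch: self-scaling BFGS with a nonmonotone Armijo line search.

    The nonmonotone memory length is adapted from a local Lipschitz-constant
    estimate; this adaptation rule is an illustrative assumption.
    """
    x = np.asarray(x0, dtype=float)
    n = x.size
    g = grad(x)
    H = np.eye(n)                      # inverse-Hessian approximation
    f_hist = [f(x)]                    # recent objective values for the test
    lam_prev = 1.0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                     # quasi-Newton search direction
        # Nonmonotone Armijo test: compare against the maximum of the last
        # m stored objective values instead of f(x) alone, so occasional
        # increases of the error are accepted.
        f_ref = max(f_hist)
        alpha = 1.0
        while f(x + alpha * d) > f_ref + sigma * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:                 # curvature condition holds
            # Self-scaling factor (Oren-Luenberger style, an assumption here):
            # rescale H before the BFGS update so its eigenvalues track the
            # local curvature.
            tau = sy / (y @ H @ y)
            rho = 1.0 / sy
            V = np.eye(n) - rho * np.outer(s, y)
            H = tau * (V @ H @ V.T) + rho * np.outer(s, s)
        # Local Lipschitz-constant estimate; a growing estimate (rapidly
        # changing gradients) shrinks the nonmonotone memory window.
        lam = np.linalg.norm(y) / max(np.linalg.norm(s), 1e-12)
        m = 1 if lam > lam_prev else min(len(f_hist) + 1, m_max)
        lam_prev = lam
        x, g = x_new, g_new
        f_hist = (f_hist + [f(x)])[-m:]
    return x
```

Calling the sketch on a small test objective (e.g. a quadratic) with `nonmonotone_self_scaling_bfgs(f, grad, x0)` exercises the same kind of update that, in the paper, is applied to a recurrent network's error function using its gradients.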
