On the problem of local minima in recurrent neural networks

Many researchers have recently focused their efforts on devising efficient algorithms, mainly based on optimization schemes, for learning the weights of recurrent neural networks. As in the case of feedforward networks, however, these learning algorithms may get stuck in local minima during gradient descent and thus converge to sub-optimal solutions. This paper analyses the problem of optimal learning in recurrent networks by proposing conditions that guarantee local-minima-free error surfaces. An example is given that also shows the constructive role of the proposed theory in designing networks suited to a given task. Moreover, a formal relationship between recurrent and static feedforward networks is established, such that the examples of local minima already known in the literature for feedforward networks can be associated with analogous ones in recurrent networks.
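To make the setting concrete, the following sketch trains a small recurrent network by plain gradient descent, with the full gradient computed by backpropagation through time. It is an illustrative toy, not the paper's construction: the one-unit architecture, the temporal-parity task, the quadratic cost, and the learning rate are all assumptions chosen for brevity.

import numpy as np

rng = np.random.default_rng(0)

# Temporal parity task: the target at time t is the parity (XOR) of the
# inputs seen so far.  A single unit is too weak to represent this map
# exactly, so some residual error always remains.
X = rng.integers(0, 2, size=(8, 12)).astype(float)   # 8 sequences of length 12
S = 2.0 * (np.cumsum(X, axis=1) % 2) - 1.0           # targets mapped to {-1, +1}

def forward(w, v, b, X):
    """States of the recurrent unit y_t = tanh(w*x_t + v*y_{t-1} + b)."""
    n, L = X.shape
    Y = np.zeros((n, L + 1))                         # Y[:, 0] is the zero initial state
    for t in range(L):
        Y[:, t + 1] = np.tanh(w * X[:, t] + v * Y[:, t] + b)
    return Y

def loss_and_grad(w, v, b, X, S):
    """Mean quadratic error and its exact gradient (backprop through time)."""
    n, L = X.shape
    Y = forward(w, v, b, X)
    E = Y[:, 1:] - S
    loss = 0.5 * np.mean(E ** 2)
    gw = gv = gb = 0.0
    dLa = np.zeros(n)                                # dLoss/d(pre-activation) at step t+1
    for t in range(L, 0, -1):
        dLy = E[:, t - 1] / (n * L) + v * dLa        # total derivative w.r.t. y_t
        dLa = dLy * (1.0 - Y[:, t] ** 2)             # through tanh'
        gw += np.sum(dLa * X[:, t - 1])
        gv += np.sum(dLa * Y[:, t - 1])
        gb += np.sum(dLa)
    return loss, gw, gv, gb

# Plain gradient descent from several random starts: on a non-convex
# error surface the final error can depend on the initial weights.
for run in range(3):
    w, v, b = rng.normal(size=3)
    for _ in range(5000):
        loss, gw, gv, gb = loss_and_grad(w, v, b, X, S)
        w -= 0.5 * gw
        v -= 0.5 * gv
        b -= 0.5 * gb
    print(f"run {run}: final loss {loss:.4f}")

Running the script prints the final loss of each restart; a spread of final values across restarts is the practical symptom of the sub-optimal stationary points that the paper's conditions are designed to rule out.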
