Paper Information - Training Recurrent Networks by Evolino

Training Recurrent Networks by Evolino

In recent years, gradient-based LSTM recurrent neural networks (RNNs) solved many previously RNN-unlearnable tasks. Sometimes, however, gradient information is of little use for training RNNs, due to numerous local minima. For such cases, we present a novel method: EVOlution of systems with LINear Outputs (Evolino). Evolino evolves weights to the nonlinear, hidden nodes of RNNs while computing optimal linear mappings from hidden state to output, using methods such as pseudo-inverse-based linear regression. If we instead use quadratic programming to maximize the margin, we obtain the first evolutionary recurrent support vector machines. We show that Evolino-based LSTM can solve tasks that Echo State nets (Jaeger, 2004a) cannot and achieves higher accuracy in certain continuous function generation tasks than conventional gradient descent RNNs, including gradient-based LSTM.
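The split described in the abstract, evolving the hidden weights while solving for the output weights in closed form, can be illustrated with a minimal NumPy sketch. Several things here are assumptions made for brevity and are not taken from the paper: the network is a plain tanh RNN rather than an LSTM, the search is a simple elitist hill climber with Gaussian mutation, fitness is measured on the training sequence itself, and all names (`run_hidden`, `fit_readout`, `evaluate`, `evolve`) and the toy sine task are hypothetical.

```python
import numpy as np

def run_hidden(w_in, w_rec, inputs):
    """Run a plain tanh RNN; return hidden states, one row per time step."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(w_in @ x + w_rec @ h)
        states.append(h.copy())
    return np.array(states)

def fit_readout(states, targets):
    """Least-squares linear map from hidden state to output (pseudo-inverse)."""
    return np.linalg.pinv(states) @ targets

def evaluate(genome, n_hidden, n_in, inputs, targets):
    """Decode a flat genome into hidden weights, fit the readout, return (mse, w_out)."""
    split = n_hidden * n_in
    w_in = genome[:split].reshape(n_hidden, n_in)
    w_rec = genome[split:].reshape(n_hidden, n_hidden)
    states = run_hidden(w_in, w_rec, inputs)
    w_out = fit_readout(states, targets)
    mse = float(np.mean((states @ w_out - targets) ** 2))
    return mse, w_out

def evolve(inputs, targets, n_hidden=8, pop_size=20, generations=30,
           sigma=0.1, seed=0):
    """Elitist hill climbing over hidden weights only (an assumed, simplified search)."""
    rng = np.random.default_rng(seed)
    n_in = inputs.shape[1]
    genome_len = n_hidden * n_in + n_hidden * n_hidden
    best = rng.normal(0.0, 0.5, genome_len)
    best_mse, best_w_out = evaluate(best, n_hidden, n_in, inputs, targets)
    history = [best_mse]
    for _ in range(generations):
        for _ in range(pop_size):
            # Mutate only the hidden weights; output weights are never evolved.
            child = best + rng.normal(0.0, sigma, genome_len)
            mse, w_out = evaluate(child, n_hidden, n_in, inputs, targets)
            if mse < best_mse:
                best, best_mse, best_w_out = child, mse, w_out
        history.append(best_mse)
    return best, best_w_out, history

# Toy task (assumed): predict the next sample of a sine wave.
t = np.arange(200)
inputs = np.sin(0.2 * t).reshape(-1, 1)
targets = np.sin(0.2 * (t + 1)).reshape(-1, 1)
genome, w_out, history = evolve(inputs, targets)
print(f"initial MSE {history[0]:.3e}, final MSE {history[-1]:.3e}")
```

Because a candidate replaces the incumbent only when its error is strictly lower, the recorded error never increases from one generation to the next. The evolutionary search only ever sees the hidden weights; each candidate's output weights are recomputed from scratch by linear regression on the hidden states it produces.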
