Attempting to reduce the vanishing gradient effect through a novel recurrent multiscale architecture

This paper proposes a possible solution to the vanishing gradient problem in recurrent neural networks, which arises when such networks are applied to tasks requiring the detection of long-term dependencies. The main idea is to pre-process the signal (typically a time series) with a discrete wavelet decomposition, in order to separate short-term from long-term information, and to process each scale with a different recurrent neural network. The partial results obtained from the sequences at the various time/frequency resolutions are then combined through an adaptive nonlinear structure to produce the final output. This preprocessing-based approach differs from others reported in the literature to date in that it mitigates the effects of the problem under study without requiring substantial changes to the network architecture or the learning technique. The overall system, called the recurrent multiscale network (RMN), is described, and its performance is tested on typical tasks, namely the latching problem and time series prediction.
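A minimal sketch of this idea in a modern Python stack (PyWavelets for the discrete wavelet decomposition, PyTorch for the per-scale recurrent networks and the nonlinear combiner) is shown below. The class name, the 'db4' wavelet, the decomposition depth, and the hidden sizes are illustrative assumptions rather than the paper's original specification, and the adaptive combining stage is approximated here by a small feed-forward network.

    # Sketch of a recurrent multiscale network: one small RNN per wavelet
    # scale, outputs fused by an adaptive nonlinear stage. All sizes and the
    # wavelet choice are assumptions for illustration only.
    import numpy as np
    import pywt
    import torch
    import torch.nn as nn


    class RecurrentMultiscaleNet(nn.Module):
        def __init__(self, levels=3, hidden=8, wavelet="db4"):
            super().__init__()
            self.levels = levels
            self.wavelet = wavelet
            # pywt.wavedec returns levels + 1 coefficient sequences
            # (one approximation plus `levels` detail sequences)
            self.rnns = nn.ModuleList(
                nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
                for _ in range(levels + 1)
            )
            # adaptive nonlinear combiner of the per-scale summaries
            self.combiner = nn.Sequential(
                nn.Linear((levels + 1) * hidden, hidden),
                nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, series):
            # series: 1-D numpy array; decompose into one sequence per scale
            coeffs = pywt.wavedec(series, self.wavelet, level=self.levels)
            summaries = []
            for rnn, c in zip(self.rnns, coeffs):
                x = torch.tensor(c, dtype=torch.float32).view(1, -1, 1)
                _, h = rnn(x)            # final hidden state summarises the scale
                summaries.append(h[-1])
            return self.combiner(torch.cat(summaries, dim=-1))


    if __name__ == "__main__":
        t = np.linspace(0, 8 * np.pi, 256)
        signal = np.sin(t) + 0.3 * np.sin(7 * t)   # toy series with two time scales
        model = RecurrentMultiscaleNet()
        print(model(signal))                        # untrained one-step prediction

Each per-scale network sees a much shorter, band-limited sequence than the raw series, which is the mechanism by which the decomposition is meant to ease the vanishing gradient problem; training the sketch would proceed by ordinary gradient descent on the combiner and the per-scale RNNs jointly.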
