Continuous history compression

Neural networks have proven poor at learning the structure in complex and extended temporal sequences in which contingencies among elements can span long time lags. The principle of history compression [13] provides a means of transforming long sequences with redundant information into equivalent shorter sequences; the shorter sequences are more easily manipulated and learned by neural networks. The principle states that expected sequence elements can be removed from the sequence to form an equivalent, more compact sequence without loss of information. The principle was embodied in a neural net predictive architecture that attempted to anticipate the next element of a sequence given the previous elements. If the prediction was accurate, the next element was discarded; otherwise, it was passed on to a second network that processed the sequence in some fashion (e.g., recognition, classification, autoencoding). As originally proposed, a binary judgement was made as to the predictability of each element. Here, we describe a continuous version of history compression in which elements are discarded in a graded fashion dependent on their predictability, as measured by their (Shannon) information. We implement continuous history compression using a RAAM architecture, yielding a class of sequence learning algorithms that are both entirely local and able to bridge long time lags between correlated events.
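
As a rough illustration of the graded discarding described above, the sketch below weights each sequence element by its Shannon information (surprisal) under a toy next-element predictor. This is a minimal sketch only: the names (`CountPredictor`, `compress`, `max_bits`) are hypothetical, and the bigram count model stands in for the predictive network and RAAM architecture used in the paper.

```python
# Minimal sketch of continuous history compression (hypothetical names;
# a toy bigram predictor stands in for the paper's predictive network/RAAM).
import numpy as np


def surprisal(p, eps=1e-12):
    """Shannon information (bits) of an observed element with predicted probability p."""
    return -np.log2(max(p, eps))


class CountPredictor:
    """Toy next-element predictor: smoothed bigram counts over a discrete alphabet."""

    def __init__(self, alphabet_size):
        self.counts = np.ones((alphabet_size, alphabet_size))  # Laplace smoothing

    def predict(self, prev):
        row = self.counts[prev]
        return row / row.sum()

    def update(self, prev, nxt):
        self.counts[prev, nxt] += 1


def compress(sequence, predictor, max_bits):
    """Graded history compression: weight each element by its surprisal.

    Highly predictable elements carry little information and receive a weight
    near zero (mostly discarded); surprising elements are passed on with a
    weight near one. `max_bits` normalizes weights into [0, 1].
    """
    weighted = []
    for prev, nxt in zip(sequence[:-1], sequence[1:]):
        p = predictor.predict(prev)[nxt]
        w = min(surprisal(p) / max_bits, 1.0)  # graded "keep" weight in [0, 1]
        weighted.append((nxt, w))
        predictor.update(prev, nxt)
    return weighted


if __name__ == "__main__":
    seq = [0, 1, 0, 1, 0, 1, 2, 0, 1, 0, 1]  # mostly predictable, one surprise
    pred = CountPredictor(alphabet_size=3)
    for element, weight in compress(seq, pred, max_bits=4.0):
        print(element, round(weight, 2))
```

In this toy run, the repeated 0/1 alternation quickly becomes predictable and is assigned low weight, while the unexpected element 2 receives a high weight, mirroring the graded (rather than binary) discarding of predictable elements.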

[1] P. Werbos, et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.

[2] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[3] Anthony J. Robinson, et al. Static and Dynamic Error Propagation Networks with Application to Speech Coding, 1987, NIPS.

[4] Paul J. Werbos, et al. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.

[5] Barak A. Pearlmutter. Learning State Space Trajectories in Recurrent Neural Networks, 1989, Neural Computation.

[6] Ronald J. Williams, et al. Experimental Analysis of the Real-time Recurrent Learning Algorithm, 1989.

[7] Jürgen Schmidhuber, et al. A local learning algorithm for dynamic feedforward and recurrent networks, 1990, Forschungsberichte, TU Munich.

[8] Jürgen Schmidhuber, et al. Reinforcement Learning in Markovian and Non-Markovian Environments, 1990, NIPS.

[9] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cognitive Science.

[10] Jürgen Schmidhuber, et al. Recurrent networks adjusted by adaptive critics, 1990.

[11] Jürgen Schmidhuber, et al. Learning Unambiguous Reduced Sequence Descriptions, 1991, NIPS.

[12] Michael C. Mozer, et al. Induction of Multiscale Temporal Structure, 1991, NIPS.

[13] Jürgen Schmidhuber, et al. Learning Complex, Extended Sequences Using the Principle of History Compression, 1992, Neural Computation.

[14] Jürgen Schmidhuber, et al. A Fixed Size Storage O(n³) Time Complexity Learning Algorithm for Fully Recurrent Continually Running Networks, 1992, Neural Computation.

[15] Ronald J. Williams, et al. Gradient-based learning algorithms for recurrent networks and their computational complexity, 1995.