Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks

Previous algorithms for supervised sequence learning are based on dynamic recurrent networks. This paper describes alternative gradient-based systems consisting of two feed-forward nets that learn to deal with temporal sequences by using fast weights: the first net learns to produce context-dependent weight changes for the second net, whose weights may vary very quickly. The method offers a potential for STM storage efficiency: a simple weight (instead of a full-fledged unit) may be sufficient for storing temporal information. Various learning methods are derived. Two experiments with unknown time delays illustrate the approach. One experiment shows how the system can be used for adaptive temporary variable binding.

A training sequence $p$ with $n_p$ discrete time steps (called an episode) consists of $n_p$ ordered pairs $(x^p(t), d^p(t)) \in \mathbb{R}^n \times \mathbb{R}^m$, $0 < t \le n_p$. At time $t$ of episode $p$ the learning system receives $x^p(t)$ as input and produces the output $y^p(t)$. The goal of the learning system is to minimize

$$\hat{E} = \frac{1}{2} \sum_p \sum_t \sum_i \left( d_i^p(t) - y_i^p(t) \right)^2 .$$
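Read concretely, each episode contributes half the summed squared difference between its targets and outputs, and $\hat{E}$ sums these contributions over all episodes. A minimal NumPy rendering of the objective (the function names and the $(n_p, m)$ array layout are illustrative assumptions, not from the paper):

```python
import numpy as np

def episode_error(d, y):
    """Error contribution of one episode p: (1/2) * sum over t and i
    of (d_i^p(t) - y_i^p(t))^2.

    d, y: arrays of shape (n_p, m) holding the targets d^p(t) and the
    outputs y^p(t) for t = 1 .. n_p.
    """
    return 0.5 * np.sum((d - y) ** 2)

def total_error(episodes):
    """The total error: episode errors summed over all sequences p."""
    return sum(episode_error(d, y) for d, y in episodes)
```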
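To make the two-net arrangement concrete, the following is a minimal sketch under explicit assumptions: both nets are linear and single-layer, the slow net emits a full matrix of additive fast-weight changes at each step, and the fast weights are reset at the start of every episode. The paper derives several variants of this scheme; none of the identifiers below come from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3                      # input and output sizes (illustrative)

# Slow net: ordinary slowly-changing weights, adapted by gradient
# descent on the error (the training step itself is omitted here).
# It maps the current input to a matrix of changes for the fast net.
W_slow = rng.normal(scale=0.1, size=(m * n, n))

def run_episode(xs):
    """Process one episode: at each step the slow net produces a
    context-dependent change to the fast weights, then the fast net
    computes its output with the updated weights.

    xs: array of shape (n_p, n) with inputs x^p(1) .. x^p(n_p).
    Returns the outputs y^p(t) as an array of shape (n_p, m).
    """
    W_fast = np.zeros((m, n))    # fast weights start fresh per episode
    ys = []
    for x in xs:
        delta = (W_slow @ x).reshape(m, n)  # context-dependent change
        W_fast = W_fast + delta             # fast weights vary quickly
        ys.append(W_fast @ x)               # second net's output
    return np.array(ys)

# Toy usage: one episode with n_p = 5 steps and random targets.
xs = rng.normal(size=(5, n))
ds = rng.normal(size=(5, m))
print(episode_error(ds, run_episode(xs)))   # uses episode_error above
```

In such a system the fast weights play the role that hidden unit activations play in a dynamic recurrent net: they carry the short-term memory across time steps, which is why a simple weight may substitute for a full-fledged unit.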