A Focused Backpropagation Algorithm for Temporal Pattern Recognition

Time is at the heart of many pattern recognition tasks (e.g., speech recognition). However, connectionist learning algorithms to date are not well suited for dealing with time-varying input patterns. This chapter introduces a specialized connectionist architecture and a corresponding specialization of the back-propagation learning algorithm that operates efficiently, in both computation time and space requirements, on temporal sequences. The key feature of the architecture is a layer of self-connected hidden units that integrate their current value with the new input at each time step to construct a static representation of the temporal input sequence. This architecture avoids two deficiencies found in the back-propagation unfolding-in-time procedure (Rumelhart, Hinton, & Williams, 1986) for handling sequence recognition tasks: first, it reduces the difficulty of temporal credit assignment by focusing the back-propagated error signal; second, it eliminates the need for a buffer to hold the input sequence and/or intermediate activity levels. The latter property is due to the fact that during the forward (activation) phase, incremental activity traces can be computed locally that hold all information necessary for back-propagation in time. It is argued that this architecture should scale better than conventional recurrent architectures with respect to sequence length. The architecture has been used to implement a temporal version of Rumelhart and McClelland's (1986) verb past-tense model. The hidden units learn to behave something like Rumelhart and McClelland's "Wickelphones," a rich and flexible representation of temporal information.
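To make the mechanism described above concrete, the following is a minimal sketch (Python/NumPy) of a decay-based, self-connected hidden layer whose gradient information is accumulated forward in time as an activity trace. The class name, the fixed decay values, and the logistic squashing function are illustrative assumptions for this sketch, not the chapter's actual notation or formulation.

    import numpy as np

    def logistic(x):
        return 1.0 / (1.0 + np.exp(-x))

    class FocusedRecurrentLayer:
        # Self-connected hidden units that integrate input over time:
        #   c_i(t) = d_i * c_i(t-1) + f(net_i(t)),   net_i(t) = sum_j w_ij * x_j(t)
        # Activity trace, updated locally during the forward pass:
        #   dc_i(t)/dw_ij = d_i * dc_i(t-1)/dw_ij + f'(net_i(t)) * x_j(t)

        def __init__(self, n_in, n_hidden, seed=0):
            rng = np.random.default_rng(seed)
            self.w = rng.normal(scale=0.1, size=(n_hidden, n_in))  # input-to-hidden weights
            self.decay = np.full(n_hidden, 0.8)                    # self-connection strengths d_i (assumed fixed here)
            self.reset()

        def reset(self):
            self.c = np.zeros(self.w.shape[0])   # hidden state c_i
            self.trace = np.zeros_like(self.w)   # dc_i/dw_ij, accumulated forward in time

        def step(self, x):
            net = self.w @ x
            f = logistic(net)
            fprime = f * (1.0 - f)
            # The trace follows the same recurrence as the state, so no buffer of
            # past inputs or activations is needed.
            self.trace = self.decay[:, None] * self.trace + fprime[:, None] * x[None, :]
            self.c = self.decay * self.c + f
            return self.c

        def grad_w(self, dE_dc):
            # Chain rule at the end of the sequence: dE/dw_ij = dE/dc_i(T) * dc_i(T)/dw_ij
            return dE_dc[:, None] * self.trace

    # Toy usage: a five-step one-hot sequence, then a gradient signal from a
    # (hypothetical) output layer attached to the final hidden state.
    layer = FocusedRecurrentLayer(n_in=5, n_hidden=3)
    for x in np.eye(5):
        h = layer.step(x)
    grad = layer.grad_w(np.ones(3))

Because the trace obeys the same recurrence as the hidden state, the weight gradient is available at the end of the sequence from quantities computed during the forward pass alone, which is the space-saving property claimed for the architecture.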

[1] Wayne A. Wickelgren. Context-sensitive coding, associative memory, and serial order in (speech) behavior. 1969.

[2] Joan L. Bybee, et al. Rules and schemas in the development and use of the English past tense. 1982.

[3] Paul Smolensky, et al. Schema Selection and Stochastic Inference in Modular Environments. AAAI, 1983.

[4] Geoffrey E. Hinton, et al. Learning internal representations by error propagation. 1986.

[5] Geoffrey E. Hinton, et al. Experiments on Learning by Back Propagation. 1986.

[6] Jeffrey L. Elman, et al. Interactive processes in speech perception: the TRACE model. 1986.

[7] Michael C. Mozer, et al. Early parallel processing in reading: a connectionist approach. Technical report, 1986.

[8] Tad Hogg, et al. A Dynamical Approach to Temporal Pattern Processing. NIPS, 1987.

[9] J. Freyd. Dynamic mental representations. Psychological Review, 1987.

[10] A. Lapedes, et al. Nonlinear Signal Processing Using Neural Networks. 1987.

[11] Pineda, et al. Generalization of back-propagation to recurrent neural networks. Physical Review Letters, 1987.

[12] Terrence J. Sejnowski, et al. Parallel Networks that Learn to Pronounce English Text. Complex Systems, 1987.

[13] Donald A. Norman, et al. The perception of multiple objects: a parallel, distributed processing approach. 1987.

[14] Yoshiro Miyata, et al. The learning and planning of actions. 1988.

[15] D. Zipser, et al. Learning the hidden structure of speech. The Journal of the Acoustical Society of America, 1988.

[16] Ronald J. Williams, et al. Experimental Analysis of the Real-time Recurrent Learning Algorithm. 1989.

[17] Geoffrey E. Hinton, et al. Learning distributed representations of concepts. 1989.

[18] M. Gori, et al. BPS: a learning algorithm for capturing the dynamic nature of speech. International Joint Conference on Neural Networks, 1989.

[19] Geoffrey E. Hinton. Connectionist Learning Procedures. Artificial Intelligence, 1989.

[20] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks. Neural Computation, 1989.

[21] Michael C. Mozer. Discovering Faithful "Wickelfeature" Representations in a Connectionist Network. 1990.

[22] Jeffrey L. Elman, et al. Finding Structure in Time. Cognitive Science, 1990.

[23] L. B. Almeida. A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. 1990.

[24] Michael I. Jordan. Attractor dynamics and parallelism in a connectionist sequential machine. 1990.

[25] Michael C. Mozer, et al. Perception of multiple objects: a connectionist approach. Neural Network Modeling and Connectionism, 1991.