Tau Net: A neural network for modeling temporal variability

Abstract

The ability to handle temporal variation is important when dealing with real-world dynamic signals. In many applications, inputs do not arrive as fixed-rate sequences but as signals whose time scale can vary from one instance to the next; modeling such signals therefore requires not only the ability to recognize sequences but also the ability to handle changes in their timing. This paper describes ‘Tau Net’, a neural network for modeling dynamic signals, and its application to speech. In Tau Net, sequence learning is accomplished through a combination of prediction, recurrence, and time-delay connections. Temporal variability is modeled by adaptable time constants in the network, which are adjusted according to the prediction error. Adapting the time constants changes the time scale of the network, so the adapted value of the network's time constant provides a measure of temporal variation in the signal. Tau Net has been applied to several simple signals: sets of sine waves differing in frequency and in phase [1], a multidimensional signal representing the walking gait of children [2], and the energy contour of a simple speech utterance [3]. It has also been shown to work on a voicing distinction task using synthetic speech data [4]. In this paper, Tau Net is applied to two speaker-independent tasks using speech data from the TIMIT database: vowel recognition ({/ae/, /iy/, /ux/}) and consonant recognition ({/p/, /t/, /k/}). Tau Nets trained only on medium-rate tokens achieved about the same performance as networks without time constants trained on tokens at all rates, and performed better than networks without time constants trained on medium-rate tokens. These results demonstrate Tau Net's ability to identify vowels and consonants at variable speech rates by extrapolating to rates not represented in the training set.
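The central mechanism, a time constant adjusted according to prediction error whose adapted value then reflects the signal's rate, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's Tau Net: it assumes a fixed two-unit linear "network" with weights A = [[0, 1], [-1, 0]] standing in for an already-trained network, a forward-Euler (finite-difference) form of τ dx/dt = A x, and τ parameterized as exp(u). The names `apply_weights`, `make_signal` and `adapt_tau` are hypothetical. Only τ is adapted. For a sinusoid of angular frequency ω sampled with step Δ, the least-squares optimum is τ = Δ / sin(ωΔ), which approaches 1/ω as Δ shrinks.

```python
import math


def apply_weights(s):
    """Fixed recurrent weights A = [[0, 1], [-1, 0]] applied to a 2-unit state.

    Hypothetical stand-in for an already-trained network: with tau = 1 the
    dynamics tau * dx/dt = A x generate a unit-rate sinusoid.
    """
    return (s[1], -s[0])


def make_signal(omega, dt, n):
    """Two-channel sinusoid x(t) = (sin(omega*t), cos(omega*t)) sampled every dt."""
    return [(math.sin(omega * k * dt), math.cos(omega * k * dt)) for k in range(n)]


def adapt_tau(signal, dt, tau0=1.0, lr=0.05, steps=1000):
    """Adapt only the time constant tau by gradient descent on prediction error.

    The model predicts the finite-difference derivative of the signal as
    (1/tau) * A x(t), the forward-Euler form of tau * dx/dt = A x.
    tau is parameterized as exp(u) so that it stays positive (an assumption
    of this sketch, not necessarily how the paper updates tau).
    """
    u = math.log(tau0)
    pairs = list(zip(signal[:-1], signal[1:]))
    for _ in range(steps):
        rate = math.exp(-u)  # 1 / tau
        grad = 0.0
        for s, s_next in pairs:
            a = apply_weights(s)
            # Observed derivative, estimated by a finite difference.
            d = ((s_next[0] - s[0]) / dt, (s_next[1] - s[1]) / dt)
            # Prediction residual r = d - (1/tau) * A x.
            r = (d[0] - rate * a[0], d[1] - rate * a[1])
            # d|r|^2 / du = 2 * r . (rate * A x), since d(1/tau)/du = -1/tau.
            grad += 2.0 * rate * (r[0] * a[0] + r[1] * a[1])
        u -= lr * grad / len(pairs)
    return math.exp(u)


if __name__ == "__main__":
    dt = 0.01
    for omega in (0.5, 1.0, 2.0):
        tau = adapt_tau(make_signal(omega, dt, 100), dt)
        # tau should come out close to 1/omega, i.e. roughly 2.0, 1.0 and 0.5.
        print(f"omega={omega}: adapted tau={tau:.4f}")
```

Because τ scales the whole network's dynamics, doubling the input rate roughly halves the adapted τ, so the ratio between an adapted τ and its value at the training rate indicates how much faster or slower a new token is. The log parameterization is a choice made for this sketch: it keeps τ positive and lets a single learning rate work across the rates tried here.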

[1] Garrison W. Cottrell et al. A technique for adapting to speech rate. Neural Networks for Signal Processing III: Proceedings of the 1993 IEEE-SP Workshop, 1993.

[2] Garrison W. Cottrell et al. A connectionist approach to rate adaptation. SGAR, 1994.

[3] Barak A. Pearlmutter. Learning state space trajectories in recurrent neural networks. Neural Computation, 1989.

[4] Abdelhamid Mellouk et al. A discriminative neural prediction system for speech recognition. 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1993.

[5] P. Ladefoged. A course in phonetics. 1975.

[6] Anthony J. Robinson et al. An application of recurrent nets to phone probability estimation. IEEE Trans. Neural Networks, 1994.

[7] Raymond L. Watrous. Phoneme discrimination using connectionist networks. Machine Learning: From Theory to Applications, 1993.

[8] Hervé Bourlard et al. Connectionist probability estimators in HMM speech recognition. IEEE Trans. Speech Audio Process., 1994.

[9] Alex Waibel et al. Large vocabulary recognition using linked predictive neural networks. International Conference on Acoustics, Speech, and Signal Processing, 1990.

[10] Garrison W. Cottrell et al. Learning in recurrent finite difference networks. Int. J. Neural Syst., 1995.

[11] Geoffrey E. Hinton et al. A time-delay neural network architecture for isolated word recognition. Neural Networks, 1990.

[12] Bernie Mulgrew et al. IEEE Workshop on Neural Networks for Signal Processing. 1995.

[13] Pineda et al. Generalization of back-propagation to recurrent neural networks. Physical Review Letters, 1987.

[14] Michael C. Mozer et al. Induction of multiscale temporal structure. NIPS, 1991.

[15] Alex Waibel et al. Continuous speech recognition using linked predictive neural networks. ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, 1991.

[16] Ronald J. Williams et al. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1989.

[17] Esther Levin. Hidden control neural architecture modeling of nonlinear time varying systems and its applications. IEEE Trans. Neural Networks, 1993.

[18] Geoffrey E. Hinton et al. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust. Speech Signal Process., 1989.

[19] Ken-ichi Iso et al. Speaker-independent word recognition using a neural prediction model. International Conference on Acoustics, Speech, and Signal Processing, 1990.

[20] Barak A. Pearlmutter. Learning state space trajectories in recurrent neural networks: a preliminary report. 1988.