Learning Temporal Dependencies in Connectionist Speech Recognition

Hybrid connectionist/HMM systems model time both through a Markov chain and through properties of a connectionist network. In this paper, we discuss the nature of the time dependence currently employed in our systems using recurrent networks (RNs) and feed-forward multi-layer perceptrons (MLPs). In particular, we introduce local recurrences into an MLP to produce an enhanced input representation. This takes the form of an adaptive gamma filter and provides an automatic approach to learning temporal dependencies. We have experimented on a speaker-independent phone recognition task using the TIMIT database. Results using the gamma-filtered input representation show an improvement over the baseline MLP system. Further improvements have been obtained by merging the baseline and gamma filter models.
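
The abstract refers to an adaptive gamma filter used as an input representation. As a point of reference, the standard gamma memory (de Vries and Principe) is a cascade of identical first-order recursive stages sharing a single parameter μ; the sketch below illustrates that recursion in NumPy. It is only an illustration under that standard definition, not the paper's implementation: the filter order, μ value, and feature stream shown here are placeholders.

```python
import numpy as np

def gamma_memory(u, order, mu):
    """Gamma memory: a cascade of first-order IIR stages with a shared
    parameter mu. Returns tap outputs x_0..x_order for each time frame."""
    T = len(u)
    x = np.zeros((order + 1, T))
    x[0] = u  # tap 0 is the raw input sequence
    for t in range(1, T):
        for k in range(1, order + 1):
            # Each tap leaks its own previous value and mixes in the
            # previous value of the preceding tap.
            x[k, t] = (1.0 - mu) * x[k, t - 1] + mu * x[k - 1, t - 1]
    return x

# Hypothetical usage: build a gamma-filtered context for one feature stream
frames = np.random.randn(100)            # e.g. one cepstral coefficient over time
taps = gamma_memory(frames, order=4, mu=0.6)
context = taps.T                         # shape (T, order + 1), fed to the MLP input
```

In this view, μ controls the trade-off between memory depth and resolution, and making μ adaptive lets the network learn how far back in time its input representation should reach.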