Hierarchical tandem feature extraction

We present a hierarchical architecture for tandem acoustic modeling. In the tandem acoustic modeling paradigm, a multilayer perceptron (MLP) is discriminatively trained on a labeled database to estimate phoneme posterior probabilities. After a nonlinear transformation and whitening, the MLP outputs are used as features in a Gaussian mixture model (GMM) based recognizer. In this paper we replace the large monolithic MLP with hierarchies of MLP experts. We apply this approach to the Speech in Noisy Environments (SPINE 1) evaluation conducted by the Naval Research Laboratory (NRL). We observe a reduction in word error rate (WER) of 30% with context-independent models and 5% with context-dependent models, relative to PLP features.
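As a rough illustration of how a hierarchy of experts could yield tandem features, the sketch below combines a root classifier over broad classes with one expert per class, then takes the logarithm of the resulting phoneme posteriors. The abstract does not specify these details, so several points are assumptions: the product combination rule, the two-level hierarchy, the choice of the logarithm as the nonlinear transformation, the function names `combine_hierarchy` and `log_features`, and all class names and probability values. The whitening step is omitted for brevity.

```python
import math


def combine_hierarchy(class_posteriors, expert_posteriors):
    """Combine a root classifier with per-class experts (assumed rule).

    class_posteriors:  dict mapping broad class -> P(class | x)
    expert_posteriors: dict mapping broad class -> dict of
                       phoneme -> P(phoneme | class, x)
    Returns a dict mapping phoneme -> P(phoneme | x), assuming each
    phoneme belongs to exactly one broad class.
    """
    combined = {}
    for broad_class, p_class in class_posteriors.items():
        for phoneme, p_phone in expert_posteriors[broad_class].items():
            combined[phoneme] = p_class * p_phone
    return combined


def log_features(posteriors, phoneme_order, floor=1e-10):
    """Apply a log nonlinearity (an assumption) in a fixed phoneme order.

    The floor avoids log(0) for posteriors that underflow to zero.
    """
    return [math.log(max(posteriors[p], floor)) for p in phoneme_order]


# Hypothetical outputs for a single frame; in practice these would come
# from trained MLPs rather than hand-written numbers.
class_posteriors = {"vowel": 0.7, "consonant": 0.3}
expert_posteriors = {
    "vowel": {"aa": 0.6, "iy": 0.4},
    "consonant": {"s": 0.5, "t": 0.5},
}

phoneme_posteriors = combine_hierarchy(class_posteriors, expert_posteriors)
features = log_features(phoneme_posteriors, ["aa", "iy", "s", "t"])
```

Because each expert's outputs sum to one within its class and the root's outputs sum to one across classes, the combined phoneme posteriors also sum to one under these assumptions. The resulting feature vectors would still need to be whitened (for example by a decorrelating linear transform estimated on training data) before being passed to a GMM-based recognizer.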
