LDA based feature estimation methods for LVCSR

Features that model the temporal aspects of phonemes are important in speech recognition. One approach uses linear discriminant analysis (LDA) to find discriminative features from a spectrotemporal input formed by concatenating consecutive frames of short-time spectral features. Other approaches use, for example, neural networks to process longer spectral segments and thereby improve recognition accuracy. Still, the most widely used way to include temporal cues is to augment the short-time spectral features with simple time derivatives. In this paper, we present a new feature estimation method based on pairwise linear discriminants. We compare it and some of its variants to traditional MFCC features and to LDA-estimated features in a large vocabulary continuous speech recognition (LVCSR) task. The features obtained with the new estimation method show significant improvements in recognition accuracy over both MFCC and LDA features.
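To make the LDA baseline described above concrete, the following is a minimal sketch of frame concatenation followed by a standard multi-class LDA projection. It is not the pairwise linear discriminant method proposed in the paper. The function names (`stack_frames`, `lda_transform`), the context width, the output dimension, and the random toy data standing in for MFCC frames and phoneme labels are all assumptions made for illustration.

```python
import numpy as np

def stack_frames(feats, context):
    """Concatenate each frame with `context` neighbours on each side.

    Edge frames are repeated so the output keeps the same number of rows.
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + num_frames] for i in range(2 * context + 1)])

def lda_transform(X, labels, out_dim):
    """Return a (D, out_dim) projection from standard multi-class LDA."""
    dim = X.shape[1]
    global_mean = X.mean(axis=0)
    Sw = np.zeros((dim, dim))  # within-class scatter
    Sb = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        class_mean = Xc.mean(axis=0)
        Sw += (Xc - class_mean).T @ (Xc - class_mean)
        diff = class_mean - global_mean
        Sb += len(Xc) * np.outer(diff, diff)
    # Whiten the within-class scatter (small ridge added for numerical safety),
    # then diagonalise the between-class scatter in the whitened space.
    w_vals, w_vecs = np.linalg.eigh(Sw + 1e-6 * np.eye(dim))
    whiten = w_vecs / np.sqrt(w_vals)
    b_vals, b_vecs = np.linalg.eigh(whiten.T @ Sb @ whiten)
    order = np.argsort(b_vals)[::-1][:out_dim]
    return whiten @ b_vecs[:, order]

# Toy data: 600 frames of 13-dimensional "spectral" features, 3 made-up classes.
rng = np.random.default_rng(0)
labels = np.repeat(np.array([0, 1, 2]), 200)
feats = rng.normal(size=(600, 13)) + labels[:, None]

X = stack_frames(feats, context=3)    # 7 frames x 13 dims = 91 dims per row
W = lda_transform(X, labels, out_dim=2)
Y = X @ W                             # discriminative features, one row per frame
```

In a real system the class labels would typically come from a forced alignment of the training data, and the output dimension would be chosen to match the acoustic model; both choices are outside the scope of this sketch.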
