Incorporating phonetic knowledge into a multi-stream HMM framework

This paper presents a technique for improving the performance of multi-stream HMMs in ASR systems. In this technique, the stream exponents of the multi-stream model are chosen according to the phonological content of the underlying states. Two distinct feature sets, namely MFCCs and formant-like features, are used to investigate the potential of this technique. The experiments are performed on the AURORA database under the distributed speech recognition (DSR) framework. The proposed front-end constitutes an alternative to the DSR-XAFE (eXtended Audio Front-End) standardized by the European Telecommunications Standards Institute (ETSI). The proposed method yields up to a 10% relative improvement in word accuracy over the multi-stream model with tied exponents, and up to a 35% relative improvement over the state-of-the-art MFCC-based system.
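The core idea, combining per-stream emission log-likelihoods with exponents that depend on the phonetic class of the underlying state, can be sketched as follows. This is a minimal illustration under assumed settings, not the paper's implementation: the phonetic classes, exponent values, stream dimensionalities, and Gaussian parameters are all hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical phonetic classes and their stream exponents
# (gamma_mfcc, gamma_formant); exponents for each state sum to 1.
# Example assumption: formant cues are weighted higher for vowel states,
# the MFCC stream higher for fricatives.
PHONETIC_EXPONENTS = {
    "vowel":     (0.4, 0.6),
    "fricative": (0.7, 0.3),
    "silence":   (0.5, 0.5),
}

def stream_log_likelihood(obs, mean, var):
    """Diagonal-covariance Gaussian log-likelihood for one feature stream."""
    return multivariate_normal.logpdf(obs, mean=mean, cov=np.diag(var))

def combined_log_likelihood(obs_mfcc, obs_formant, state):
    """State emission score with phonetically chosen stream exponents:
    log b_j(o) = gamma_mfcc * log b_mfcc(o_mfcc) + gamma_formant * log b_formant(o_formant)."""
    g_mfcc, g_formant = PHONETIC_EXPONENTS[state["phonetic_class"]]
    ll_mfcc = stream_log_likelihood(obs_mfcc, state["mfcc_mean"], state["mfcc_var"])
    ll_formant = stream_log_likelihood(obs_formant, state["formant_mean"], state["formant_var"])
    return g_mfcc * ll_mfcc + g_formant * ll_formant

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy vowel state: 13-dim MFCC stream and 3-dim formant-like stream.
    state = {
        "phonetic_class": "vowel",
        "mfcc_mean": np.zeros(13),
        "mfcc_var": np.ones(13),
        "formant_mean": np.array([500.0, 1500.0, 2500.0]),
        "formant_var": np.array([100.0, 200.0, 300.0]) ** 2,
    }
    o_mfcc = rng.normal(size=13)
    o_formant = np.array([520.0, 1480.0, 2550.0])
    print(combined_log_likelihood(o_mfcc, o_formant, state))
```

In this sketch the exponents are looked up per phonetic class at decoding time; a model with tied exponents would instead use a single (gamma_mfcc, gamma_formant) pair for all states, which is the baseline the reported 10% relative improvement is measured against.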
