Using More Informative Posterior Probabilities for Speech Recognition

In this paper, we present initial investigations into improving posterior-probability-based speech recognition systems by estimating more informative posteriors that take into account acoustic context (e.g., the whole utterance) as well as available prior information (such as phonetic and lexical knowledge). These posteriors are estimated using the HMM state posterior probability definition typically used in standard HMM training. This approach provides a new, principled theoretical framework for the hierarchical estimation and use of more informative posteriors that integrate appropriate context and prior knowledge. In the present work, we used the resulting posteriors as local scores for decoding. On the OGI Numbers database, this resulted in a significant performance improvement over decoding with MLP-estimated posteriors (the hybrid HMM/ANN approach), for clean speech and especially for noisy speech. The system is also shown to be much less sensitive to tuning factors (such as the phone deletion penalty and language model scaling) than the standard HMM/ANN and HMM/GMM systems, so in practice it does not need to be tuned to achieve its best possible performance.
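To illustrate what an HMM state posterior that uses the whole utterance looks like, the following is a minimal sketch of the textbook forward-backward recursion, which yields gamma[t][i] = P(q_t = i | x_1..x_T). It is not taken from the paper: the function name `state_posteriors`, the toy two-state left-to-right model, and the numeric values are all hypothetical, and the local scores are simply assumed to be positive per-frame emission scores (for example, scaled likelihoods derived from MLP outputs).

```python
def state_posteriors(local_scores, transitions, initial):
    """Return gamma[t][i] = P(q_t = i | x_1..x_T) for a small HMM.

    local_scores[t][i]: positive emission score of frame t in state i
                        (assumed here; e.g. a scaled likelihood).
    transitions[i][j]:  P(q_{t+1} = j | q_t = i).
    initial[i]:         P(q_1 = i).
    """
    n_frames = len(local_scores)
    n_states = len(initial)

    # Forward pass: alpha[t][i] is proportional to P(x_1..x_t, q_t = i).
    # Each frame is rescaled to sum to 1 to avoid numerical underflow.
    alpha = []
    for t in range(n_frames):
        if t == 0:
            row = [initial[i] * local_scores[0][i] for i in range(n_states)]
        else:
            prev = alpha[t - 1]
            row = [
                local_scores[t][j]
                * sum(prev[i] * transitions[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        norm = sum(row)
        alpha.append([v / norm for v in row])

    # Backward pass: beta[t][i] is proportional to P(x_{t+1}..x_T | q_t = i).
    beta = [[1.0] * n_states for _ in range(n_frames)]
    for t in range(n_frames - 2, -1, -1):
        row = [
            sum(
                transitions[i][j] * local_scores[t + 1][j] * beta[t + 1][j]
                for j in range(n_states)
            )
            for i in range(n_states)
        ]
        norm = sum(row)
        beta[t] = [v / norm for v in row]

    # Combine and renormalize per frame; the arbitrary per-frame scaling
    # constants of alpha and beta cancel in this normalization.
    gamma = []
    for t in range(n_frames):
        row = [alpha[t][i] * beta[t][i] for i in range(n_states)]
        norm = sum(row)
        gamma.append([v / norm for v in row])
    return gamma


# Hypothetical toy model: two states, left-to-right topology.
initial = [1.0, 0.0]
transitions = [[0.5, 0.5], [0.0, 1.0]]
# The middle frame is locally ambiguous (equal scores for both states).
local_scores = [[0.6, 0.4], [0.5, 0.5], [0.2, 0.8]]

gamma = state_posteriors(local_scores, transitions, initial)
for t, row in enumerate(gamma):
    print(t, [round(p, 3) for p in row])
```

In this toy case the middle frame has identical local scores for both states, yet its posterior favors the second state because the following frame and the left-to-right topology carry information back through the backward pass. This is the sense in which such posteriors can be more informative than purely local estimates.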
