Improved phone recognition using Bayesian triphone models

A crucial issue in triphone based continuous speech recognition is the large number of models to be estimated against the limited availability of training data. This problem can be relieved by composing a triphone model from less context-dependent models. This paper introduces a new statistical framework, derived from the Bayesian principle, to perform such a composition. The potential power of this new framework is explored, both algorithmically and experimentally, by an implementation with hidden Markov modeling techniques. This implementation is applied to the recognition of the 39-phone set on the TIMIT database. The new model achieves 74.4% and 75.6% accuracy, respectively, on the core and complete test sets.

[1]  John Makhoul,et al.  Context-dependent modeling for acoustic-phonetic recognition of continuous speech , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Kai-Fu Lee,et al.  Context-independent phonetic hidden Markov models for speaker-independent continuous speech recognition , 1990 .

[3]  X. D. Huang,et al.  Phoneme classification using semicontinuous hidden Markov models , 1992, IEEE Trans. Signal Process..

[4]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[5]  Jean-Luc Gauvain,et al.  High performance speaker-independent phone recognition using CDHMM , 1993, EUROSPEECH.

[6]  Steve J. Young,et al.  State clustering in hidden Markov model-based continuous speech recognition , 1994, Comput. Speech Lang..

[7]  Anthony J. Robinson,et al.  An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.

[8]  Andrej Ljolje,et al.  High accuracy phone recognition using context clustering and quasi-triphonic models , 1994, Comput. Speech Lang..

[9]  Satoshi Takahashi,et al.  Four-level tied-structure for efficient representation of acoustic modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[10]  Ruxin Chen,et al.  Explicit modeling of coarticulation in a statistical speech recognizer , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[11]  Li Deng,et al.  Transitional speech units and their representation by regressive Markov states: applications to speech recognition , 1996, IEEE Trans. Speech Audio Process..

[12]  B. Juang,et al.  Context-dependent Phonetic Hidden Markov Models for Speaker-independent Continuous Speech Recognition , 2008 .