Speaker Independent Phonetic Transcription of Fluent Speech for Large Vocabulary Speech Recognition

Speaker independent phonetic transcription of fluent speech is performed using an ergodic continuously variable duration hidden Markov model (CVDHMM) to represent the acoustic, phonetic and phonotactic structure of speech. An important property of the model is that each of its fifty-one states is uniquely identified with a single phonetic unit. Thus, for any spoken utterance, a phonetic transcription is obtained from a dynamic programming (DP) procedure that finds the state sequence of maximum likelihood. A model was constructed from 4020 sentences of the TIMIT database. When tested on 180 different sentences from this database, it achieved a phonetic accuracy of 56% with 9% insertions. A speaker dependent version of the model was also constructed. The transcription algorithm was then combined with lexical access and parsing routines to form a complete recognition system. When tested on sentences from the DARPA resource management task spoken over the local switched telephone network, phonetic accuracy of 64% with 8% insertions and word accuracy of 87% with 3% insertions were measured. This system is presently operating in an on-line mode over the local switched telephone network in less than ten times real time on an Alliant FX-80.
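
As a rough illustration of how a transcription can be read off the maximum-likelihood state sequence, the sketch below runs a standard Viterbi search over a small ergodic HMM in which every state carries one phonetic label. It is a simplification and not the method of the paper: the CVDHMM models state durations explicitly, whereas this sketch uses ordinary per-frame transitions. The function names (`viterbi`, `collapse`), the input layout, and the idea of merging repeated states into one symbol are assumptions made for the example.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log-likelihood) for a plain HMM.

    log_init[j]     : log P(first state = j)
    log_trans[i][j] : log P(next state = j | current state = i)
    log_emit[t][j]  : log-likelihood of frame t under state j
    All three inputs are hypothetical; a real system would estimate them.
    """
    n_states = len(log_init)
    # Best score of any path ending in state j at the first frame.
    score = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_emit[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def collapse(path, labels):
    """Merge consecutive frames in the same state into one phonetic symbol."""
    out = []
    for s in path:
        if not out or out[-1] != labels[s]:
            out.append(labels[s])
    return out
```

With one state per phonetic unit, calling `collapse(path, labels)` on the path returned by `viterbi` gives the symbol string directly, with no separate decoding from states to phones. In the explicit-duration setting of the abstract, the search would additionally range over the time spent in each state.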
