Speaker-independent phone recognition using hidden Markov models

Hidden Markov modeling is extended to speaker-independent phone recognition. Using multiple codebooks of various linear-predictive-coding (LPC) parameters and discrete hidden Markov models (HMMs) the authors obtain a speaker-independent phone recognition accuracy of 58.8-73.8% on the TIMIT database, depending on the type of acoustic and language models used. In comparison, the performance of expert spectrogram readers is only 69% without use of higher level knowledge. The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Since the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems. >

[1]  Van Nostrand,et al.  Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm , 1967 .

[2]  F. Jelinek,et al.  Continuous speech recognition by statistical methods , 1976, Proceedings of the IEEE.

[3]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .

[4]  Ching Y. Suen,et al.  New Systems and Architectures for Automatic Speech Recognition and Synthesis , 1987, NATO ASI Series.

[5]  John Makhoul,et al.  Context-dependent modeling for acoustic-phonetic recognition of continuous speech , 1985, ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6]  L. R. Rabiner,et al.  Recognition of isolated digits using hidden Markov models with continuous mixture densities , 1985, AT&T Technical Journal.

[7]  R. Cole,et al.  The C-MU phonetic classification system , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Kiyohiro Shikano,et al.  Speaker adaptation through vector quantization , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  M. Nishimura,et al.  Speaker adaptation for a hidden Markov model , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Sadaoki Furui,et al.  Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..

[11]  Lalit R. Bahl,et al.  Experiments with the Tangora 20,000 word speech recognizer , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12]  Peter F. Brown,et al.  The acoustic-modeling problem in automatic speech recognition , 1987 .

[13]  W. Fisher,et al.  An acoustic‐phonetic data base , 1987 .

[14]  John Makhoul,et al.  BYBLOS: The BBN continuous speech recognition system , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[15]  J. Haton Knowledge-based and expert systems in automatic speech recognition , 1987 .

[16]  Vishwa Gupta,et al.  Integration of acoustic information in a large vocabulary word recognizer , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Kai-Fu Lee,et al.  On large-vocabulary speaker-independent continuous speech recognition , 1988, Speech Commun..

[18]  Victor W. Zue,et al.  Some phonetic recognition experiments using artificial neural nets , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[19]  Raj Reddy,et al.  Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .

[20]  Frank K. Soong,et al.  High performance connected digit recognition using hidden Markov models , 1989, IEEE Trans. Acoust. Speech Signal Process..