Paper information - Speaker-independent phone recognition using hidden Markov models

Hidden Markov modeling is extended to speaker-independent phone recognition. Using multiple codebooks of various linear-predictive-coding (LPC) parameters and discrete hidden Markov models (HMMs), the authors obtain a speaker-independent phone recognition accuracy of 58.8-73.8% on the TIMIT database, depending on the type of acoustic and language models used. In comparison, the performance of expert spectrogram readers is only 69% without use of higher-level knowledge. The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data. Since the results were evaluated on a standard database, they can be used as benchmarks to evaluate future systems.

Hsiao-Wuen Hon | Kai-Fu Lee
