Paper information - Continuous speech recognition from a phonetic transcription

Continuous speech recognition from a phonetic transcription

A widely accepted linguistic theory holds that human speech recognition proceeds from an intermediate representation of the acoustic signal in terms of a small number of phonetic symbols. This paper describes a novel speech recognition system based on that theory, in which the acoustic-to-phonetic mapping is accomplished by a particular form of hidden Markov model and is independent of lexical and syntactic constraints. Word recognition is then treated as a classical string-to-string editing problem, solved with a two-level dynamic programming algorithm that accounts for lexical and syntactic structure. The system was tested on speaker-independent recognition of fluent speech from the 991-word DARPA resource management task, on which it achieved 76.6% word accuracy. In informal tests, the phonetic transcription could be resynthesized to provide a 100-bit/s vocoder with word intelligibility rates of approximately 75%.
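The idea of treating word recognition as string-to-string editing over a phonetic transcription can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy lexicon, the phone symbols, the unit edit costs, and the names `edit_distance`, `decode`, and `max_span` are all assumptions made for illustration, and the syntactic constraints mentioned in the abstract are omitted. An inner dynamic program scores one phone segment against one pronunciation, and an outer dynamic program chooses the segmentation into words with the lowest total edit cost.

```python
# Hypothetical toy lexicon: word -> phone sequence (symbols are illustrative only).
LEXICON = {
    "show": ("sh", "ow"),
    "ships": ("sh", "ih", "p", "s"),
    "all": ("ao", "l"),
}


def edit_distance(a, b, sub_cost=1, indel_cost=1):
    """Inner level: classical edit distance between two phone sequences."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel_cost]
        for j in range(1, len(b) + 1):
            match = 0 if a[i - 1] == b[j - 1] else sub_cost
            curr.append(min(
                prev[j] + indel_cost,      # delete a[i-1]
                curr[j - 1] + indel_cost,  # insert b[j-1]
                prev[j - 1] + match,       # match or substitute
            ))
        prev = curr
    return prev[-1]


def decode(phones, lexicon=LEXICON, max_span=6):
    """Outer level: segment the phone string into the cheapest word sequence.

    best[end] holds (total cost, words) for the prefix phones[:end].
    max_span (an assumed limit) bounds the number of phones per word.
    """
    n = len(phones)
    best = [(0, [])] + [(float("inf"), [])] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_span), end):
            segment = phones[start:end]
            for word, pron in lexicon.items():
                cost = best[start][0] + edit_distance(segment, pron)
                if cost < best[end][0]:
                    best[end] = (cost, best[start][1] + [word])
    return best[n]


# A transcription with one phone error ("iy" in place of "ih").
noisy = ("sh", "ow", "ao", "l", "sh", "iy", "p", "s")
cost, words = decode(noisy)
print(cost, words)
```

In this sketch a single substituted phone still decodes to the intended word sequence at a total cost of one edit. A system closer to the one described would presumably use costs that reflect the actual confusions of the phonetic transcriber and restrict the outer search with a grammar.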

Stephen E. Levinson | A. Ljolje | L. G. Miller
