Continuous speech recognition from a phonetic transcription

A widely accepted linguistic theory holds that human speech recognition proceeds from an intermediate representation of the acoustic signal in terms of a small number of phonetic symbols. This paper describes a novel speech recognition system based on that theory, in which the acoustic-to-phonetic mapping is accomplished by a particular form of hidden Markov model and is independent of lexical and syntactic constraints. Word recognition is then treated as a classical string-to-string editing problem, which is solved with a two-level dynamic programming algorithm that accounts for lexical and syntactic structure. The system was tested on speaker-independent recognition of fluent speech from the 991-word DARPA resource management task, on which it achieved 76.6% word accuracy. In informal tests, the phonetic transcription could be resynthesized to provide a 100-bit/s vocoder with word intelligibility rates of approximately 75%.
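
The string-to-string editing view of word recognition can be illustrated with a minimal sketch. The code below is not the two-level algorithm of the paper; it shows only the classical single-level edit-distance recurrence, used here to match a possibly errorful phone string against a lexicon. The function names, the unit costs, the phone labels, and the three-word lexicon are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def edit_distance(source: Sequence[str], target: Sequence[str],
                  sub_cost: float = 1.0, ins_cost: float = 1.0,
                  del_cost: float = 1.0) -> float:
    """Classical dynamic-programming edit distance between two symbol strings.

    Costs are illustrative unit costs (an assumption); a real system would
    more plausibly use phone-dependent confusion costs.
    """
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost          # delete every source symbol
    for j in range(1, cols):
        dist[0][j] = j * ins_cost          # insert every target symbol
    for i in range(1, rows):
        for j in range(1, cols):
            match = 0.0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(dist[i - 1][j - 1] + match,   # match or substitute
                             dist[i - 1][j] + del_cost,    # delete
                             dist[i][j - 1] + ins_cost)    # insert
    return dist[-1][-1]


def closest_word(phones: Sequence[str], lexicon: Dict[str, List[str]]) -> str:
    """Return the lexicon entry whose pronunciation is cheapest to edit into."""
    return min(lexicon, key=lambda word: edit_distance(phones, lexicon[word]))


# Hypothetical toy lexicon with made-up phone labels.
LEXICON = {
    "ship": ["sh", "ih", "p"],
    "chip": ["ch", "ih", "p"],
    "sheep": ["sh", "iy", "p"],
}

# A transcription with one phone error still maps to the nearest word.
print(closest_word(["sh", "ih", "b"], LEXICON))
```

In this sketch each word is matched in isolation. A second level of dynamic programming, as the abstract describes, would additionally have to choose word boundaries and word sequences permitted by the syntax.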
