A speaker-independent, syntax-directed, connected word recognition system based on hidden Markov models and level building

In the last several years, a wide variety of techniques have been developed that make it practical to implement and develop large networks for recognizing connected sequences of words. These techniques include efficient and accurate speech modeling methods (e.g., vector quantization and hidden Markov models) and efficient, optimal network search procedures (i.e., level building). In this paper we show how to integrate these techniques into a speaker-independent, syntax-directed, connected word recognition system that requires only a modest amount of computation and whose performance is comparable to that of previous recognizers requiring an order of magnitude more computation. In particular, the recognizer we studied was an airlines information and reservation system using a 129-word vocabulary and a deterministic syntax (grammar) with 144 states, 450 state transitions, and 21 final states, generating more than 6 × 10^9 sentences. An evaluation of the system, using six talkers each speaking 51 test sentences, yielded a sentence accuracy of about 75 percent, resulting from a word accuracy of about 93 percent, at an average speaking rate of about 210 words per minute.
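The grammar figures quoted above (states, transitions, final states, number of sentences) can be illustrated with a minimal sketch of how sentences generated by a deterministic finite-state syntax might be counted. The toy grammar, the state numbering, and the function name `count_sentences` are hypothetical and chosen only for illustration; this is not the airline grammar from the paper, nor the authors' procedure. Because the grammar is deterministic, each accepted word sequence corresponds to exactly one path, so counting accepting paths counts sentences.

```python
from collections import defaultdict


def count_sentences(transitions, start, finals, max_words):
    """Count word sequences of length 1..max_words accepted by a
    deterministic finite-state grammar.

    transitions: dict mapping state -> {word: next_state}
    start:       initial state
    finals:      set of final (accepting) states
    """
    # paths[s] = number of distinct word sequences of the current
    # length that lead from the start state to state s.
    paths = {start: 1}
    total = 0
    for _ in range(max_words):
        next_paths = defaultdict(int)
        for state, n in paths.items():
            for _word, dest in transitions.get(state, {}).items():
                next_paths[dest] += n
        paths = next_paths
        # Every path that currently ends in a final state is one sentence.
        total += sum(n for s, n in paths.items() if s in finals)
    return total


# Hypothetical toy grammar: 7 states, 12 transitions, 2 final states.
TOY_GRAMMAR = {
    0: {"i": 1, "please": 2},
    1: {"want": 2, "need": 2},
    2: {"a": 3, "one": 3},
    3: {"flight": 4, "seat": 4},
    4: {"to": 5},
    5: {"boston": 6, "chicago": 6, "denver": 6},
}
TOY_FINALS = {4, 6}

if __name__ == "__main__":
    print(count_sentences(TOY_GRAMMAR, 0, TOY_FINALS, max_words=10))
```

Even this small grammar generates 48 sentences from 12 transitions, which gives some intuition for how a grammar with a few hundred transitions can generate billions of sentences.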
