The development of an experimental discrete dictation recognizer

This paper describes an experimental real-time recognizer of isolated-word dictation, implemented at the IBM Thomas J. Watson Research Center on a system of commercially available computers and array processors. The recognizer is intended for the creation of office memoranda and is based on a 5000-word vocabulary. A specially designed workstation enables the user to correct and edit the transcribed speech. The paper outlines the self-organized, statistical approach underlying the basic algorithms of the recognizer. Results of several recognition experiments are then presented. The rest of the paper considers important issues in the future development of dictation recognizers, such as vocabulary selection, language model creation, and human factors.
