Network-based isolated digit recognition using vector quantization

This paper describes a network-based approach to speaker-independent digit recognition. The digits are modeled by a pronunciation network whose arcs represent classes of acoustic-phonetic segments. Each arc is associated with a matcher that rates an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra. Recognition involves finding a minimum quantization distortion path through the network by dynamic programming. The system has been evaluated in an extensive series of speaker-independent isolated digit (one through nine, oh, and zero) recognition experiments using a 225-talker, multidialect database developed by Texas Instruments (TI). The best recognizer configurations achieved accuracies of 97-99 percent on the TI database.
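The recognition step described above (a minimum quantization distortion path found by dynamic programming) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each word's network is a simple left-to-right chain of arcs, that each arc's matcher is a plain codebook of vectors, and that distortion is squared Euclidean distance rather than an LPC-specific measure. The names `vq_distortion`, `min_distortion_path`, and `recognize` are hypothetical.

```python
def vq_distortion(frame, codebook):
    """Distortion of one frame against a codebook (assumed measure):
    squared Euclidean distance to the nearest codeword."""
    return min(
        sum((x - c) ** 2 for x, c in zip(frame, codeword))
        for codeword in codebook
    )


def min_distortion_path(frames, codebooks):
    """Total distortion of the best path through a linear chain of arcs.

    codebooks[k] is the codebook attached to the k-th arc. Frames are
    assigned to arcs in order, every arc covers at least one frame, and
    the path must start on the first arc and end on the last one.
    """
    inf = float("inf")
    n_arcs = len(codebooks)
    # best[k]: lowest cost so far with the latest frame assigned to arc k
    best = [inf] * n_arcs
    for t, frame in enumerate(frames):
        new = [inf] * n_arcs
        for k, codebook in enumerate(codebooks):
            if t == 0:
                prev = 0.0 if k == 0 else inf
            else:
                # either stay on arc k or advance from arc k - 1
                prev = min(best[k], best[k - 1] if k > 0 else inf)
            if prev < inf:
                new[k] = prev + vq_distortion(frame, codebook)
        best = new
    return best[-1]


def recognize(frames, models):
    """Pick the word whose arc chain gives the lowest total distortion.

    models maps a word label to its list of per-arc codebooks.
    """
    return min(models, key=lambda word: min_distortion_path(frames, models[word]))
```

A general pronunciation network would allow branching and skipped arcs; the chain used here is only the simplest case in which the same dynamic programming recursion applies.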
