Isolated-word speech recognition using multisection vector quantization codebooks

A new approach to isolated-word speech recognition using vector quantization (VQ) is examined. In this approach, words are recognized by means of sequences of VQ codebooks, called multisection codebooks. A separate multisection codebook is designed for each word in the recognition vocabulary by dividing the word into equal-length sections and designing a standard VQ codebook for each section. Unknown words are classified by dividing them into corresponding sections, encoding them with the multisection codebooks, and finding the multisection codebook that yields the smallest average distortion. For speaker-independent recognition of the digits, this approach achieved a recognition accuracy of 98 percent. In addition, the approach achieved greater than 99 percent accuracy for speaker-dependent recognition of the digits with only one distortion computation per input frame per vocabulary word. The approach is described, detailed experimental results are presented and discussed, and computational requirements are analyzed.
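The classification rule described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes squared Euclidean distortion, uses a plain k-means loop as a stand-in for a VQ design algorithm, and works on toy one-dimensional frames. All function names and the "up"/"down" vocabulary are hypothetical, and every utterance is assumed to have at least as many frames as there are sections.

```python
def sq_dist(a, b):
    # Assumed distortion measure: squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split_sections(frames, n_sections):
    # Divide a frame sequence into n_sections nearly equal-length sections.
    n = len(frames)
    sections = [[] for _ in range(n_sections)]
    for i, frame in enumerate(frames):
        sections[i * n_sections // n].append(frame)
    return sections

def train_codebook(vectors, size, iters=10):
    # Plain k-means stand-in for a VQ codebook design algorithm.
    step = max(1, len(vectors) // size)
    codebook = [list(v) for v in vectors[::step][:size]]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            j = min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            cells[j].append(v)
        for j, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[j] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook

def train_multisection(utterances, n_sections, size):
    # One codebook per section, trained on that section's frames
    # pooled over all training utterances of the word.
    pooled = [[] for _ in range(n_sections)]
    for frames in utterances:
        for s, sec in enumerate(split_sections(frames, n_sections)):
            pooled[s].extend(sec)
    return [train_codebook(p, size) for p in pooled]

def average_distortion(frames, multisection):
    # Encode each section with its own codebook: one nearest-codeword
    # search per frame, averaged over all frames of the utterance.
    total = 0.0
    sections = split_sections(frames, len(multisection))
    for sec, codebook in zip(sections, multisection):
        for f in sec:
            total += min(sq_dist(f, c) for c in codebook)
    return total / len(frames)

def classify(frames, models):
    # Choose the word whose multisection codebook gives the
    # smallest average distortion.
    return min(models, key=lambda w: average_distortion(frames, models[w]))

# Toy vocabulary: a rising "word" and a falling "word".
up = [[0.0], [0.1], [0.9], [1.0]]
down = [[1.0], [0.9], [0.1], [0.0]]
models = {
    "up": train_multisection([up], n_sections=2, size=1),
    "down": train_multisection([down], n_sections=2, size=1),
}
unknown = [[0.05], [0.2], [0.8], [0.95], [1.0]]
result = classify(unknown, models)
```

The two toy words contain the same frame values in opposite order, so a single codebook per word could not separate them. Splitting each word into sections keeps enough temporal order to tell them apart.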
