Discrete utterance speech recognition without time alignment

This paper presents results for a new method of discrete utterance speech recognition. The method is based on rate-distortion speech coding (speech coding by vector quantization), minimum cross-entropy pattern classification, and information-theoretic spectral distortion measures. A separate vector quantization code book is designed from a training sequence for each word in the recognition vocabulary. An input from outside the training sequence is classified by vector quantizing it with each code book and selecting the code book that achieves the lowest average distortion per speech frame. The new method obviates time alignment. It achieves 99 percent accuracy for speaker-dependent recognition of a 20-word vocabulary that includes the ten digits, with higher accuracy on the digit subset. For speaker-independent recognition, the method achieves 88 percent accuracy on the 20-word vocabulary and 95 percent on the digit subset. The paper presents the background of the method, detailed empirical results, and an analysis of its computational requirements.
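The classification rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. Each speech frame is assumed to be a strictly positive power spectrum. The Itakura-Saito distortion stands in for the paper's information-theoretic spectral distortion measures. Each word's code book is designed with an LBG-style procedure (codeword splitting followed by generalized Lloyd iterations). All function names and the toy data are hypothetical. The score for a word is the mean, over frames, of each frame's minimum distortion against that word's code book. Shuffling the frames of an input therefore leaves every score unchanged, and no time alignment is involved.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Frame = List[float]  # assumed representation: one frame as a strictly positive power spectrum


def is_distortion(p: Sequence[float], q: Sequence[float]) -> float:
    """Itakura-Saito distortion d(p, q), averaged over spectral bins.

    An assumed stand-in for the paper's information-theoretic spectral distortion.
    """
    total = 0.0
    for pk, qk in zip(p, q):
        r = pk / qk
        total += r - math.log(r) - 1.0
    return total / len(p)


def nearest(frame: Frame, codebook: List[Frame]) -> Tuple[int, float]:
    """Full-search vector quantization: index and distortion of the best codeword."""
    best_i, best_d = 0, is_distortion(frame, codebook[0])
    for i in range(1, len(codebook)):
        d = is_distortion(frame, codebook[i])
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def centroid(frames: List[Frame]) -> Frame:
    """With the codeword as the second argument of d, the bin-wise arithmetic
    mean minimizes the total Itakura-Saito distortion over a cell."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def design_codebook(training: List[Frame], size: int,
                    iters: int = 20, eps: float = 0.01) -> List[Frame]:
    """LBG-style design: split every codeword, then run Lloyd iterations."""
    if size < 1 or size & (size - 1):
        raise ValueError("this sketch requires size to be a power of two")
    codebook = [centroid(training)]
    while len(codebook) < size:
        # Multiplicative perturbation keeps the split codewords strictly positive.
        grown = [[c * (1 + eps) for c in cw] for cw in codebook]
        shrunk = [[c * (1 - eps) for c in cw] for cw in codebook]
        codebook = grown + shrunk
        for _ in range(iters):
            cells: List[List[Frame]] = [[] for _ in codebook]
            for f in training:
                cells[nearest(f, codebook)[0]].append(f)
            # Lloyd update; an empty cell keeps its previous codeword.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def average_distortion(utterance: List[Frame], codebook: List[Frame]) -> float:
    """Mean per-frame quantization distortion; frame order plays no role."""
    return sum(nearest(f, codebook)[1] for f in utterance) / len(utterance)


def classify(utterance: List[Frame], codebooks: Dict[str, List[Frame]]) -> str:
    """Pick the word whose code book gives the lowest average distortion per frame."""
    return min(codebooks, key=lambda w: average_distortion(utterance, codebooks[w]))


if __name__ == "__main__":
    rng = random.Random(0)

    def synthetic_utterance(shape: Frame, n_frames: int) -> List[Frame]:
        # Hypothetical toy data: a fixed spectral shape with random per-bin gains.
        return [[s * rng.uniform(0.8, 1.2) for s in shape] for _ in range(n_frames)]

    shapes = {"yes": [4.0, 2.0, 1.0, 0.5], "no": [0.5, 1.0, 2.0, 4.0]}
    books = {w: design_codebook(synthetic_utterance(s, 60), size=4)
             for w, s in shapes.items()}
    print(classify(synthetic_utterance(shapes["no"], 30), books))  # expected: no
```

Full-search quantization keeps the sketch short. Its cost per input frame grows linearly with the code book size, the vocabulary size, and the frame dimension.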
