A theory of desirable properties for preprocessors for speech recognizers

The design of preprocessors, or front ends, for a speech recognition system is considered critical to the overall performance of the system. A design methodology that is inspired by, and in part formulated from, the theory of time-frequency analysis is presented. Examples of specific desirable properties are given: the authors propose eight properties for speech preprocessors and summarize preliminary results supporting the need for a frequency transform. The last property, frequency analysis, was experimentally confirmed for single short segments of speaker-independent vowels. It is concluded that frequency transforms are needed for the simple case of speaker-independent vowel recognition.
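A minimal sketch can illustrate what the frequency-analysis property asks of a front end: it maps one short speech segment to a frequency-domain representation. This is not the authors' preprocessor. The Hamming window, the 256-point transform, the log-magnitude output, and the names `hamming` and `segment_log_spectrum` are illustrative assumptions.

```python
import cmath
import math


def hamming(n: int) -> list[float]:
    """Hamming window of length n (an assumed choice; the abstract names no window)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def segment_log_spectrum(segment: list[float], n_fft: int = 256,
                         floor: float = 1e-10) -> list[float]:
    """Log-magnitude spectrum of one short speech segment (hypothetical front end).

    Windows the segment, zero-pads it to n_fft samples, evaluates a direct DFT,
    and returns log magnitudes for bins 0..n_fft//2 (non-negative frequencies).
    The floor prevents log(0) for silent input.
    """
    if len(segment) > n_fft:
        raise ValueError("segment longer than n_fft")
    w = hamming(len(segment))
    # Apply the window, then zero-pad to the transform length.
    x = [s * wk for s, wk in zip(segment, w)] + [0.0] * (n_fft - len(segment))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        # Direct DFT bin k: sum_n x[n] * exp(-j 2 pi k n / N)
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(n_fft))
        spectrum.append(math.log(max(abs(acc), floor)))
    return spectrum
```

For example, a 25 ms segment (200 samples at an assumed 8 kHz sampling rate) of a 1000 Hz tone peaks at bin 32, because bin k corresponds to k · 8000 / 256 Hz. The direct DFT is written out for clarity. A practical front end would use an FFT instead.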
