Two-dimensional multi-resolution analysis of speech signals and its application to speech recognition

This paper describes a novel approach to using multi-resolution analysis (MRA) for automatic speech recognition. Two-dimensional MRA is applied to the short-time log spectrum of the speech signal to extract the slowly varying spectral envelope, which carries the most important articulatory and phonetic information. After standard cepstral analysis, the resulting MRA features are used for speech recognition in the same way as conventional short-time features such as MFCCs and PLPs. Preliminary experiments on both clean connected speech and noisy telephone conversational speech show that MRA cepstra yield a significant reduction in insertion errors compared with MFCCs.
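The processing chain described above (short-time log spectrum, two-dimensional low-pass MRA, cepstral analysis) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the wavelet, the number of decomposition levels, or the frame settings, so the Haar filter, `levels=2`, `frame_len=256`, `hop=128`, and all function names below are assumptions chosen for brevity. The sketch also omits any mel or perceptual frequency warping and keeps the decimated approximation band rather than reconstructing the envelope at the original resolution.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Short-time log magnitude spectrum, shape (frames, frequency bins)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + 1e-10)  # small offset avoids log(0)

def haar_lowpass_2d(x, levels=2):
    """Keep only the 2-D Haar approximation (low-low) band.

    Assumption: Haar is used purely for illustration. Each level halves
    both the time and frequency resolution, discarding the fast-varying
    detail bands and retaining the slowly varying envelope.
    """
    for _ in range(levels):
        t = x.shape[0] - x.shape[0] % 2  # trim to even length in time
        f = x.shape[1] - x.shape[1] % 2  # trim to even length in frequency
        x = x[:t, :f]
        # Orthonormal Haar low-pass in both directions: sum of each 2x2 block / 2
        x = (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 2.0
    return x

def cepstra(envelope, n_ceps=12):
    """Unnormalized DCT-II along the frequency axis of each frame."""
    n = envelope.shape[1]
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return envelope @ basis.T

# Hypothetical usage on a synthetic signal standing in for speech
rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)
features = cepstra(haar_lowpass_2d(log_spectrogram(signal), levels=2))
print(features.shape)
```

In this sketch the feature matrix has one row per (decimated) frame and one column per cepstral coefficient, so it could be passed to a recognizer front end in place of an MFCC matrix of the same layout.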
