Modulation spectral filtering of speech

Recent auditory physiological evidence points to a modulation frequency dimension in the auditory cortex, which exists jointly with the tonotopic acoustic frequency dimension. Thus, audition can be considered a relatively slowly varying two-dimensional representation, the “modulation spectrum,” in which the first dimension is the well-known acoustic frequency and the second is modulation frequency. We have recently developed a fully invertible analysis/synthesis approach for this modulation spectral transform. A general application of this approach is the removal or modification of different modulation frequencies in audio or speech signals, which can, for example, cause major changes in perceived dynamic character. A specific application of such modification is single-channel multiple-talker separation.
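The abstract does not specify the transform itself. The following is therefore only a minimal sketch of the general idea of modulation-frequency filtering, under a simpler, assumed scheme. It is NOT the authors' fully invertible transform. The sketch works in three steps:

1. Take a short-time Fourier transform (STFT), which gives the acoustic-frequency dimension.
2. Treat each subband's magnitude envelope, sampled at the frame rate, as a signal. Its FFT along time is the modulation-frequency dimension.
3. Zero the modulation frequencies above a cutoff, reattach the original STFT phase, and resynthesize by weighted overlap-add.

Because the phase is left untouched, the modified STFT is generally not a consistent STFT, and any modulation information carried in the phase survives. The result is therefore only approximate. The function names (`stft`, `istft`, `modulation_spectrum`, `modulation_lowpass`), the 64-sample frame, the 16-sample hop and the 8 kHz test signal are all illustrative assumptions.

```python
import numpy as np


def stft(x, frame_len=64, hop=16):
    """STFT with a periodic Hann window. Rows are frames (time), columns are
    acoustic-frequency bins: shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, frame_len=64, hop=16):
    """Weighted overlap-add inverse of stft() (least-squares resynthesis)."""
    window = np.hanning(frame_len + 1)[:-1]
    frames = np.fft.irfft(X, n=frame_len, axis=1)
    out_len = (X.shape[0] - 1) * hop + frame_len
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + frame_len] += frame * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    # Avoid dividing by ~0 where the windows vanish (e.g. sample 0).
    nonzero = norm > 1e-8
    y[nonzero] /= norm[nonzero]
    return y


def modulation_spectrum(X):
    """FFT of each subband's magnitude envelope along the frame (time) axis.
    Rows are modulation-frequency bins, columns acoustic-frequency bins."""
    return np.fft.rfft(np.abs(X), axis=0)


def modulation_lowpass(x, fs, cutoff_hz, frame_len=64, hop=16):
    """Zero modulation frequencies above cutoff_hz in every subband, keep the
    original STFT phase, and resynthesize (approximate, magnitude-only)."""
    X = stft(x, frame_len, hop)
    mag, phase = np.abs(X), np.angle(X)
    M = np.fft.rfft(mag, axis=0)
    frame_rate = fs / hop  # sampling rate of the subband envelopes
    mod_freqs = np.fft.rfftfreq(mag.shape[0], d=1.0 / frame_rate)
    M[mod_freqs > cutoff_hz, :] = 0.0
    # Filtered envelopes can dip below zero; magnitudes must be non-negative.
    new_mag = np.maximum(np.fft.irfft(M, n=mag.shape[0], axis=0), 0.0)
    return istft(new_mag * np.exp(1j * phase), frame_len, hop)[:len(x)]


# Usage: a 1 kHz tone amplitude-modulated at 4 Hz and 40 Hz.
fs = 8000
t = np.arange(fs) / fs
envelope = 1 + 0.4 * np.cos(2 * np.pi * 4 * t) + 0.4 * np.cos(2 * np.pi * 40 * t)
x = envelope * np.cos(2 * np.pi * 1000 * t)
y = modulation_lowpass(x, fs, cutoff_hz=16.0)  # keep 4 Hz, suppress 40 Hz
```

The short frame (125 Hz bins at 8 kHz) is a deliberate choice. With much longer frames, a 40 Hz modulation is resolved into separate spectral sidebands rather than appearing as envelope fluctuation within a single bin. Magnitude-only filtering would then leave that modulation largely intact.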
