Modulation-domain Kalman filtering for single-channel speech enhancement

In this paper, we investigate the modulation-domain Kalman filter (MDKF) and compare its performance with other time-domain and acoustic-domain speech enhancement methods. In contrast to previously reported modulation-domain enhancement methods based on fixed bandpass filtering, the MDKF is an adaptive, linear MMSE estimator that uses models of the temporal changes of the magnitude spectrum for both speech and noise. Also, because the Kalman filter is a joint magnitude and phase spectrum estimator under non-stationarity assumptions, it is well suited to modulation-domain processing, where phase information has been shown to play an important role. We have found that the Kalman filter is better suited to processing in the modulation domain than in the time domain: a low-order linear predictor is sufficient for modelling the slow dynamics in the modulation domain, but insufficient for modelling the long-term correlation information of speech in the time domain. As a result, in the ideal case the MDKF produces enhanced speech with minimal distortion and residual noise. Results from objective experiments and blind subjective listening tests on the NOIZEUS corpus show that the MDKF (with clean speech parameters) outperforms all the acoustic-domain and time-domain enhancement methods that were evaluated, including the time-domain Kalman filter with clean speech parameters. A practical MDKF that uses the MMSE-STSA method to enhance noisy speech in the acoustic domain prior to LPC analysis was also evaluated and showed promising results.
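The core idea of Kalman filtering a slowly varying magnitude trajectory with a low-order linear predictor can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `kalman_ar_filter` and `demo_mse`, the fixed AR(2) coefficients, the white-noise observation model, and the synthetic trajectory are all assumptions made for illustration. A full MDKF would, per the abstract, also model the noise dynamics and estimate the predictor by LPC analysis.

```python
import numpy as np


def kalman_ar_filter(noisy, ar_coeffs, excitation_var, noise_var):
    """Kalman-filter a 1-D sequence modelled as AR(p) plus additive white noise.

    Assumed model (illustrative only):
        x[n] = sum_k ar_coeffs[k] * x[n-1-k] + w[n],  var(w) = excitation_var
        y[n] = x[n] + v[n],                           var(v) = noise_var
    """
    a = np.asarray(ar_coeffs, dtype=float)
    p = a.size

    # Companion-form transition matrix for state [x[n], x[n-1], ..., x[n-p+1]].
    A = np.zeros((p, p))
    A[0, :] = a
    A[1:, :-1] = np.eye(p - 1)

    d = np.zeros(p)
    d[0] = 1.0          # excitation drives the newest sample
    c = d.copy()        # observation reads the newest sample

    x = np.zeros(p)                    # state estimate
    P = np.eye(p) * excitation_var     # state error covariance
    out = np.empty(len(noisy))

    for n, y in enumerate(noisy):
        # Prediction step.
        x = A @ x
        P = A @ P @ A.T + excitation_var * np.outer(d, d)
        # Update step with the scalar observation y.
        k = P @ c / (c @ P @ c + noise_var)   # Kalman gain
        x = x + k * (y - c @ x)
        P = P - np.outer(k, c @ P)
        out[n] = x[0]
    return out


def demo_mse(seed=0, num_frames=2000):
    """Filter a synthetic slow AR(2) trajectory and return (noisy MSE, filtered MSE)."""
    rng = np.random.default_rng(seed)
    ar_coeffs = [1.8, -0.81]   # hypothetical slow dynamics (double pole at 0.9)
    excitation_var, noise_var = 1.0, 25.0

    clean = np.zeros(num_frames)
    w = rng.normal(0.0, np.sqrt(excitation_var), num_frames)
    for n in range(2, num_frames):
        clean[n] = ar_coeffs[0] * clean[n - 1] + ar_coeffs[1] * clean[n - 2] + w[n]
    noisy = clean + rng.normal(0.0, np.sqrt(noise_var), num_frames)

    enhanced = kalman_ar_filter(noisy, ar_coeffs, excitation_var, noise_var)
    return np.mean((noisy - clean) ** 2), np.mean((enhanced - clean) ** 2)
```

In a modulation-domain setting, a filter of this kind would be run along the time trajectory of each acoustic frequency bin of the short-time spectrum, rather than along the waveform samples. The sketch uses known model parameters, which corresponds loosely to the "clean speech parameters" condition described above.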
