Speech enhancement using fourth-order cumulants and optimum filters in the subband domain

A new method for speech enhancement using time-domain optimum filters and fourth-order cumulants (FOC) is proposed, based on newly established properties of the FOC of speech signals. In the exploratory part of the paper, the analytical expression of the FOC of subbanded speech is derived assuming a sinusoidal model with up to two harmonics per band. Important properties of this cumulant are revealed, and actual speech data are used to verify the derivations and the underlying model. In the application part of the work, speech enhancement is formulated as an estimation problem, and the expression for the time-domain causal optimum filters is derived for a pth-order system. The key idea is to use the FOC of the noisy speech to estimate the parameters required by the enhancement filters, namely the second-order statistics of the speech and noise. It is shown that the kurtosis and the diagonal slice of the FOC may be used to estimate parameters such as the SNR, the speech autocorrelation, and the probability of speech presence in a given band. Subjective listening and examination of the spectrograms show that the resulting algorithm is effective on typical noises encountered in mobile telephony. Compared to the TIA-IS127 standard for noise reduction, it yields more noise reduction overall and better speech preservation in Gaussian, street, and fan noise. Its effectiveness diminishes, however, for harmonic and impulsive noise types such as office and car engine noise, where discrimination between speech and noise based on the FOC becomes more difficult.
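The role of the kurtosis and the diagonal slice of the FOC can be illustrated with a minimal sketch. It is not the estimator derived in the paper: it assumes a real, zero-mean, stationary signal, a single random-phase sinusoid standing in for one speech harmonic in a band, and additive white Gaussian noise. The function names `foc_diagonal_slice` and `kurtosis`, and all signal parameters, are hypothetical choices made for this illustration. Under these assumptions the sample diagonal slice is c4(τ) = E[x(n) x(n+τ)^3] − 3 r(τ) r(0), which equals −(3/8) A^4 cos(ωτ) for a sinusoid of amplitude A and is approximately zero for Gaussian noise.

```python
import numpy as np


def foc_diagonal_slice(x, max_lag):
    """Sample diagonal slice c4(tau, tau, tau) of the fourth-order cumulant.

    Assumes x is real and stationary; the sample mean is removed first.
    c4(tau) = E[x(n) x(n+tau)^3] - 3 r(tau) r(0)
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    r0 = np.mean(x * x)                  # r(0): signal power
    c4 = np.empty(max_lag + 1)
    for tau in range(max_lag + 1):
        a = x[: n - tau]                 # x(n)
        b = x[tau:]                      # x(n + tau)
        m4 = np.mean(a * b ** 3)         # fourth-order moment on the diagonal
        r_tau = np.mean(a * b)           # autocorrelation r(tau)
        c4[tau] = m4 - 3.0 * r_tau * r0
    return c4


def kurtosis(x):
    """Normalized kurtosis c4(0) / r(0)^2 (zero for a Gaussian signal)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return foc_diagonal_slice(x, 0)[0] / np.mean(x * x) ** 2


# Illustrative signals (all parameters are assumptions for this sketch).
rng = np.random.default_rng(0)
n = 200_000
amplitude = 1.0
omega = 2.0 * np.pi * 0.1                # 0.1 cycles per sample
phase = rng.uniform(0.0, 2.0 * np.pi)
tone = amplitude * np.cos(omega * np.arange(n) + phase)
noise = 0.5 * rng.standard_normal(n)     # white Gaussian noise
noisy = tone + noise

k_tone = kurtosis(tone)                  # theory: -1.5 for a pure sinusoid
k_noise = kurtosis(noise)                # theory: 0 for Gaussian noise
c4_noisy = foc_diagonal_slice(noisy, 5)  # theory: -(3/8) A^4 cos(omega * tau)

print("kurtosis of tone :", k_tone)
print("kurtosis of noise:", k_noise)
print("c4 of noisy tone :", c4_noisy)
```

In this toy setting the diagonal slice of the noisy signal tracks that of the clean tone, because the Gaussian noise contributes nothing to the FOC beyond estimation error. This is the kind of property that makes it plausible to recover speech statistics from noisy observations. Harmonic or impulsive noise is not Gaussian, so its contribution would not cancel in the same way.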
