Speech Enhancement Based on Estimating Expected Values of Speech Cepstra

This paper proposes a novel speech enhancement (SE) algorithm based on estimating expected values of speech cepstra (EVSC), referred to herein as EVSC-SE. Unlike conventional SE algorithms, in which the a priori signal-to-noise ratio (SNR) is estimated directly from expected values of speech spectra (EVSS), the proposed EVSC-SE algorithm estimates the a priori SNR from the EVSC. Under the Gaussian assumption for speech signals, we propose two approaches to estimating the EVSC. The first is a novel cepstral subtraction approach, which is estimation-based. The second is a modified cepstrum thresholding approach, which is detection-based. Compared with conventional EVSS-based SE (EVSS-SE) algorithms, the proposed EVSC-SE algorithm rapidly tracks the a posteriori SNR at word onsets and offsets and thereby introduces less speech distortion. Moreover, the EVSC-SE algorithm suppresses non-stationary noise effectively. Simulation results show that the EVSC-SE algorithm outperforms conventional EVSS-SE algorithms in terms of segmental SNR and log-spectral distance.
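To make the cepstral-domain idea concrete, the following is a minimal sketch of generic cepstrum thresholding followed by an a priori SNR ratio. It is not the estimator proposed in this paper: the function names, the `keep` and `threshold` parameters, and the hard-thresholding rule are all assumptions chosen for illustration, and the bias between the expected log-spectrum and the log of the expected spectrum is ignored.

```python
import cmath
import math

# Illustrative sketch only; all names and parameters here are hypothetical.

def real_cepstrum(power_spectrum):
    """Inverse DFT of the log power spectrum (expects a full, symmetric K-point spectrum)."""
    K = len(power_spectrum)
    log_p = [math.log(max(p, 1e-12)) for p in power_spectrum]
    return [
        sum(log_p[k] * cmath.exp(2j * math.pi * k * q / K) for k in range(K)).real / K
        for q in range(K)
    ]

def spectrum_from_cepstrum(cepstrum):
    """Forward DFT of the cepstrum followed by exp(), giving a power spectrum."""
    K = len(cepstrum)
    return [
        math.exp(sum(cepstrum[q] * cmath.exp(-2j * math.pi * k * q / K) for q in range(K)).real)
        for k in range(K)
    ]

def threshold_cepstrum(cepstrum, keep, threshold):
    """Keep low-quefrency coefficients; zero the others whose magnitude is below threshold."""
    K = len(cepstrum)
    out = []
    for q, c in enumerate(cepstrum):
        low_quefrency = q <= keep or q >= K - keep
        out.append(c if low_quefrency or abs(c) >= threshold else 0.0)
    return out

def a_priori_snr(speech_power, noise_power):
    """Per-bin ratio of estimated clean-speech power to noise power."""
    return [s / max(n, 1e-12) for s, n in zip(speech_power, noise_power)]

# Example: smooth a symmetric 8-bin spectrum, then form the SNR against unit noise power.
noisy_estimate = [4.0, 2.0, 1.0, 0.5, 0.25, 0.5, 1.0, 2.0]
cepstrum = real_cepstrum(noisy_estimate)
smoothed = spectrum_from_cepstrum(threshold_cepstrum(cepstrum, keep=1, threshold=0.1))
snr = a_priori_snr(smoothed, [1.0] * len(smoothed))
```

Thresholding in the cepstral domain discards small high-quefrency coefficients, which reduces the variance of the spectral estimate while preserving the spectral envelope carried by the low-quefrency coefficients.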
