Simultaneous Detection and Estimation Approach for Speech Enhancement

In this paper, we present a simultaneous detection and estimation approach for speech enhancement. A detector for speech presence in the short-time Fourier transform domain is combined with an estimator, and the two jointly minimize a cost function that takes into account both detection and estimation errors. Cost parameters control the tradeoff between speech distortion, caused by missed detection of speech components, and residual musical noise, caused by false detection. Furthermore, a modified decision-directed a priori signal-to-noise ratio (SNR) estimator is proposed for transient-noise environments. Experimental results demonstrate the advantage of combining the proposed simultaneous detection and estimation approach with the proposed a priori SNR estimator, which facilitates suppression of transient noise with a controlled level of speech distortion.
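For orientation, the sketch below illustrates the conventional decision-directed a priori SNR estimator that the modified estimator builds on. It is not the modification proposed in this paper, whose details are not given in the abstract. The function name, the parameter names, and the default values of `alpha` and `xi_min` are assumptions chosen for illustration, and the same noise power estimate is assumed to apply to both the previous and the current frame.

```python
def decision_directed_snr(noisy_power, noise_power, prev_clean_power,
                          alpha=0.98, xi_min=1e-3):
    """Conventional decision-directed a priori SNR estimate for one frame.

    noisy_power      -- |Y|^2 per frequency bin for the current frame
    noise_power      -- estimated noise power per bin (assumed known)
    prev_clean_power -- |A_hat|^2 per bin, the clean-speech power
                        estimate from the previous frame
    alpha            -- weighting factor in [0, 1] (illustrative default)
    xi_min           -- lower bound on the estimate (illustrative default)
    """
    xi = []
    for y2, lam_d, a2_prev in zip(noisy_power, noise_power, prev_clean_power):
        gamma = y2 / lam_d                 # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)    # instantaneous SNR estimate
        # Weighted sum of the previous-frame estimate and the
        # instantaneous estimate from the current frame.
        est = alpha * (a2_prev / lam_d) + (1.0 - alpha) * ml_term
        xi.append(max(est, xi_min))
    return xi
```

A value of `alpha` close to 1 weights the previous frame heavily, which smooths the estimate and reduces musical noise but slows the response to abrupt changes such as transients.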
