Instantaneous A Priori SNR Estimation by Cepstral Excitation Manipulation

As the <italic>a priori</italic> signal-to-noise ratio (SNR) contains crucial information about a signal's mixture of speech and noise, its estimation is the subject of ongoing research. In this paper, we introduce a novel <italic>a priori</italic> SNR estimator based on synthesizing an idealized excitation signal in the cepstral domain. Our approach combines a source-filter decomposition with a cepstral excitation manipulation in order to recreate an idealized excitation, which is subsequently shaped by an immanent envelope. In contrast to the well-known decision-directed approach by Ephraim and Malah, an <italic>instantaneous</italic> estimate is obtained, which is less sensitive to sudden changes in the acoustic environment and less prone to musical noise. Additionally, the proposed estimator is able to preserve weak harmonic structures, resulting in a more full-bodied spectrum. We present both a speaker-independent and a speaker-dependent variant of the new <italic>a priori</italic> SNR estimator, both showing more than 2 dB <inline-formula><tex-math notation="LaTeX">$\Delta \textrm{SNR}$</tex-math></inline-formula> improvement over the state of the art, without any significant increase in speech distortion.
