Bayesian noise estimation in the modulation domain

Abstract: The modulation domain has been reported to be a better alternative to the time-frequency domain for speech enhancement, as speech intelligibility is closely linked with the modulation spectrum. Motivated by this, the paper investigates the use of the modulation domain to model the noise density function. Results show that a Gamma density function fitted in the modulation domain represents the noise density better than its non-modulation-domain counterpart for all time-varying noise signals. The modulation-domain Gamma density is then used to derive a noise estimator via a Bayesian-motivated minimum mean-square error (MMSE) approach. Because the Gamma density closely matches the true noise spectrum in the modulation domain, the proposed noise estimator does not require bias compensation, even under poor signal-to-noise ratio (SNR) conditions (i.e., ≤ 5 dB). The proposed method yields better noise suppression than state-of-the-art methods and provides higher improvements.
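The modelling step described in the abstract can be illustrated with a minimal sketch. It computes modulation-domain magnitudes by applying a second short-time Fourier transform along the time trajectory of each acoustic frequency bin, then fits a Gamma density to the result. Everything specific here is an assumption made for illustration and is not taken from the paper: the frame lengths and hops, the Hann window, the white-noise test signal, the function names, and the method-of-moments fit (the paper's own fitting and goodness-of-fit procedure is not stated in the abstract).

```python
import numpy as np

def stft_mag(x, frame_len, hop):
    """Magnitude STFT of a 1-D signal: rows are frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def modulation_mag(x, ac_len=256, ac_hop=64, mod_len=32, mod_hop=8):
    """Second STFT along time of each acoustic-bin magnitude trajectory.

    Frame sizes are illustrative assumptions, not values from the paper.
    Returns an array of shape (acoustic bins, modulation frames, modulation bins).
    """
    acoustic = stft_mag(x, ac_len, ac_hop)  # (acoustic frames, acoustic bins)
    per_bin = [stft_mag(acoustic[:, k], mod_len, mod_hop)
               for k in range(acoustic.shape[1])]
    return np.stack(per_bin)

def gamma_moment_fit(samples):
    """Method-of-moments Gamma fit (an assumed stand-in for the paper's fit).

    shape = mean^2 / variance, scale = variance / mean.
    """
    m, v = samples.mean(), samples.var()
    return m * m / v, v / m

# Hypothetical test signal: one second of white Gaussian noise at 16 kHz.
rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)

mod = modulation_mag(noise)
# Pool one non-DC modulation bin over all acoustic bins except the edge bins.
samples = mod[1:-1, :, 3].ravel()
shape, scale = gamma_moment_fit(samples)
print(f"Gamma shape={shape:.3f}, scale={scale:.3f}")
```

With real recordings, the same fit could be repeated per noise type and compared against a fit to the acoustic-domain magnitudes, which is the kind of comparison the abstract reports.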
