Generalized maximum a posteriori spectral amplitude estimation for speech enhancement

GMAPA specifies the weight of the prior density based on the SNR of the testing speech signals. GMAPA is capable of performing environment-aware speech enhancement. When the SNR is high, GMAPA adopts a small weight to prevent overcompensation. When the SNR is low, GMAPA uses a large weight to keep measurement noise from disturbing the restoration. Results show that GMAPA outperforms related approaches in objective and subjective evaluations.

Spectral restoration methods for speech enhancement aim to remove the noise components of noisy speech signals by applying a gain function in the spectral domain. The design of the gain function is one of the most important factors in obtaining enhanced speech of good quality. In most studies, the gain function is designed by optimizing a criterion under assumptions about the noise and speech distributions, such as the minimum mean square error (MMSE), maximum likelihood (ML), and maximum a posteriori (MAP) criteria. The MAP criterion has the advantage of yielding a more reliable gain function by incorporating a suitable prior density. However, several studies have shown that it has a drawback: although a MAP-based estimator effectively reduces noise components when the signal-to-noise ratio (SNR) is low, it introduces large speech distortion when the SNR is high. To solve this problem, we have proposed a generalized maximum a posteriori spectral amplitude (GMAPA) algorithm for designing the gain function for speech enhancement. The proposed GMAPA algorithm dynamically specifies the weight of the prior density of the speech spectra according to the SNR of the testing speech signals in order to calculate the optimal gain function. When the SNR is high, GMAPA adopts a small weight to prevent overcompensation that may result in speech distortion. When the SNR is low, on the other hand, GMAPA uses a large weight to keep measurement noise from disturbing the restoration.
Our previous study showed that the weight of the prior density plays a crucial role in GMAPA performance; in that study, the weight was determined from the SNR at the utterance level. In this paper, we propose computing the weight with consideration of time-frequency correlations, which results in a more accurate estimate of the gain function. Experiments were carried out to evaluate the proposed algorithm with both objective and subjective tests. The objective results indicate that GMAPA is promising compared with several well-known algorithms at both high and low SNRs. The subjective listening tests indicate that GMAPA provides significantly higher sound quality than the other speech enhancement algorithms.
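
The role of the prior weight can be illustrated with a deliberately simplified sketch. It is not the GMAPA estimator itself: it assumes a toy real-valued Gaussian model (clean value s ~ N(0, σ_s²), noise n ~ N(0, σ_n²), observation y = s + n), for which maximizing log p(y|s) + α log p(s) gives the closed-form gain ξ / (ξ + α), with ξ = σ_s² / σ_n². The sigmoid-shaped mapping from SNR to weight and all of its parameters (`midpoint_db`, `slope`, `w_min`, `w_max`) are hypothetical choices made only for illustration.

```python
import math


def prior_weight(snr_db, midpoint_db=10.0, slope=0.3, w_min=0.1, w_max=1.0):
    """Hypothetical SNR-to-weight mapping (an assumption, not from the paper).

    Returns a weight that decreases smoothly from w_max (low SNR)
    to w_min (high SNR), using a logistic curve written with tanh
    so that it cannot overflow for extreme inputs.
    """
    x = slope * (snr_db - midpoint_db)
    falling_sigmoid = 0.5 * (1.0 - math.tanh(0.5 * x))  # equals 1 / (1 + e^x)
    return w_min + (w_max - w_min) * falling_sigmoid


def weighted_map_gain(xi, alpha):
    """Gain of the weighted-prior MAP estimate in the toy Gaussian model.

    xi    : ratio of speech variance to noise variance (linear scale)
    alpha : weight on the log prior; alpha = 0 gives the ML estimate
            (gain 1), alpha = 1 gives the standard MAP (Wiener) gain.
    """
    return xi / (xi + alpha)


def toy_gain(xi_db, utterance_snr_db):
    """Combine both steps: choose the weight from the SNR, then the gain."""
    xi = 10.0 ** (xi_db / 10.0)
    alpha = prior_weight(utterance_snr_db)
    return alpha, weighted_map_gain(xi, alpha)


if __name__ == "__main__":
    for snr_db in (-5.0, 5.0, 15.0, 25.0):
        alpha, gain = toy_gain(xi_db=snr_db, utterance_snr_db=snr_db)
        print(f"SNR {snr_db:5.1f} dB  weight {alpha:.3f}  gain {gain:.3f}")
```

In this toy setting, a small weight pushes the gain toward 1, so the observation is left largely untouched, while a large weight pulls the gain toward the fully prior-weighted value ξ / (ξ + 1), which suppresses more of the observation. This mirrors the qualitative behavior described above for high and low SNRs.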
