Improving Speech Intelligibility in Noise Using Environment-Optimized Algorithms

While most speech enhancement algorithms improve speech quality, they may not improve speech intelligibility in noise. This paper focuses on the development of an algorithm that can be optimized for a specific acoustic environment and improve speech intelligibility. The proposed method decomposes the input signal into time-frequency (T-F) units and makes binary decisions, based on a Bayesian classifier, as to whether each T-F unit is dominated by the target signal or the noise masker. Target-dominated T-F units are retained while masker-dominated T-F units are discarded. The Bayesian classifier is trained for each acoustic environment using an incremental approach that continuously updates the model parameters as more data become available. Listening experiments were conducted to assess the intelligibility of speech synthesized using the incrementally adapted models as a function of the number of training sentences. Results indicated substantial improvements in intelligibility (over 60% in babble at -5 dB SNR) with as few as ten training sentences in babble; the other noisy conditions required at least 80 training sentences.
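The processing chain described above (classify each T-F unit, keep target-dominated units, update the classifier as training data arrive) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract specifies only a Bayesian classifier trained incrementally, so the single diagonal-Gaussian model per class, the per-unit feature vectors, and all names below (`IncrementalGaussianClass`, `estimate_binary_mask`, `apply_mask`) are assumptions made for illustration.

```python
import numpy as np


class IncrementalGaussianClass:
    """Running diagonal-Gaussian model for one class (target- or masker-dominated).

    Hypothetical stand-in for the per-class model of the Bayesian classifier.
    """

    def __init__(self, dim):
        self.n = 0                # number of T-F units seen so far
        self.mean = np.zeros(dim)
        self.m2 = np.zeros(dim)   # sum of squared deviations from the mean

    def update(self, x):
        """Merge a batch of feature vectors, shape (num_units, dim)."""
        x = np.atleast_2d(x)
        k = x.shape[0]
        if k == 0:
            return
        batch_mean = x.mean(axis=0)
        batch_m2 = ((x - batch_mean) ** 2).sum(axis=0)
        delta = batch_mean - self.mean
        total = self.n + k
        # Pairwise merge of running statistics: no past data is stored.
        self.mean = self.mean + delta * k / total
        self.m2 = self.m2 + batch_m2 + delta ** 2 * self.n * k / total
        self.n = total

    def log_likelihood(self, x, var_floor=1e-6):
        """Log density of x (..., dim) under the current diagonal Gaussian."""
        var = np.maximum(self.m2 / max(self.n, 1), var_floor)
        terms = np.log(2 * np.pi * var) + (x - self.mean) ** 2 / var
        return -0.5 * terms.sum(axis=-1)


def estimate_binary_mask(features, target, masker):
    """Return 1.0 where the target posterior exceeds the masker posterior.

    features has shape (frames, bands, dim); class priors are taken from
    the training counts (an assumption of this sketch).
    """
    total = target.n + masker.n
    log_post_t = target.log_likelihood(features) + np.log(target.n / total)
    log_post_m = masker.log_likelihood(features) + np.log(masker.n / total)
    return (log_post_t > log_post_m).astype(float)


def apply_mask(tf_units, mask):
    """Retain target-dominated T-F units and zero out masker-dominated ones."""
    return tf_units * mask
```

In this sketch, each new training sentence would contribute one `update` call per class, so the mask estimate can be refreshed after every sentence without retraining from scratch.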
