Speech Enhancement Based on Generalized Minimum Mean Square Error Estimators and Masking Properties of the Auditory System

This paper studies the family of conditional minimum mean square error (MMSE) spectral estimators of the form $\left(E\left[X_p^{\alpha} \mid |X_p + D_p|\right]\right)^{1/\alpha}$, where $X_p$ is the clean speech spectrum and $D_p$ is the noise spectrum, resulting in a generalized MMSE (GMMSE) estimator. The degree of noise suppression versus musical tone artifacts of these estimators is studied, and the tradeoffs in the selection of $\alpha$ across noise spectral structure and signal-to-noise ratio (SNR) level are also considered. Members of this family of estimators include the Ephraim-Malah (EM) amplitude estimator and, for high SNRs, the Wiener filter. It is shown that the colorless residual noise observed in the EM estimator is a characteristic of this general family of estimators. An application of these estimators in an auditory enhancement scheme that uses the masking threshold of the human auditory system is formulated, resulting in the GMMSE-auditory masking threshold (GMMSE-AMT) enhancement method. Finally, the proposed algorithms are evaluated in detail on the phonetically balanced TIMIT database and the National Gallery of the Spoken Word (NGSW) audio archive using subjective and objective speech quality measures. A detailed phoneme-based objective quality analysis shows that the proposed GMMSE-AMT outperforms the MMSE and log-MMSE enhancement methods.
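
To make the estimator family concrete, the following is a minimal sketch of a per-bin spectral gain for a given alpha. It is not taken from the paper: it assumes the complex Gaussian statistical model commonly used with the Ephraim-Malah estimator, under which the conditional moment of the speech amplitude has a closed form involving the confluent hypergeometric function M(a; b; z). The names `gmmse_gain`, `kummer_m_neg`, `xi` (a priori SNR), and `gamma` (a posteriori SNR) are illustrative choices, and the series evaluation is only intended for moderate SNR values.

```python
import math


def kummer_m_neg(a, b, v, tol=1e-12, max_terms=2000):
    """Evaluate M(a; b; -v) for v >= 0 (assumes b - a > 0).

    Uses Kummer's transformation M(a; b; -v) = exp(-v) * M(b - a; b; v),
    so every term of the power series is positive and no cancellation occurs.
    """
    c = b - a
    term = 1.0
    total = 1.0
    for k in range(1, max_terms):
        # Ratio of consecutive series terms: (c + k - 1) / (b + k - 1) * v / k
        term *= (c + k - 1) / (b + k - 1) * v / k
        total += term
        if term < tol * total:
            break
    return math.exp(-v) * total


def gmmse_gain(xi, gamma, alpha):
    """Gain G such that G * |Y| = (E[A^alpha | Y])^(1/alpha), alpha > 0.

    Assumption (not stated in the abstract): Gaussian speech and noise
    spectral components. xi is the a priori SNR, gamma the a posteriori SNR.
    """
    v = xi / (1.0 + xi) * gamma
    moment = math.gamma(alpha / 2.0 + 1.0) * kummer_m_neg(-alpha / 2.0, 1.0, v)
    return math.sqrt(v) / gamma * moment ** (1.0 / alpha)


def wiener_gain(xi):
    """Wiener filter gain, used here only as a high-SNR reference."""
    return xi / (1.0 + xi)
```

Under these assumptions, `gmmse_gain(xi, gamma, 1.0)` corresponds to the EM amplitude estimator named above, and for large `xi` the gain approaches `wiener_gain(xi)`, which is consistent with the abstract's remark about high SNRs. Sweeping `alpha` at fixed `xi` and `gamma` is one simple way to explore how the amount of suppression changes across the family.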
