Bayesian short-time spectral amplitude estimators for single-channel speech enhancement

Single-channel speech enhancement algorithms are used to remove background noise from speech. They are found in many common devices, such as cell phones and hearing aids. In the Bayesian short-time spectral amplitude (STSA) approach to speech enhancement, an estimate of the clean speech STSA is derived by minimizing the statistical expectation of a chosen cost function. Examples of such estimators are the minimum mean square error (MMSE) STSA estimator, the β-order MMSE STSA (β-SA) estimator, which includes a power law parameter, and the weighted Euclidean (WE) estimator, which includes a weighting parameter. This thesis analyzes single-channel Bayesian STSA estimators for speech enhancement with two aims: first, to gain a better understanding of their properties and, second, to propose new cost functions and statistical models that improve their performance. In addition to a novel analysis of the β-SA estimator for parameter β ≤ 0, three new families of estimators are developed in this thesis: the Weighted β-SA (Wβ-SA), the Generalized Weighted family of STSA estimators (GWSA) and a family of multi-dimensional Bayesian STSA estimators. The Wβ-SA combines the power law of the β-SA and the weighting factor of the WE. Its parameters are chosen based on the characteristics of the human auditory system, a choice that is found to improve noise reduction at high frequencies while limiting speech distortion at low frequencies. The GWSA family of estimators provides an analytical generalization of a cost function structure found in many existing Bayesian STSA estimators. This generalization unifies those estimators and provides a better understanding of this general class. Finally, we propose a multi-dimensional family of estimators that accounts for the correlation between frequency components in a digitized speech signal. The spectral components of the clean speech are traditionally assumed to be uncorrelated in Bayesian STSA estimators; however, this assumption is inexact, since some correlation is present in practice. Objective and subjective experiments are performed in different noise environments and at several signal-to-noise ratios (SNR). The results show that the proposed estimators outperform the benchmark estimators.
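
As a brief illustration of the estimators named above, the cost functions and the resulting estimators can be sketched as follows. The notation is an assumption made here, not taken from the abstract: X is the clean speech spectral amplitude, \hat{X} its estimate, Y the noisy spectral observation, and p the WE weighting exponent. The last line is only one plausible way to combine a power law with a weighting factor; the exact parameterization of the Wβ-SA cost function is not stated in the abstract.

```latex
% Generic Bayesian STSA estimator: minimize the expected cost given the observation
\hat{X} = \arg\min_{\hat{X}} \; E\!\left[ C(X, \hat{X}) \mid Y \right]

% MMSE STSA: squared error on the amplitude
C(X, \hat{X}) = (X - \hat{X})^{2}
\quad\Rightarrow\quad
\hat{X} = E[X \mid Y]

% beta-SA: squared error on the amplitude raised to the power beta
C(X, \hat{X}) = (X^{\beta} - \hat{X}^{\beta})^{2}
\quad\Rightarrow\quad
\hat{X} = \left( E[X^{\beta} \mid Y] \right)^{1/\beta}

% WE: squared error weighted by a power of the clean amplitude
C(X, \hat{X}) = X^{p} \, (X - \hat{X})^{2}
\quad\Rightarrow\quad
\hat{X} = \frac{E[X^{p+1} \mid Y]}{E[X^{p} \mid Y]}

% Assumed combined form (power law plus weighting), for illustration only
C(X, \hat{X}) = X^{p} \, (X^{\beta} - \hat{X}^{\beta})^{2}
\quad\Rightarrow\quad
\hat{X} = \left( \frac{E[X^{\beta + p} \mid Y]}{E[X^{p} \mid Y]} \right)^{1/\beta}
```

Each closed form follows from setting the derivative of the expected cost with respect to \hat{X} to zero. Evaluating the conditional moments further requires a statistical model for the speech and noise spectral components, which the abstract does not specify.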
