Auditory-Based Spectral Amplitude Estimators for Speech Enhancement

We propose a new family of Bayesian estimators for speech enhancement in which the cost function includes both a power law and a weighting factor. The parameters of the cost function, and therefore of the corresponding estimator gain, are chosen based on characteristics of the human auditory system, namely the compressive nonlinearities of the cochlea, the perceived loudness, and the ear's masking properties. Choosing the parameters in this way results in a decrease of the estimator gain at high frequencies. This frequency dependence of the gain improves the noise reduction while limiting the speech distortion. Experimental results show that the new estimators achieve better enhancement performance than existing Bayesian estimators, such as those based on the minimum mean-square error (MMSE) of the short-time spectral amplitude (STSA), the MMSE of the logarithm of the STSA (LSA), or the weighted Euclidean (WE) error, in terms of both objective and subjective measures.
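
As an illustrative sketch only, one generic way to combine a power law with a weighting factor in a Bayesian cost function is shown below. The abstract does not state the cost function, so the symbols here are assumptions: $\chi$ is the clean-speech spectral amplitude, $\hat{\chi}$ its estimate, $Y$ the noisy observation, $\beta \neq 0$ the power-law exponent, and $w(\chi) \ge 0$ the weighting factor. The exact form used by the proposed estimators may differ.

```latex
% Assumed generic cost: a weighted squared error between power-law-compressed amplitudes
C(\chi, \hat{\chi}) = w(\chi)\,\bigl(\chi^{\beta} - \hat{\chi}^{\beta}\bigr)^{2}

% The Bayesian estimate minimizes the conditional expected cost
\hat{\chi} = \arg\min_{\hat{\chi}} \; E\bigl[\, C(\chi, \hat{\chi}) \mid Y \,\bigr]

% Setting the derivative with respect to \hat{\chi}^{\beta} to zero gives
E\bigl[\, w(\chi)\,\bigl(\chi^{\beta} - \hat{\chi}^{\beta}\bigr) \mid Y \,\bigr] = 0
\quad\Longrightarrow\quad
\hat{\chi} = \left( \frac{E\bigl[\, w(\chi)\,\chi^{\beta} \mid Y \,\bigr]}
                         {E\bigl[\, w(\chi) \mid Y \,\bigr]} \right)^{1/\beta}
```

Under these assumptions, $w(\chi) = 1$ with $\beta = 1$ reduces to the conditional mean $E[\chi \mid Y]$, that is, the MMSE STSA estimate, and a power-type weight with $\beta = 1$ gives a weighted Euclidean type of error. The estimator gain follows by dividing $\hat{\chi}$ by the noisy spectral amplitude, so parameters that vary with frequency lead to a frequency-dependent gain.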
