Speech enhancement based on auditory masking properties and log-spectral distance

This paper studies auditory masking properties and redefines the log-spectral distance. Combining these with the minimum mean-square error log-spectral amplitude estimator, it proposes a speech enhancement algorithm based on auditory masking properties and the log-spectral distance. The algorithm uses the log-spectral distance to detect voice activity and updates the noise estimate in real time according to whether speech is present or absent. In speech segments, the residual noise is suppressed using auditory masking properties. In non-speech segments, the ratio of the log-spectral distance to the distance threshold is used to suppress the residual noise. The experimental results show that the algorithm effectively suppresses residual noise while preserving weak speech components.
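The per-frame decision logic described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the redefined log-spectral distance, so the conventional RMS log-spectral distance in dB is used in its place, and the function names, the threshold `threshold_db`, and the smoothing factor `alpha` are all hypothetical. The masking-based suppression applied in speech segments is left out and marked by a placeholder gain of 1.0.

```python
import numpy as np


def log_spectral_distance(frame_power, noise_power, eps=1e-12):
    """Conventional RMS log-spectral distance (dB) between two power spectra.

    Assumption: stands in for the paper's redefined distance, which the
    abstract does not specify.
    """
    diff_db = 10.0 * np.log10((frame_power + eps) / (noise_power + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))


def process_frame(frame_power, noise_power, threshold_db=3.0, alpha=0.9):
    """Classify one frame and return (is_speech, noise_power, residual_gain).

    threshold_db and alpha are illustrative values, not taken from the paper.
    """
    lsd = log_spectral_distance(frame_power, noise_power)
    is_speech = lsd > threshold_db

    if is_speech:
        # Speech present: keep the noise estimate unchanged. A masking-based
        # residual-noise rule would be applied here; 1.0 is a placeholder.
        return True, noise_power, 1.0

    # Speech absent: recursively smooth the noise estimate toward this frame.
    updated_noise = alpha * noise_power + (1.0 - alpha) * frame_power
    # Ratio of distance to threshold, which lies in [0, 1] for non-speech frames.
    residual_gain = lsd / threshold_db
    return False, updated_noise, residual_gain
```

In this sketch a frame whose spectrum is close to the noise estimate yields a small ratio and is attenuated strongly, while a frame near the threshold is attenuated only slightly, which is one plausible way such a ratio could protect weak speech near the decision boundary.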
