Psychoacoustically-motivated adaptive β-order generalized spectral subtraction based on data-driven optimization

To mitigate the performance limitations caused by the constant spectral order β in traditional spectral subtraction methods, we previously presented an adaptive β-order generalized spectral subtraction (GSS) in which the spectral order β is updated heuristically [10]. In this paper, we propose a psychoacoustically-motivated adaptive β-order GSS that accounts for the fact that different frequency bands contribute different amounts to speech intelligibility (i.e., the band-importance function). Specifically, in the proposed adaptive β-order GSS, the dependence of the spectral order β on the input local signal-to-noise ratio (SNR) is quantitatively approximated by a sigmoid function, which is derived through a data-driven optimization procedure that minimizes the intelligibility-weighted distance between the desired speech spectrum and its estimate. The parameters of the sigmoid function are then further optimized with the same data-driven procedure. Experimental results indicate that the proposed psychoacoustically-motivated adaptive β-order GSS yields substantial improvements over traditional spectral subtraction methods in terms of intelligibility-weighted measures.
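As an illustration of the idea summarized above, the following Python sketch maps a local SNR to a spectral order β through a sigmoid and then applies a β-order subtraction rule to one frame of magnitude spectra. It is a minimal sketch under stated assumptions, not the optimized method of the paper: the function names, the sigmoid parameters (`beta_min`, `beta_max`, `slope`, `center_db`), the over-subtraction factor `alpha`, the spectral floor, the a posteriori SNR estimate, and the squared-error form of the weighted distance are all hypothetical choices. The abstract does not state the fitted parameter values or whether β rises or falls with SNR; here the sign of `slope` sets the direction.

```python
import numpy as np


def sigmoid_beta(snr_db, beta_min=0.5, beta_max=2.0, slope=0.2, center_db=5.0):
    """Map local SNR (dB) to a spectral order beta with a sigmoid.

    All parameter values are illustrative assumptions, not fitted values.
    A negative slope would make beta decrease with SNR instead.
    """
    return beta_min + (beta_max - beta_min) / (
        1.0 + np.exp(-slope * (snr_db - center_db))
    )


def local_snr_db(noisy_mag, noise_mag, eps=1e-12):
    """Frame-level a posteriori SNR in dB (one assumed definition of local SNR)."""
    num = np.sum(noisy_mag ** 2) + eps
    den = np.sum(noise_mag ** 2) + eps
    return 10.0 * np.log10(num / den)


def gss_frame(noisy_mag, noise_mag, beta, alpha=1.0, floor=0.01):
    """Beta-order generalized spectral subtraction for one frame.

    Computes |S|^beta = |Y|^beta - alpha * |N|^beta, floors the result at
    floor * |Y|^beta, and returns the magnitude estimate |S|.
    """
    noisy_pow = noisy_mag ** beta
    diff = noisy_pow - alpha * noise_mag ** beta
    return np.maximum(diff, floor * noisy_pow) ** (1.0 / beta)


def weighted_distance(clean_mag, est_mag, band_weights):
    """Band-importance-weighted squared error (assumed form of the distance)."""
    return float(np.sum(band_weights * (clean_mag - est_mag) ** 2))


# Usage on synthetic magnitudes (toy values, not real speech data).
noisy = np.array([2.0, 1.0, 3.0])
noise = np.array([1.0, 1.0, 1.0])
beta = sigmoid_beta(local_snr_db(noisy, noise))
enhanced = gss_frame(noisy, noise, beta)
```

In a data-driven setting, one could search over the sigmoid parameters to minimize `weighted_distance` on training pairs of clean and noisy spectra; the sketch leaves that search out.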

[1]  Steven F. Boll.  A spectral subtraction algorithm for suppression of acoustic noise in speech , 1979, ICASSP.

[2]  Joseph Sylvester Chang,et al.  A parametric formulation of the generalized spectral subtraction method , 1998, IEEE Trans. Speech Audio Process.

[3]  K. S. Rhebergen,et al.  A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners. , 2005, The Journal of the Acoustical Society of America.

[4]  Ephraim.  Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984.

[5]  Junfeng Li,et al.  Noise reduction based on adaptive β-order generalized spectral subtraction for speech enhancement , 2007, INTERSPEECH.

[6]  E. Zwicker,et al.  Analytical expressions for critical-band rate and critical bandwidth as a function of frequency , 1980.

[7]  C V Pavlovic,et al.  Derivation of primary parameters and procedures for use in speech intelligibility predictions. , 1987, The Journal of the Acoustical Society of America.

[8]  P. Peterson,et al.  Intelligibility-weighted measures of speech-to-interference ratio and speech system performance. , 1993, The Journal of the Acoustical Society of America.

[9]  Philipos C. Loizou,et al.  A multi-band spectral subtraction method for enhancing speech corrupted by colored noise , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Volker Schless,et al.  SNR-dependent flooring and noise overestimation for joint application of spectral subtraction and model combination , 1998, ICSLP.

[11]  Richard M. Schwartz,et al.  Enhancement of speech corrupted by acoustic noise , 1979, ICASSP.

[12]  P. Loizou.  Introduction to cochlear implants. , 1999, IEEE engineering in medicine and biology magazine : the quarterly magazine of the Engineering in Medicine & Biology Society.

[13]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..