Critical Band Subspace-Based Speech Enhancement Using SNR and Auditory Masking Aware Technique

This paper presents a new subspace-based speech enhancement algorithm. First, we construct a perceptual filterbank from a psychoacoustic model and incorporate it into the subspace-based enhancement approach. The filterbank is built with a five-level wavelet packet decomposition. The masking properties of the human auditory system are then derived from this perceptual filterbank. Finally, the prior SNR and the masking threshold of each critical band are used to determine the attenuation factor of the optimal linear estimator. The evaluation used five types of in-car noise from the TAICAR database. The experimental results showed that our approach outperformed the conventional subspace and spectral subtraction methods.
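The five-level wavelet packet decomposition mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar filter, the frame length of 256 samples, and the function names `haar_split` and `wavelet_packet_decompose` are assumptions made for illustration only. A full five-level tree yields 32 uniform subbands; a perceptual filterbank would instead keep or merge nodes so that the resulting bands approximate the critical bands, and that step is not shown here.

```python
import numpy as np

def haar_split(x):
    """One orthonormal Haar analysis step: returns (lowpass, highpass) halves."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def wavelet_packet_decompose(x, levels=5):
    """Full wavelet packet tree: every node is split at every level.

    The input length must be divisible by 2**levels.
    Returns 2**levels subbands in natural (tree) order, which is
    not the same as increasing-frequency order.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes

# Hypothetical test frame: white noise standing in for a noisy speech frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)

subbands = wavelet_packet_decompose(frame, levels=5)

# Per-band energy, the kind of quantity a per-band SNR estimate would start from.
band_energy = np.array([np.sum(b ** 2) for b in subbands])
```

Because the Haar split is orthonormal, the subband energies sum to the energy of the input frame, which makes a per-band energy or SNR estimate straightforward to interpret. A practical system would likely use a longer wavelet filter for better frequency selectivity.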
