Temporal-domain filtering approach for multiband speech enhancement

Conventional multiband speech enhancement splits the noisy speech spectrum into several frequency bands and performs spectral-domain enhancement in each band independently. When the bands are obtained by splitting the spectrum, spectral components in one band appreciably influence the components of neighboring bands, which reduces the effectiveness of clean speech estimation. To reduce this influence, the present work first filters the noisy speech in the temporal domain into ERB-based (equivalent rectangular bandwidth) sub-bands and then performs spectral-domain enhancement in each band using a DCT-based MMSE estimator. In addition, an approach is proposed for calculating the a priori speech presence/absence probability from the a priori SNR. The performance of the speech enhancement algorithms is evaluated using objective measures such as PESQ and a composite speech quality measure.
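As an illustration of the temporal-domain splitting step only, the following is a minimal sketch and not the authors' implementation. It assumes the Glasberg-Moore ERB-rate formula for placing band edges and uses Hamming-windowed-sinc FIR band-pass filters as a stand-in for whatever filterbank the paper actually employs. All function names, the band count, the frequency range, and the filter length are hypothetical choices made for the example; the per-band DCT-based MMSE estimation is not shown.

```python
import math

def hz_to_erb_rate(f_hz):
    """ERB-rate (ERB number) of a frequency in Hz (Glasberg-Moore formula, assumed here)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

def erb_band_edges(f_low, f_high, n_bands):
    """Band edges in Hz, equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / n_bands
    return [erb_rate_to_hz(lo + i * step) for i in range(n_bands + 1)]

def bandpass_fir(f1, f2, fs, n_taps=101):
    """Hamming-windowed-sinc band-pass FIR taps (difference of two ideal low-pass filters)."""
    mid = (n_taps - 1) / 2.0
    taps = []
    for n in range(n_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * (f2 - f1) / fs
        else:
            ideal = (math.sin(2 * math.pi * f2 * t / fs)
                     - math.sin(2 * math.pi * f1 * t / fs)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(ideal * window)
    return taps

def fir_filter(x, taps):
    """Causal time-domain convolution; the output has the same length as x."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n, len(taps) - 1) + 1):
            acc += taps[k] * x[n - k]
        y.append(acc)
    return y

def split_into_subbands(x, fs, f_low=100.0, f_high=3800.0, n_bands=8):
    """Filter x in the time domain into ERB-spaced sub-band signals.

    Returns (edges, bands), where bands[i] covers edges[i]..edges[i + 1] Hz.
    The defaults are illustrative assumptions, not values from the paper.
    """
    edges = erb_band_edges(f_low, f_high, n_bands)
    bands = [fir_filter(x, bandpass_fir(edges[i], edges[i + 1], fs))
             for i in range(n_bands)]
    return edges, bands
```

Each returned sub-band signal would then be framed and enhanced separately in the spectral domain. With only 101 taps the lowest, narrowest bands are resolved coarsely, so a practical system would likely use longer filters or a dedicated auditory filterbank.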
