SNR-dependent non-uniform spectral compression for noisy speech recognition

It is known that the loudness of a tone, as perceived by a human listener, is spectrally masked by background noise. This masking effect shifts the just-audible sound pressure level of the tone. It also produces a masked loudness function whose slope is steeper than that of the unmasked one. This property of masked loudness motivates us to propose a new mel-scale-based feature extraction method for speech recognition in noisy environments, based on non-uniform spectral compression. In this method, the speech power spectrum undergoes mel-scaled band-pass filtering, as in the standard MFCC front-end. However, the output energy of each filter is compressed with a different root value. That root value is defined by a compression function of the SNR in the filter band. We apply this SNR-dependent non-uniform spectral compression (SNSC) scheme to mel-scaled filter-bank-based cepstral coefficients. In various noisy environments, it gives substantial improvements in recognition performance over the standard MFCC and over features derived with cubic-root spectral compression.
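The SNSC pipeline described above can be sketched for a single frame. The sketch starts from mel filter-bank energies and a per-band noise estimate, and it omits the FFT and the mel filtering themselves. It writes the root compression as an exponent, so that 1/3 reproduces the cubic-root baseline. It then applies a DCT-II to obtain cepstral coefficients.

The abstract does not specify two things: the form of the compression function and the way the band SNR is estimated. The following choices are therefore hypothetical and are not the authors' method:

- the linear SNR-to-exponent mapping in `compression_exponent`;
- its parameters (`alpha_clean`, `alpha_noisy`, `snr_lo`, `snr_hi`);
- the power-subtraction SNR estimate in `band_snr_db`.

```python
import math

FLOOR = 1e-10  # floor that guards against log10(0) and division by zero


def band_snr_db(energies, noise):
    """Estimated per-band SNR in dB.

    `noise` is a per-band noise-energy estimate assumed to come from elsewhere
    (hypothetically, an average over leading non-speech frames). Speech energy
    is estimated by power subtraction, an assumption of this sketch.
    """
    snrs = []
    for e, n in zip(energies, noise):
        speech = max(e - n, FLOOR)
        snrs.append(10.0 * math.log10(speech / max(n, FLOOR)))
    return snrs


def compression_exponent(snr_db, alpha_clean=1 / 3, alpha_noisy=0.6,
                         snr_lo=0.0, snr_hi=20.0):
    """HYPOTHETICAL compression function mapping band SNR to an exponent.

    Linear interpolation between alpha_noisy (at or below snr_lo) and
    alpha_clean (at or above snr_hi). The larger exponent at low SNR mimics
    the steeper slope of the masked loudness function; all parameter values
    are illustrative, not taken from the paper.
    """
    t = min(max((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0), 1.0)
    return (1.0 - t) * alpha_noisy + t * alpha_clean


def dct_ii(x, num_ceps):
    """Unnormalized DCT-II, as applied to filter-bank outputs in MFCC front-ends."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * q * (k + 0.5) / n) for k in range(n))
            for q in range(num_ceps)]


def snsc_cepstra(energies, noise, num_ceps=13):
    """Cepstra from mel filter-bank energies with SNR-dependent root compression."""
    snrs = band_snr_db(energies, noise)
    compressed = [max(e, FLOOR) ** compression_exponent(s)
                  for e, s in zip(energies, snrs)]
    return dct_ii(compressed, num_ceps)


def cubic_root_cepstra(energies, num_ceps=13):
    """Baseline: uniform cubic-root compression of every band."""
    return dct_ii([max(e, FLOOR) ** (1 / 3) for e in energies], num_ceps)


if __name__ == "__main__":
    # Toy mel filter-bank energies for one frame and a flat noise estimate.
    energies = [4.0, 9.0, 2.0, 1.5, 30.0, 250.0]
    noise = [1.0] * len(energies)
    print(snsc_cepstra(energies, noise, num_ceps=4))
    print(cubic_root_cepstra(energies, num_ceps=4))
```

Under these assumptions, `snsc_cepstra` takes the logarithm out of the standard MFCC front-end and uses a band-wise exponent in its place. When every band lies above `snr_hi`, it reduces to the cubic-root baseline. This shows how SNSC generalizes uniform root compression: only the low-SNR bands receive a different exponent.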

Shu Hung Leung | Kam-keung Chu