Spectral peaks enhancement for extracting robust speech features

It is generally accepted that external noise added to a speech signal corrupts the speech spectrum and, consequently, the features derived from it. This feature corruption degrades the performance of speech recognition systems. One way to cope with it is to reduce the effect of the noise on the speech spectrum. In this paper, we propose filtering the speech spectrum to enhance its spectral peaks in the presence of noise. We then extract robust features from the peak-enhanced spectrum. In addition, we apply the proposed filtering to another spectral representation of speech, the modified group delay function (GDF). Phoneme and word recognition results show that MFCC features extracted from the peak-enhanced spectrum are more robust to noise than MFCC features derived from the original noisy spectrum. Moreover, MFCC features extracted from the filtered GDF are more robust to noise than the other MFCC features, especially at low SNR.
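For readers unfamiliar with the group delay representation mentioned above, the following is a minimal sketch of the standard (unmodified) group delay function of a single frame, computed from the DFT of x[n] and of n·x[n]. It is an illustration only and not the paper's method: the function name `group_delay`, the parameters `n_fft` and `eps`, and the use of NumPy are assumptions, and neither the modification that yields the modified GDF nor the proposed peak-enhancing filter is shown, since the abstract does not specify them.

```python
import numpy as np

def group_delay(frame, n_fft=512, eps=1e-10):
    """Standard group delay of one frame (illustrative sketch, not the paper's modified GDF).

    tau(w) = (X_R * Y_R + X_I * Y_I) / |X(w)|^2,
    where X = DFT{x[n]} and Y = DFT{n * x[n]}.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)       # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)   # spectrum of n * x[n]
    power = X.real ** 2 + X.imag ** 2   # |X(w)|^2
    # eps (an assumed safeguard) avoids division by zero at spectral nulls
    return (X.real * Y.real + X.imag * Y.imag) / (power + eps)

# Usage: an impulse delayed by 5 samples has a constant group delay of 5 samples.
x = np.zeros(64)
x[5] = 1.0
tau = group_delay(x, n_fft=128)
```

The division by |X(w)|^2 makes the plain group delay spiky near spectral nulls, which is the usual motivation for modified variants of the function.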
