A new group delay-based feature for robust speech recognition

In this paper we present a novel feature extraction algorithm for robust speech recognition based on the group delay function (GDF). The modified group delay function (MODGDF) is the main GDF-based feature extraction method in general use for robust speech recognition. Recognition tests indicate that MODGDF does not give notably better results than MFCC in the presence of additive noise, and that in the presence of convolutional noise its performance is considerably lower than that of MFCC. The method proposed in this paper is simple and makes more efficient use of the high-resolution property of the GDF. It consists of three main parts: signal modeling, GDF computation from the extracted model, and compression. Recognition results obtained on the AURORA 2.0 task indicate that it outperforms both MODGDF and MFCC.
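For readers unfamiliar with the GDF, the following is a minimal sketch of the standard group delay computation for a single frame, using the well-known identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where X is the DFT of x[n] and Y is the DFT of n*x[n]. It illustrates the basic quantity only, not the proposed feature: the abstract does not specify the signal model or the compression step, so neither is shown. The function name `group_delay`, the FFT size, and the small floor `eps` on the denominator are assumptions made for this example.

```python
import numpy as np

def group_delay(x, n_fft=512, eps=1e-12):
    """Group delay (in samples) of one frame, without phase unwrapping.

    Hypothetical helper for illustration; n_fft and eps are assumed values.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of x[n]
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of n * x[n]
    power = X.real ** 2 + X.imag ** 2
    # Floor the denominator so spectral zeros do not cause division by zero.
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)

# Sanity check: an impulse delayed by 5 samples has a constant group delay of 5.
impulse = np.zeros(64)
impulse[5] = 1.0
tau = group_delay(impulse)
```

For real speech frames the denominator becomes very small near spectral zeros, which makes the raw group delay spiky; the `eps` floor here only avoids division by zero and does not address that problem.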
