A novel approach using modulation features for multiphone-based speech recognition

Recent advances in coherent and convex demodulation have proven useful for analyzing and modifying the low-frequency envelope structure of speech. This paper reports the application of both methods, referred to here as bandwidth-constrained demodulation, to large-scale speech recognition in the form of new feature representations. Modulation-based features yielded a measurable improvement when added to a baseline recognizer as complementary sources of information. Furthermore, both sets of demodulation features showed promise for outperforming the conventional Hilbert envelope method, which underlies most modern speech recognition features. These experimental results show the potential for further development of feature representations based on recently developed bandwidth-constrained modulation signal models.
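For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of conventional Hilbert envelope demodulation: the envelope is taken as the magnitude of the analytic signal. It does not implement the coherent or convex demodulation methods studied in the paper. The function names (`dft`, `hilbert_envelope`, `am_tone`) and the test-signal parameters are hypothetical and chosen only for illustration, and a naive DFT from the standard library is used so that the sketch runs without dependencies.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Return the magnitude of the analytic signal of a real sequence x."""
    n = len(x)
    spectrum = dft(x)
    # Analytic-signal weights: keep DC (and Nyquist when n is even),
    # double the positive frequencies, zero the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for k in range(1, n // 2):
            weights[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(z) for z in analytic]


def am_tone(n=128, carrier_bin=16, mod_bin=2, depth=0.5):
    """Hypothetical test signal: a slow cosine modulator times a cosine carrier."""
    modulator = [1.0 + depth * math.cos(2 * math.pi * mod_bin * t / n)
                 for t in range(n)]
    signal = [m * math.cos(2 * math.pi * carrier_bin * t / n)
              for t, m in enumerate(modulator)]
    return signal, modulator


if __name__ == "__main__":
    signal, modulator = am_tone()
    envelope = hilbert_envelope(signal)
    worst = max(abs(e - m) for e, m in zip(envelope, modulator))
    print(f"largest envelope error: {worst:.2e}")
```

On this idealized narrowband tone the recovered envelope matches the modulator closely. For real signals, `scipy.signal.hilbert` computes the same analytic signal far more efficiently.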
