TRAP-based features for LVCSR of meeting data

This paper describes the use of temporal pattern (TRAP) feature extraction in large vocabulary continuous speech recognition (LVCSR) of meeting data. Frequency differentiation and local operators are applied to the critical-band speech spectrum. Tests are performed with an HMM recognizer on the ICSI Meeting Corpus. We show that TRAP features combined with standard ones lead to an improvement in word error rate (WER).
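To make the described front end concrete, below is a minimal sketch of TRAP-style feature extraction with frequency differentiation, written in Python with NumPy. The specifics (10 ms frame shift, roughly one-second temporal context of 101 frames, per-pattern mean normalization, Hamming weighting) are common choices in TRAP processing and are assumptions here, not details taken from the paper; the band-wise classifiers and merger network that typically follow are omitted.

```python
import numpy as np

def trap_features(critical_band_spectrogram, context=50):
    """Sketch of TRAP-style features from a log critical-band spectrogram.

    critical_band_spectrogram: array of shape (num_frames, num_bands),
        log energies from a critical-band (e.g. Bark-scale) filter bank.
    context: frames of left/right context; 2*context + 1 frames
        (about 1 s at a 10 ms frame shift when context=50) form one
        temporal pattern per band.
    """
    num_frames, num_bands = critical_band_spectrogram.shape

    # Frequency differentiation: difference between neighbouring
    # critical bands, applied as a local operator across frequency.
    freq_diff = np.diff(critical_band_spectrogram, axis=1)

    window = np.hamming(2 * context + 1)
    features = []
    for t in range(context, num_frames - context):
        patterns = []
        for b in range(freq_diff.shape[1]):
            # Temporal pattern: ~1 s trajectory of one differentiated band.
            trajectory = freq_diff[t - context:t + context + 1, b]
            trajectory = trajectory - trajectory.mean()  # per-pattern mean removal
            patterns.append(trajectory * window)         # Hamming weighting
        features.append(np.concatenate(patterns))
    return np.array(features)
```

In a full TRAP system, each band's windowed trajectory would feed a band-specific classifier whose outputs are merged and then appended to or combined with standard short-term features (e.g. PLP) before HMM decoding.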
