The RWTH 2007 TC-STAR evaluation system for European English and Spanish

In this work, the RWTH automatic speech recognition systems developed for the third TC-STAR evaluation campaign in 2007 are presented. The RWTH systems make systematic use of internal system combination, combining systems that differ in feature extraction, adaptation methods, and training data. To take advantage of this, novel feature extraction methods were employed: this year saw the introduction of Gammatone features and MLP-based phone posterior features. Further improvements were achieved using unsupervised training, notably with only a fairly small amount of automatically transcribed data. The switch to MPE training and the introduction of projecting SAT transforms also contributed to the improvements over last year's system.
