The RWTH Aachen University Open Source Speech Recognition System

We announce the public availability of the RWTH Aachen University speech recognition toolkit. The toolkit includes state-of-the-art speech recognition technology for acoustic model training and decoding. Notable components include speaker adaptation, speaker adaptive training, unsupervised training, a finite-state automata library, and an efficient tree search decoder. Comprehensive documentation, example setups for training and recognition, and a tutorial are provided to support newcomers.
