The Titech large vocabulary WFST speech recognition system

In this paper we present evaluations of the large vocabulary speech decoder we are currently developing at Tokyo Institute of Technology. Our goal is to build a fast, scalable, and flexible decoder that operates on weighted finite state transducer (WFST) search spaces. Even though development of the decoder is still in its infancy, we have already implemented an impressive feature set and are achieving good accuracy and speed on a large vocabulary spontaneous speech task. We have also developed a technique that allows parts of the decoder to run on the graphics processor, which can lead to a very significant speed-up.
