SPIDER: A continuous speech light decoder

In this paper, we propose a speech decoder, called SPeech lIght decoDER (SPIDER), that extracts the best decoding hypothesis from a search space constructed using weighted finite-state transducers. Although many speech decoders exist, they are quite complicated because they address many design goals, such as extracting N-best decoding hypotheses and generating lattices. This complexity makes it difficult to learn these decoders and to test new ideas in speech recognition, which often require decoder modification. We therefore propose a simple decoder that supports only the primitive functions required to achieve real-time speech recognition with state-of-the-art recognition performance. The decoder can be viewed as a seed for further improvements and the addition of new functionality. Experimental results show that the performance of the proposed decoder is quite promising when compared with two other speech decoders, namely HDecode and Sphinx3.
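To make "extracting the best decoding hypothesis" from a transducer-based search space concrete, the following is a minimal sketch of time-synchronous Viterbi beam search over a toy transducer. It is not SPIDER's implementation: the function `decode_best_path`, the arc layout, the toy graph, and the per-frame cost tables are all hypothetical, and the sketch assumes that every arc consumes exactly one frame (no epsilon input arcs) and that all weights are costs to be minimized.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical arc layout: (input_label, output_word_or_empty, graph_cost, next_state)
Arc = Tuple[str, str, float, int]


def decode_best_path(
    arcs: Dict[int, List[Arc]],
    start: int,
    finals: Dict[int, float],
    frame_costs: List[Dict[str, float]],
    beam: float = 10.0,
) -> Optional[Tuple[float, List[str]]]:
    """Return (total_cost, words) of the cheapest path, or None if none survives."""
    # One token per active state: state -> (accumulated cost, words emitted so far).
    tokens: Dict[int, Tuple[float, Tuple[str, ...]]] = {start: (0.0, ())}
    for costs in frame_costs:
        new_tokens: Dict[int, Tuple[float, Tuple[str, ...]]] = {}
        for state, (cost, words) in tokens.items():
            for ilabel, olabel, graph_cost, nxt in arcs.get(state, []):
                if ilabel not in costs:
                    continue  # no acoustic score for this label in this frame
                c = cost + graph_cost + costs[ilabel]
                # Viterbi recombination: keep only the cheapest token per state.
                if nxt not in new_tokens or c < new_tokens[nxt][0]:
                    emitted = words + ((olabel,) if olabel else ())
                    new_tokens[nxt] = (c, emitted)
        if not new_tokens:
            return None
        # Beam pruning: drop tokens that fall too far behind the current best.
        best = min(c for c, _ in new_tokens.values())
        tokens = {s: t for s, t in new_tokens.items() if t[0] <= best + beam}
    # Only tokens that end in a final state form complete hypotheses.
    complete = [(c + finals[s], w) for s, (c, w) in tokens.items() if s in finals]
    if not complete:
        return None
    total, words = min(complete)
    return total, list(words)


# Toy graph with two one-word paths, "yes" (labels y, s) and "no" (labels n, o).
toy_arcs: Dict[int, List[Arc]] = {
    0: [("y", "yes", 0.5, 1), ("n", "no", 0.7, 2)],
    1: [("y", "", 0.0, 1), ("s", "", 0.0, 3)],
    2: [("n", "", 0.0, 2), ("o", "", 0.0, 3)],
    3: [("s", "", 0.0, 3), ("o", "", 0.0, 3)],
}
toy_finals = {3: 0.0}
# Made-up acoustic costs (e.g. negative log-likelihoods) for three frames.
toy_frames = [
    {"y": 1.0, "n": 2.0, "s": 5.0, "o": 5.0},
    {"y": 1.5, "n": 2.0, "s": 1.0, "o": 3.0},
    {"y": 4.0, "n": 4.0, "s": 0.5, "o": 1.0},
]

result = decode_best_path(toy_arcs, 0, toy_finals, toy_frames)
print(result)
```

Keeping a single token per state and returning only the cheapest complete path is what distinguishes this kind of best-hypothesis search from N-best or lattice generation, which would need to retain several competing histories per state.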
