A novel path extension framework using steady segment detection for Mandarin speech recognition

Frame-based decoders make little use of long-span temporal knowledge, while segment-based decoders often suffer from complex computation. This paper proposes a novel decoding framework that integrates steady speech segment information into the path extension procedure. First, a dynamic lexicon-tree copy recognizer is developed as the baseline decoding system, with the aim of speeding up decoding relative to the popular frame-based recognizer HTK. Steady segments, in which the spectrum is stable, are extracted using landmark detection, and the detection results are passed to the subsequent decoding module. At the decoding stage, the traditional inter-HMM token propagation framework is modified using steady segment knowledge, based on the observation that a steady frame and an inter-HMM extension cannot coexist. Experiments on Mandarin broadcast speech show relative reductions of 22.1% in character error rate and 5.24% in run time.
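The constraint described in the abstract (no inter-HMM extension at a steady frame) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: steady frames are flagged with a simple Euclidean frame-difference threshold, a hypothetical stand-in for the landmark detector; the search space is a linear chain of left-to-right HMMs rather than a dynamic lexicon-tree copy recognizer; and the uniform transition log-probabilities `log_self` and `log_next`, the `threshold` parameter, and the function names `detect_steady_frames` and `decode_with_steady_constraint` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def detect_steady_frames(features: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """Mark frame t as steady if the spectral change to each neighbour is below threshold.

    Hypothetical stand-in for landmark detection: a plain Euclidean
    frame-difference measure on the feature vectors.
    """
    n = len(features)
    # change[i] is the distance between frames i and i + 1
    change = [math.dist(features[i], features[i + 1]) for i in range(n - 1)]
    steady = []
    for t in range(n):
        left = change[t - 1] if t > 0 else 0.0
        right = change[t] if t < n - 1 else 0.0
        steady.append(left < threshold and right < threshold)
    return steady


def decode_with_steady_constraint(
    emit: Sequence[Sequence[float]],
    n_hmms: int,
    states_per_hmm: int,
    log_self: float,
    log_next: float,
    steady: Sequence[bool],
) -> Tuple[float, List[int]]:
    """Viterbi token passing through a chain of left-to-right HMMs.

    emit[t][s] is the log emission score of global state s at frame t.
    Entering the first state of a new HMM (an inter-HMM extension) is
    forbidden at any frame flagged as steady.
    """
    n_states = n_hmms * states_per_hmm
    n_frames = len(emit)
    score = [NEG_INF] * n_states
    score[0] = emit[0][0]  # every path starts in the first state of the first HMM
    back = [[-1] * n_states for _ in range(n_frames)]
    for t in range(1, n_frames):
        new = [NEG_INF] * n_states
        for s in range(n_states):
            best, arg = score[s] + log_self, s  # intra-HMM self-loop
            if s > 0:
                inter_hmm = s % states_per_hmm == 0  # s is the entry state of a new HMM
                if not (inter_hmm and steady[t]):
                    cand = score[s - 1] + log_next  # advance (within or across HMMs)
                    if cand > best:
                        best, arg = cand, s - 1
            if best > NEG_INF:
                new[s] = best + emit[t][s]
                back[t][s] = arg
        score = new
    final = n_states - 1  # paths must end in the last state of the last HMM
    if score[final] == NEG_INF:
        return NEG_INF, []
    path = [final]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[final], path


if __name__ == "__main__":
    steady = detect_steady_frames([[0.0], [5.0], [5.0], [5.0], [10.0]], threshold=1.0)
    emit = [
        [0.0, -10.0, -10.0, -10.0],
        [-5.0, 0.0, -10.0, -10.0],
        [-10.0, -1.0, 0.0, -10.0],
        [-10.0, -10.0, -1.0, 0.0],
        [-10.0, -10.0, -10.0, 0.0],
    ]
    print(steady)  # [False, False, True, False, False]
    print(decode_with_steady_constraint(emit, 2, 2, 0.0, 0.0, steady))  # (-2.0, [0, 1, 1, 2, 3])
```

With all steady flags set to False, the same toy scores give the path [0, 1, 2, 3, 3] with score 0.0; flagging frame 2 as steady pushes the HMM boundary one frame later. The sketch performs no beam pruning, so it does not model the run-time effect reported in the abstract.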
