Multi-pass ASR using vocabulary expansion

Current automatic speech recognition (ASR) systems must limit their vocabulary size according to available memory, expected processing time, and the amount of text data available for building a vocabulary and a language model. Although ASR vocabularies are designed to achieve high coverage of the expected input data, the input inevitably includes out-of-vocabulary (OOV) words. This is known as the OOV problem. We propose dynamic vocabulary expansion using a conceptual base, combined with multi-pass speech recognition using the expanded vocabulary. Words relevant to the content of the input speech are extracted based on a speech recognition result obtained with a reference vocabulary. An expanded vocabulary that leaves fewer words out of vocabulary is then built by adding the extracted words to the reference vocabulary, and a second recognition pass is performed using this new vocabulary. Experimental results on broadcast news speech show that our method achieves a 30% reduction in OOV rate and improves speech recognition accuracy.
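The expansion step and the OOV-rate measure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the conceptual base is modeled here as a hypothetical dictionary mapping a word to related words, the names `oov_rate`, `expand_vocabulary`, and `max_new` are assumptions introduced for illustration, the toy data is invented, and the second recognition pass itself (which would require an ASR decoder) is not shown.

```python
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are not covered by the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def expand_vocabulary(first_pass_words, reference_vocab, concept_base, max_new=1000):
    """Add words related to the first-pass hypothesis to the reference vocabulary.

    concept_base is a hypothetical stand-in for the conceptual base:
    a dict mapping a word to an iterable of related words.
    """
    scores = Counter()
    for word in first_pass_words:
        for related in concept_base.get(word, ()):
            # Only words missing from the reference vocabulary are candidates.
            if related not in reference_vocab:
                scores[related] += 1
    # Keep the candidates supported by the most first-pass words.
    new_words = [word for word, _ in scores.most_common(max_new)]
    return set(reference_vocab) | set(new_words)


# Toy data (invented for illustration only).
reference_vocab = {"the", "prime", "minister", "visited", "today", "city"}
concept_base = {
    "minister": ["cabinet", "parliament"],
    "prime": ["cabinet"],
    "visited": ["summit"],
}
# First-pass hypothesis: the OOV word "summit" was misrecognized as "city".
first_pass = ["the", "prime", "minister", "visited", "the", "city", "today"]
# What was actually spoken.
spoken = ["the", "prime", "minister", "visited", "the", "summit", "today"]

expanded = expand_vocabulary(first_pass, reference_vocab, concept_base)
rate_before = oov_rate(spoken, reference_vocab)
rate_after = oov_rate(spoken, expanded)
print(f"OOV rate before: {rate_before:.3f}, after: {rate_after:.3f}")
```

In this toy case the expanded vocabulary covers the word that the reference vocabulary missed, so a second pass using it would at least have the chance to recognize that word. In a real system the language model would also need entries for the added words.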
