Word/sub-word lattices decomposition and combination for speech recognition

This paper presents the benefit of using multiple lexical units in the post-processing stage of an automatic speech recognition (ASR) system. Sub-word units can reduce a high out-of-vocabulary rate and mitigate the lack of text resources for statistical language modeling. We therefore propose several methods to decompose, normalize and combine word and sub-word lattices generated by different ASR systems. Using a sub-word information table, every word in a lattice can be decomposed into sub-word units. The decomposed lattices can then be combined into a common lattice, from which a confusion network is generated. On a Vietnamese ASR task, this lattice combination scheme yields an absolute syllable error rate reduction of about 1.4% over the sentence MAP baseline. The proposed method also outperforms N-best list combination and voting.
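The decomposition and combination steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The assumptions are as follows:

- Lattices are plain DAGs of arcs, and each arc carries a single log score.
- The sub-word information table is a hypothetical dict mapping a word to its syllables. Here the example is Vietnamese, with multi-syllable words joined by an underscore.
- The whole word score is placed on the first sub-word arc.
- Lattices are combined by a simple union under shared `START`/`END` nodes, with log system weights on the entry arcs.

All names (`Arc`, `Lattice`, `decompose`, `combine`, `best_path`) are hypothetical. Confusion-network construction is not shown; `best_path` only checks that the combined lattice can be decoded.

```python
import math
from dataclasses import dataclass
from functools import lru_cache
from itertools import count

EPS = "<eps>"  # label for arcs that emit nothing


@dataclass(frozen=True)
class Arc:
    src: str
    dst: str
    label: str      # word, sub-word unit, or EPS
    logprob: float  # single combined log score per arc (assumption)


@dataclass
class Lattice:
    start: str
    end: str
    arcs: list


def decompose(lattice, subword_table, prefix):
    """Replace every word arc with a chain of sub-word arcs.

    Words missing from the (hypothetical) table are kept as one unit.
    The whole word score goes on the first sub-word arc, so every
    full path keeps its original total score (assumption).
    `prefix` names new intermediate nodes and must not clash with
    existing node names.
    """
    ids = count()
    new_arcs = []
    for arc in lattice.arcs:
        units = subword_table.get(arc.label, [arc.label])
        inner = [f"{prefix}{next(ids)}" for _ in units[:-1]]
        nodes = [arc.src] + inner + [arc.dst]
        for i, unit in enumerate(units):
            score = arc.logprob if i == 0 else 0.0
            new_arcs.append(Arc(nodes[i], nodes[i + 1], unit, score))
    return Lattice(lattice.start, lattice.end, new_arcs)


def combine(lattices, weights):
    """Union of lattices under shared START/END nodes.

    System k is entered through an epsilon arc scored log(weights[k]);
    node names are prefixed with the system index to keep them distinct.
    """
    arcs = []
    for k, (lat, w) in enumerate(zip(lattices, weights)):
        p = f"s{k}:"
        arcs.append(Arc("START", p + lat.start, EPS, math.log(w)))
        arcs.extend(Arc(p + a.src, p + a.dst, a.label, a.logprob) for a in lat.arcs)
        arcs.append(Arc(p + lat.end, "END", EPS, 0.0))
    return Lattice("START", "END", arcs)


def best_path(lattice):
    """Highest-scoring label sequence (epsilons dropped), DP over the DAG."""
    out = {}
    for a in lattice.arcs:
        out.setdefault(a.src, []).append(a)

    @lru_cache(maxsize=None)
    def best(node):
        if node == lattice.end:
            return 0.0, ()
        candidates = []
        for a in out.get(node, []):
            score, labels = best(a.dst)
            label = () if a.label == EPS else (a.label,)
            candidates.append((a.logprob + score, label + labels))
        return max(candidates) if candidates else (-math.inf, ())

    return best(lattice.start)


# Toy example: a word lattice and a syllable lattice for the same utterance.
table = {"sinh_viên": ["sinh", "viên"]}  # hypothetical sub-word table
word_lat = Lattice("0", "2", [
    Arc("0", "1", "sinh_viên", -1.0),
    Arc("0", "1", "sinh", -2.5),
    Arc("1", "2", "học", -0.5),
])
syl_lat = Lattice("0", "3", [
    Arc("0", "1", "sinh", -0.4),
    Arc("1", "2", "viên", -0.6),
    Arc("1", "2", "viện", -0.9),
    Arc("2", "3", "học", -0.3),
])

dec = decompose(word_lat, table, prefix="w")
combined = combine([dec, syl_lat], [0.5, 0.5])
score, syllables = best_path(combined)
print(syllables)  # ('sinh', 'viên', 'học')
```

Putting the word score on the first sub-word arc keeps the ranking of full paths unchanged after decomposition. A real system would more likely distribute scores and time boundaries across sub-word units, and would align the combined lattice into a confusion network rather than taking a single best path.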

Laurent Besacier | Brigitte Bigi | Sopheap Seng | Viet Bac Le
