Word/sub-word lattices decomposition and combination for speech recognition

This paper presents the benefit of using multiple lexical units in the post-processing stage of an ASR system. Since the use of sub-word units can reduce the high out-of-vocabulary rate and improve the lack of text resources in statistical language modeling, we propose several methods to decompose, normalize and combine word and sub-word lattices generated from different ASR systems. By using a sub-word information table, every word in a lattice can be decomposed into sub-word units. These decomposed lattices can be combined into a common lattice in order to generate a confusion network. This lattices combination scheme results in an absolute syllable error rate reduction of about 1.4% over the sentence MAP baseline method for a Vietnamese ASR task. By comparing with the N-best lists combination and voting method, the proposed method works better.

[1]  A. Waibel,et al.  A one-pass decoder based on polymorphic linguistic context assignment , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[2]  Ebru Arisoy,et al.  Unsupervised segmentation of words into morphemes - morpho challenge 2005 application to automatic speech recognition , 2006, INTERSPEECH.

[3]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[4]  Jean-François Bonastre,et al.  Automatic transcription of Somali language , 2006, INTERSPEECH.

[5]  Mitch Weintraub,et al.  Explicit word error minimization in n-best list rescoring , 1997, EUROSPEECH.

[6]  Hervé Blanchon,et al.  The LIG Arabic/English speech translation system at IWSLT08 , 2007, IWSLT.

[7]  Jean-François Serignat,et al.  Spoken and Written Language Resources for Vietnamese , 2004, LREC.

[8]  Mehryar Mohri,et al.  Finite-State Transducers in Language and Speech Processing , 1997, CL.

[9]  Andreas Stolcke,et al.  Finding consensus in speech recognition: word error minimization and other applications of confusion networks , 2000, Comput. Speech Lang..

[10]  Ruhi Sarikaya,et al.  On the use of morphological analysis for dialectal Arabic speech recognition , 2006, INTERSPEECH.

[11]  Hakan Kardes,et al.  UNSUPERVISED SEGMENTATION OF WORDS INTO MORPHEMES , 2009 .

[12]  Richard M. Stern,et al.  LATTICE COMBINATION FOR IMPROVED SPEECH RECOGNITON , 2001 .

[13]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.