Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation

We propose a novel technique for adapting text-based statistical machine translation (MT) to handle input from automatic speech recognition (ASR) in spoken language translation tasks. We simulate likely misrecognition errors using only a source-language pronunciation dictionary and language model (i.e., without an acoustic model), and use these simulated errors to augment the phrase table of a standard MT system. The augmented system can thus recover from recognition errors during decoding by using the synthesized phrases. Using the outputs of five different English ASR systems as input, we find consistent and significant improvements in translation quality. The proposed technique can also be combined with ASR lattice output, yielding further improvements.
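The core idea can be illustrated with a minimal sketch. It generates plausible misrecognitions from pronunciations alone and exposes them to the decoder as extra phrase-table entries. This is an assumption-laden illustration, not the authors' implementation. The toy lexicon `LEXICON` and unigram table `UNIGRAM` are hypothetical stand-ins for a real pronunciation dictionary and n-gram language model. Acoustic confusability is approximated here by phone-level Levenshtein distance. The names `PhraseEntry`, `simulate_confusions`, `augment_phrase_table` and the feature names are invented for this example. Only single-word confusions are modeled.

```python
import math
from dataclasses import dataclass, field

# Hypothetical toy pronunciation lexicon (ARPAbet-style phones, no stress marks).
LEXICON = {
    "ship": ("SH", "IH", "P"),
    "sheep": ("SH", "IY", "P"),
    "chip": ("CH", "IH", "P"),
    "cheap": ("CH", "IY", "P"),
    "shop": ("SH", "AA", "P"),
}

# Hypothetical unigram probabilities standing in for a real n-gram LM.
UNIGRAM = {"ship": 0.3, "sheep": 0.1, "chip": 0.2, "cheap": 0.3, "shop": 0.1}


@dataclass
class PhraseEntry:
    source: str
    target: str
    features: dict = field(default_factory=dict)


def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete pa
                           cur[j - 1] + 1,              # insert pb
                           prev[j - 1] + (pa != pb)))   # substitute or match
        prev = cur
    return prev[-1]


def simulate_confusions(word, lexicon, unigram, max_dist=1):
    """Map each word within max_dist phone edits of `word` to a normalized
    confusion probability proportional to unigram(c) * exp(-distance)."""
    target = lexicon[word]
    raw = {}
    for cand, phones in lexicon.items():
        if cand == word:
            continue
        d = phone_edit_distance(target, phones)
        prob = unigram.get(cand, 0.0)
        if d <= max_dist and prob > 0:  # skip zero-probability words (log(0))
            raw[cand] = prob * math.exp(-d)
    total = sum(raw.values())
    return {c: s / total for c, s in raw.items()} if total > 0 else {}


def augment_phrase_table(table, lexicon, unigram, max_dist=1):
    """Return a new table: copies of the original entries plus synthetic entries
    that map a likely misrecognition of a source word to that word's translations."""
    augmented = [PhraseEntry(e.source, e.target,
                             {**e.features, "synthetic": 0.0, "confusion_logprob": 0.0})
                 for e in table]
    seen = {(e.source, e.target) for e in table}
    for entry in table:
        if entry.source not in lexicon:
            continue  # no pronunciation, so no simulated confusions
        confusions = simulate_confusions(entry.source, lexicon, unigram, max_dist)
        for misrec, p in confusions.items():
            if (misrec, entry.target) in seen:
                continue  # keep real entries and the first synthetic copy only
            seen.add((misrec, entry.target))
            feats = {**entry.features, "synthetic": 1.0,
                     "confusion_logprob": math.log(p)}
            augmented.append(PhraseEntry(misrec, entry.target, feats))
    return augmented
```

Consider the case where the recognizer outputs "ship" for a spoken "sheep". A synthetic entry such as ship → mouton then still lets the decoder produce the intended translation. The `synthetic` indicator and `confusion_logprob` features are meant to let tuning learn how much to trust such entries. A realistic version would also need multi-word confusions, where word boundaries shift.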
