Bootstrapping A Statistical Speech Translator From A Rule-Based One

We describe a series of experiments in which we start with English-to-French and English-to-Japanese versions of a rule-based speech translation system for a medical domain and bootstrap corresponding statistical systems. Comparative evaluation reveals that the statistical systems are still slightly inferior to the rule-based ones, even though considerable effort has been invested in tuning both the recognition and translation components; however, a hybrid system delivers a small but significant improvement in performance. In conclusion, we suggest that the hybrid architecture we describe potentially allows construction of limited-domain speech translation systems that combine substantial source-language coverage with high-precision translation.
