Sequential system combination for machine translation of speech

System combination is a technique which has been shown to yield significant gains in speech recognition and machine translation. Most combination schemes perform an alignment between different system outputs in order to produce lattices (or confusion networks), from which a composite hypothesis is chosen, possibly with the help of a large language model. The benefit of this approach is two-fold: (i) whenever many systems agree with each other on a set of words, the combination output contains these words with high confidence; and (ii) whenever the systems disagree, the language model resolves the ambiguity based on the (probably correct) agreed upon context. The case of machine translation system combination is more challenging because of the different word orders of the translations: the alignment has to incorporate computationally expensive movements of word blocks. In this paper, we show how one can combine translation outputs efficiently, extending the incremental alignment procedure of (A-V.I. Rosti et al., 2008). A comparison between different system combination design choices is performed on an Arabic speech translation task.

[1]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[2]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[3]  Dekai Wu,et al.  Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora , 1997, CL.

[4]  Andreas Stolcke,et al.  Finding consensus among words: lattice-based word error minimization , 1999, EUROSPEECH.

[5]  Giuseppe Riccardi,et al.  Computing consensus translation from multiple machine translation systems , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[6]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[7]  Dana Shapira,et al.  Edit distance with move operations , 2002, J. Discrete Algorithms.

[8]  H. Ney,et al.  A novel string-to-string distance measure with applications to machine translation evaluation , 2003, MTSUMMIT.

[9]  Geoffrey Zweig,et al.  Anatomy of an extremely fast LVCSR decoder , 2005, INTERSPEECH.

[10]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .

[11]  Mark J. F. Gales,et al.  The Cu-Htk Mandarin Broadcast News Transcription System , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[12]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[13]  Hermann Ney,et al.  Computing Consensus Translation for Multiple Machine Translation Systems Using Enhanced Hypothesis Alignment , 2006, EACL.

[14]  Richard M. Schwartz,et al.  Improved Word-Level System Combination for Machine Translation , 2007, ACL.

[15]  Geoffrey Zweig,et al.  The IBM 2006 Gale Arabic ASR System , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[16]  Richard M. Schwartz,et al.  Incremental Hypothesis Alignment for Building Confusion Networks with Application to Machine Translation System Combination , 2008, WMT@ACL.

[17]  Jianfeng Gao,et al.  Indirect-HMM-based Hypothesis Alignment for Combining Outputs from Machine Translation Systems , 2008, EMNLP.

[18]  Sanjeev Khudanpur,et al.  Machine Translation System Combination using ITG-based Alignments , 2008, ACL.