Paper information - DFKI System Combination with Sentence Ranking at ML4HMT-2011


We present a pilot study on a Hybrid Machine Translation system that takes advantage of multilateral system-specific metadata provided as part of the shared task. The proposed solution offers a machine learning approach, resulting in a selection mechanism able to learn and rank system outputs on the sentence level, based on their quality. For training, due to the lack of human annotations, word-level Levenshtein distance has been used as a quality indicator, whereas a rich set of sentence features was extracted and selected from the dataset. Three classification algo
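To make the quality indicator concrete, the following is a minimal sketch of word-level Levenshtein distance used to rank candidate translations of one sentence. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say what the distance is measured against, so the sketch assumes a reference translation, whitespace tokenization, and unit edit costs. The names `word_levenshtein`, `rank_outputs`, `sys_a`, and `sys_b` are hypothetical.

```python
def word_levenshtein(hypothesis: str, reference: str) -> int:
    """Edit distance counted in words (insert, delete, substitute cost 1 each).

    Assumption: tokens are obtained by whitespace splitting.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] holds the distance between the hypothesis prefix processed
    # so far and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = 0 if hyp_word == ref_word else 1
            curr.append(min(
                prev[j] + 1,                 # delete a hypothesis word
                curr[j - 1] + 1,             # insert a reference word
                prev[j - 1] + substitution,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def rank_outputs(outputs: dict[str, str], reference: str) -> list[tuple[str, int]]:
    """Order system outputs from best (smallest distance) to worst.

    Assumption: the distance to a reference translation stands in for quality.
    """
    scored = [(name, word_levenshtein(text, reference))
              for name, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    candidates = {
        "sys_a": "the cat is on mat",
        "sys_b": "the cat is on the mat",
    }
    print(rank_outputs(candidates, "the cat is on the mat"))
```

In a setup like the one the abstract describes, such a ranking would only supply training labels. At test time, where no reference is available, a learned classifier would have to predict the ranking from sentence features.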

Eleftherios Avramidis
