Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics

In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on the longest common subsequence between a candidate translation and a set of reference translations. The longest common subsequence naturally takes sentence-level structural similarity into account and automatically identifies the longest co-occurring in-sequence n-grams. The second method relaxes strict n-gram matching to skip-bigram matching. A skip-bigram is any pair of words in their sentence order, allowing arbitrary gaps between them. Skip-bigram co-occurrence statistics measure the overlap of skip-bigrams between a candidate translation and a set of reference translations. Empirical results show that both methods correlate well with human judgments of both adequacy and fluency.
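To make the two statistics concrete, below is a minimal Python sketch of how an LCS-based score and a skip-bigram co-occurrence score could be computed against a single reference. The function names, whitespace tokenization, default gap limit of four, and F-measure combination are illustrative assumptions, not the paper's exact implementation or parameter settings.

```python
from collections import Counter
from itertools import combinations


def lcs_length(x, y):
    """Length of the longest common subsequence of token lists x and y (standard DP)."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def skip_bigrams(tokens, max_gap=None):
    """All ordered word pairs in sentence order; max_gap optionally limits the distance."""
    return [
        (tokens[i], tokens[j])
        for i, j in combinations(range(len(tokens)), 2)
        if max_gap is None or j - i <= max_gap
    ]


def f_measure(matches, cand_total, ref_total, beta=1.0):
    """Combine precision and recall of the matched units into a single score."""
    if matches == 0:
        return 0.0
    p = matches / cand_total
    r = matches / ref_total
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)


def lcs_score(candidate, reference, beta=1.0):
    """LCS-based similarity between a candidate and one reference (illustrative)."""
    cand, ref = candidate.split(), reference.split()
    return f_measure(lcs_length(cand, ref), len(cand), len(ref), beta)


def skip_bigram_score(candidate, reference, max_gap=4, beta=1.0):
    """Skip-bigram overlap between a candidate and one reference (illustrative)."""
    cand = Counter(skip_bigrams(candidate.split(), max_gap))
    ref = Counter(skip_bigrams(reference.split(), max_gap))
    matches = sum((cand & ref).values())  # clipped counts of shared skip-bigrams
    return f_measure(matches, sum(cand.values()), sum(ref.values()), beta)


if __name__ == "__main__":
    reference = "the gunman was killed by police"
    candidate = "the gunman was shot dead by police"
    print("LCS score:", lcs_score(candidate, reference))
    print("Skip-bigram score:", skip_bigram_score(candidate, reference))
```

With multiple references, one natural extension (again an assumption rather than the paper's prescription) is to score the candidate against each reference separately and take the maximum.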
