Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points

We present a diagnostic evaluation platform that provides multi-factored evaluation based on automatically constructed check-points. A check-point is a linguistically motivated unit (e.g., an ambiguous word, a noun phrase, a verb-object collocation, a prepositional phrase), which is pre-defined in a linguistic taxonomy. We present a method that automatically extracts check-points from parallel sentences. By means of check-points, our method can monitor how an MT system translates important linguistic phenomena, thereby providing diagnostic evaluation. The effectiveness of our approach for diagnostic evaluation is verified through experiments on various types of MT systems.
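As a minimal sketch of the idea, not the paper's actual implementation: suppose each extracted check-point is a (sentence id, taxonomy category, reference phrase) triple. One simple diagnostic score is per-category recall, the fraction of check-points whose reference translation appears in the system output for that sentence. The category names and toy data below are hypothetical.

```python
from collections import defaultdict

def checkpoint_recall(checkpoints, system_outputs):
    """Per-category recall: the fraction of check-points whose reference
    phrase appears verbatim in the system output for that sentence.
    (A simplification; real matching would use word alignment.)"""
    hit = defaultdict(int)
    total = defaultdict(int)
    for sent_id, category, ref_phrase in checkpoints:
        total[category] += 1
        if ref_phrase in system_outputs[sent_id]:
            hit[category] += 1
    return {c: hit[c] / total[c] for c in total}

# Toy check-points (hypothetical examples of taxonomy categories)
cps = [
    (0, "noun phrase", "the red car"),
    (0, "prepositional phrase", "in the garage"),
    (1, "verb-object collocation", "make a decision"),
]
outs = {
    0: "he parked the red car near the garage",
    1: "they made a decision quickly",
}
scores = checkpoint_recall(cps, outs)
```

A per-category breakdown like this is what distinguishes diagnostic evaluation from a single aggregate score such as BLEU: it shows, for instance, that a system handles noun phrases well but misses prepositional phrases.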