PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
[1] M. Kendall. A New Measure of Rank Correlation , 1938 .
[2] J. Fleiss. Measuring nominal scale agreement among many raters , 1971 .
[3] Hermann Ney,et al. HMM-Based Word Alignment in Statistical Translation , 1996, COLING.
[4] George R. Doddington,et al. Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .
[5] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[6] Franz Josef Och,et al. Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.
[7] Hermann Ney,et al. A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.
[8] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[9] Ralph Weischedel,et al. A Study of Translation Error Rate with Targeted Human Annotation , 2005 .
[10] Philipp Koehn,et al. Re-evaluating the Role of Bleu in Machine Translation Research , 2006, EACL.
[11] Matthew G. Snover,et al. A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.
[12] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments , 2007, WMT@ACL.
[13] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.
[14] Hwee Tou Ng,et al. Decomposability of Translation Metrics for Improved Evaluation and Efficient Algorithms , 2008, EMNLP.
[15] Philipp Koehn,et al. Further Meta-Evaluation of Machine Translation , 2008, WMT@ACL.
[16] Hwee Tou Ng,et al. MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation , 2008, ACL.
[17] Alon Lavie,et al. The Meteor metric for automatic evaluation of machine translation , 2009, Machine Translation.
[18] Daniel Jurafsky,et al. Robust Machine Translation Evaluation with Entailment Features , 2009, ACL.
[19] Nitin Madnani,et al. Fluency, Adequacy, or HTER? Exploring Different Human Judgments with a Tunable MT Metric , 2009, WMT@EACL.
[20] Yifan He,et al. The DCU Dependency-Based Metric in WMT-MetricsMATR 2010 , 2010, WMT@ACL.
[21] C. Spearman. The proof and measurement of association between two things. , 2015, International journal of epidemiology.
[22] Kevin Duh,et al. Automatic Evaluation of Translation Quality for Distant Language Pairs , 2010, EMNLP.
[23] Alon Lavie,et al. METEOR-NEXT and the METEOR Paraphrase Tables: Improved Evaluation Support for Five Target Languages , 2010, WMT@ACL.
[24] Hwee Tou Ng,et al. TESLA: Translation Evaluation of Sentences with Linear-Programming-Based Analysis , 2010, WMT@ACL.
[25] Philipp Koehn,et al. Findings of the 2010 Joint Workshop on Statistical Machine Translation and Metrics for Machine Translation , 2010, WMT@ACL.
[26] Daniel Jurafsky,et al. The Best Lexical Metric for Phrase-Based Statistical MT System Optimization , 2010, NAACL.
[27] Roland Kuhn,et al. AMBER: A Modified BLEU, Enhanced Ranking Metric , 2011, WMT@EMNLP.
[28] Alexandra Birch,et al. Reordering Metrics for MT , 2011, ACL.
[29] Dekai Wu,et al. MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles , 2011, ACL.
[30] Philipp Koehn,et al. Findings of the 2011 Workshop on Statistical Machine Translation , 2011, WMT@EMNLP.
[31] Hwee Tou Ng,et al. Better Evaluation Metrics Lead to Better Machine Translation , 2011, EMNLP.
[32] Nitin Madnani,et al. E-rating Machine Translation , 2011, WMT@EMNLP.