A Human Judgement Corpus and a Metric for Arabic MT Evaluation
Kemal Oflazer | Houda Bouamor | Behrang Mohit | Hanan Alshikhabobakr
[1] Nizar Habash, et al. Automatic Error Analysis for Morphologically Rich Languages, 2011.
[2] Hermann Ney, et al. Syntax-Oriented Evaluation Measures for Machine Translation Output, 2009, WMT@EACL.
[3] Hwee Tou Ng, et al. TESLA at WMT 2011: Translation Evaluation and Tunable Metric, 2011, WMT@EMNLP.
[4] Roland Kuhn, et al. AMBER: A Modified BLEU, Enhanced Ranking Metric, 2011, WMT@EMNLP.
[5] Hwee Tou Ng, et al. TESLA: Translation Evaluation of Sentences with Linear-Programming-Based Analysis, 2010, WMT@ACL.
[6] Matthew G. Snover, et al. A Study of Translation Edit Rate with Targeted Human Annotation, 2006, AMTA.
[7] Kemal Oflazer, et al. BLEU+: A Tool for Fine-Grained BLEU Computation, 2008, LREC.
[8] Ding Liu, et al. Syntactic Features for Evaluation of Machine Translation, 2005, IEEvaluation@ACL.
[9] Jacob Cohen, et al. Weighted Kappa: Nominal Scale Agreement Provision for Scaled Disagreement or Partial Credit, 1968.
[10] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[11] Ralph Weischedel, et al. A Study of Translation Error Rate with Targeted Human Annotation, 2005.
[12] Nizar Habash, et al. Orthographic and Morphological Processing for English–Arabic Statistical Machine Translation, 2011, Machine Translation.
[13] Alon Lavie, et al. Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems, 2011, WMT@EMNLP.
[14] Philipp Koehn, et al. Findings of the 2011 Workshop on Statistical Machine Translation, 2011, WMT@EMNLP.
[15] Khalid Choukri, et al. Cooperation for Arabic Language Resources and Tools - The MEDAR Project, 2010, LREC.
[16] Nizar Habash, et al. MADA+TOKAN: A Toolkit for Arabic Tokenization, Diacritization, Morphological Disambiguation, POS Tagging, Stemming and Lemmatization, 2009.
[17] Christian Federmann, et al. Appraise: An Open-Source Toolkit for Manual Evaluation of MT Output, 2012, Prague Bull. Math. Linguistics.
[18] Philipp Koehn, et al. Results of the WMT15 Metrics Shared Task, 2015, WMT@EMNLP.
[19] Maja Popović. Morphemes and POS tags for n-gram based evaluation metrics, 2011, WMT@EMNLP.
[20] Pavel Pecina, et al. A Simple Automatic MT Evaluation Metric, 2009, WMT@EACL.
[21] M. Kendall. A New Measure of Rank Correlation, 1938.
[22] Yves Lepage, et al. BLEU in Characters: Towards Automatic MT Evaluation in Languages without Word Delimiters, 2004, IJCNLP.
[23] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[24] Nitin Madnani, et al. TER-Plus: Paraphrase, Semantic, and Alignment Enhancements to Translation Edit Rate, 2009, Machine Translation.
[25] Philipp Koehn, et al. Re-evaluating the Role of Bleu in Machine Translation Research, 2006, EACL.
[26] Eric Brill, et al. A Unified Framework for Automatic Evaluation Using 4-Gram Co-occurrence Statistics, 2004, ACL.
[27] J. R. Landis, et al. The Measurement of Observer Agreement for Categorical Data, 1977, Biometrics.