Results of the WMT20 Metrics Shared Task
Nitika Mathur | Johnny Tian-Zheng Wei | Markus Freitag | Qingsong Ma | Ondřej Bojar
[1] Timothy Baldwin, et al. Randomized Significance Tests in Machine Translation, 2014, WMT.
[2] Matt Post. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.
[3] Markus Freitag, et al. BLEU Might Be Guilty but References Are Not Innocent, 2020, EMNLP.
[4] Matthew G. Snover, et al. A Study of Translation Edit Rate with Targeted Human Annotation, 2006, AMTA.
[5] Philipp Koehn, et al. Findings of the 2020 Conference on Machine Translation (WMT20), 2020, WMT.
[6] Mia Hubert, et al. Robust statistics for outlier detection, 2011, WIREs Data Mining Knowl. Discov.
[7] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[8] Ondřej Bojar, et al. Results of the WMT16 Metrics Shared Task, 2016, WMT.
[9] Manish Shrivastava, et al. MEE: An Automatic Metric for Evaluation Using Embeddings for Machine Translation, 2020, DSAA.
[10] Ondřej Bojar, et al. Results of the WMT14 Metrics Shared Task, 2014, WMT.
[11] Maja Popović. chrF++: words helping character n-grams, 2017, WMT.
[12] Timothy Baldwin, et al. Testing for Significance of Increased Correlation with Human Judgment, 2014, EMNLP.
[13] Matt Post, et al. ParBLEU: Augmenting Metrics with Automatic Paraphrases for the WMT’20 Metrics Shared Task, 2020, WMT.
[14] Alon Lavie, et al. Unbabel’s Participation in the WMT20 Metrics Shared Task, 2020, WMT.
[15] Hermann Ney, et al. EED: Extended Edit Distance Measure for Machine Translation, 2019, WMT.
[16] Philipp Koehn. Statistical Significance Tests for Machine Translation Evaluation, 2004, EMNLP.
[17] Chi-kiu Lo. Extended Study on Using Pretrained Language Models and YiSi-1 for Machine Translation Evaluation, 2020, WMT.
[18] Matt Post, et al. Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing, 2020, EMNLP.
[19] Alon Lavie, et al. COMET: A Neural Framework for MT Evaluation, 2020, EMNLP.
[20] Christophe Ley, et al. Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median, 2013, Journal of Experimental Social Psychology.
[21] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[22] S. Lewis, et al. Regression analysis, 2007, Practical Neurology.
[23] Teri A. Crosby, et al. How to Detect and Handle Outliers, 1993.
[24] André F. T. Martins, et al. OpenKiwi: An Open Source Framework for Quality Estimation, 2019, ACL.
[25] Andy Way, et al. Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation, 2018, WMT.
[26] Antonio Toral, et al. A Set of Recommendations for Assessing Human-Machine Parity in Language Translation, 2020, J. Artif. Intell. Res.
[27] Markus Freitag, et al. Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task, 2020, WMT.
[28] Philipp Koehn, et al. Findings of the WMT 2020 Shared Task on Parallel Corpus Filtering and Alignment, 2020, WMT.
[29] Chi-kiu Lo. YiSi - a Unified Semantic MT Quality Evaluation and Estimation Metric for Languages with Different Levels of Available Resources, 2019, WMT.
[30] Timothy Baldwin, et al. Putting Evaluation in Context: Contextual Embeddings Improve Machine Translation Evaluation, 2019, ACL.
[31] Daniel Marcu, et al. HyTER: Meaning-Equivalent Semantics for Translation Evaluation, 2012, NAACL.
[32] Zhen-Hua Ling, et al. Enhanced LSTM for Natural Language Inference, 2017, ACL.
[33] Maja Popović. chrF: character n-gram F-score for automatic MT evaluation, 2015, WMT.
[34] Hermann Ney, et al. CharacTer: Translation Edit Rate on Character Level, 2016, WMT.
[35] Ondřej Bojar, et al. Results of the WMT18 Metrics Shared Task: Both characters and embeddings achieve good performance, 2018, WMT.
[36] Ondřej Bojar, et al. Results of the WMT17 Metrics Shared Task, 2017, WMT.
[37] Timothy Baldwin, et al. Accurate Evaluation of Segment-level Machine Translation Metrics, 2015, NAACL.
[38] Junfeng Hu, et al. Incorporate Semantic Structures into Machine Translation Evaluation via UCCA, 2020, WMT.
[39] Thibault Sellam, et al. BLEURT: Learning Robust Metrics for Text Generation, 2020, ACL.
[40] Timothy Baldwin, et al. Continuous Measurement Scales in Human Evaluation of Machine Translation, 2013, LAW@ACL.
[41] Michal Novák, et al. SAO WMT19 Test Suite: Machine Translation of Audit Reports, 2019, WMT.
[42] Timothy Baldwin, et al. Improving Evaluation of Document-level Machine Translation Quality Estimation, 2017, EACL.
[43] Samuel Larkin, et al. Machine Translation Reference-less Evaluation using YiSi-2 with Bilingual Mappings of Massive Multilingual Language Model, 2020, WMT.
[44] Ondřej Bojar, et al. Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges, 2019, WMT.
[45] Ondřej Bojar, et al. Scratching the Surface of Possible Translations, 2013, TSD.
[46] Nitika Mathur, et al. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics, 2020, ACL.