Metric Score Landscape Challenge (MSLC23): Understanding Metrics’ Performance on a Wider Landscape of Translation Quality