ReproGen: Proposal for a Shared Task on Reproducibility of Human Evaluations in NLG
Ehud Reiter | Shubham Agarwal | Anastasia Shimorina | Anya Belz