Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Noah A. Smith | Ronan Le Bras | Alexander R. Fabbri | Yejin Choi | Jungo Kasai | Keisuke Sakaguchi | Lavinia Dunagan | Jacob Morrison
[1] Noah A. Smith,et al. Transparent Human Evaluation for Image Captioning , 2021, NAACL.
[2] Tal August,et al. All That’s ‘Human’ Is Not Gold: Evaluating Human Evaluation of Generated Text , 2021, ACL.
[3] Atsushi Fujita,et al. Scientific Credibility of Machine Translation Research: A Meta-Evaluation of 769 Papers , 2021, ACL.
[4] Anjana Arunkumar,et al. How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation , 2021, AAAI.
[5] Markus Freitag,et al. Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation , 2021, Transactions of the Association for Computational Linguistics.
[6] Ronan Le Bras,et al. CLIPScore: A Reference-free Evaluation Metric for Image Captioning , 2021, EMNLP.
[7] Jungo Kasai,et al. Finetuning Pretrained Transformers into RNNs , 2021, EMNLP.
[8] Roy Schwartz,et al. Random Feature Attention , 2021, ICLR.
[9] Diyi Yang,et al. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics , 2021, GEM.
[10] Jungo Kasai,et al. GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation , 2021, ArXiv.
[11] Lei Zhang,et al. VinVL: Making Visual Representations Matter in Vision-Language Models , 2021, ArXiv.
[12] Dragomir R. Radev,et al. SummEval: Re-evaluating Summarization Evaluation , 2020, Transactions of the Association for Computational Linguistics.
[13] Jungo Kasai,et al. Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation , 2020, ICLR.
[14] Alec Radford,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021 .
[15] Xiao Pan,et al. The Volctrans Machine Translation System for WMT20 , 2020, WMT@EMNLP.
[16] Xiangang Li,et al. DiDi's Machine Translation System for WMT2020 , 2020, WMT@EMNLP.
[17] Hai Zhao,et al. SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task , 2020, WMT@EMNLP.
[18] Jie Zhou,et al. WeChat Neural Machine Translation Systems for WMT20 , 2020, WMT@EMNLP.
[19] Alon Lavie,et al. COMET: A Neural Framework for MT Evaluation , 2020, EMNLP.
[20] Nitika Mathur,et al. Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics , 2020, ACL.
[21] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[22] Jeffrey P. Bigham,et al. Twitter A11y: A Browser Extension to Make Twitter Images Accessible , 2020, CHI.
[23] Markus Freitag,et al. BLEU Might Be Guilty but References Are Not Innocent , 2020, EMNLP.
[24] Thibault Sellam,et al. BLEURT: Learning Robust Metrics for Text Generation , 2020, ACL.
[25] Matt Post,et al. Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing , 2020, EMNLP.
[26] Peter J. Liu,et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization , 2019, ICML.
[27] Myle Ott,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[28] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[29] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[30] Jason J. Corso,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2019, AAAI.
[31] Myle Ott,et al. On The Evaluation of Machine Translation Systems Trained With Back-Translation , 2019, ACL.
[32] Oren Etzioni,et al. Green AI , 2019, Commun. ACM.
[33] Kilian Q. Weinberger,et al. BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.
[34] Markus Freitag,et al. Results of the WMT20 Metrics Shared Task , 2020, WMT.
[35] Maite Oronoz,et al. Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages , 2020, WMT.
[36] Srivatsan Srinivasan,et al. The DeepMind Chinese–English Document Translation System at WMT2020 , 2020, WMT.
[37] Shuming Shi,et al. Tencent Neural Machine Translation Systems for the WMT20 News Translation Task , 2020, WMT.
[38] Shiliang Sun,et al. HW-TSC's Participation in the WMT 2020 News Translation Shared Task , 2020, WMT@EMNLP.
[39] Xiaopu Li,et al. OPPO's Machine Translation Systems for WMT20 , 2020, WMT@EMNLP.
[40] Andreas Eisele,et al. eTranslation's Submissions to the WMT 2020 News Translation Task , 2020, WMT@EMNLP.
[41] Alexander Molchanov. PROMT Systems for WMT 2020 Shared News Translation Task , 2020, WMT@EMNLP.
[42] Jeremy Gwinnup,et al. The AFRL WMT20 News Translation Systems , 2020, WMT@EMNLP.
[43] Ulrich Germann. The University of Edinburgh's submission to the German-to-English and English-to-German Tracks in the WMT 2020 News Translation and Zero-shot Translation Robustness Tasks , 2020, WMT@EMNLP.
[44] Jun Suzuki,et al. Tohoku-AIP-NTT at WMT 2020 News Translation Task , 2020, WMT@EMNLP.
[45] Philipp Koehn,et al. Findings of the 2020 Conference on Machine Translation (WMT20) , 2020, WMT.
[46] Dan Jurafsky,et al. Utility is in the Eye of the User: A Critique of NLP Leaderboards , 2020, EMNLP.
[47] Ioannis Konstas,et al. Findings of the Fourth Workshop on Neural Generation and Translation , 2020, NGT@ACL.
[48] Rémi Louf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[49] Tom B. Brown,et al. Fine-Tuning Language Models from Human Preferences , 2019, ArXiv.
[50] Zhe Hu,et al. An Entity-Driven Framework for Abstractive Summarization , 2019, EMNLP.
[51] Sylvain Lamprier,et al. Answers Unite! Unsupervised Metrics for Reinforced Summarization Models , 2019, EMNLP.
[52] Richard Socher,et al. Neural Text Summarization: A Critical Evaluation , 2019, EMNLP.
[53] Ido Dagan,et al. Better Rewards Yield Better Summaries: Learning to Summarise Without References , 2019, EMNLP.
[54] Ondrej Bojar,et al. Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges , 2019, WMT.
[55] Mirella Lapata,et al. Text Summarization with Pretrained Encoders , 2019, EMNLP.
[56] Myle Ott,et al. Facebook FAIR’s WMT19 News Translation Task Submission , 2019, WMT.
[57] Antoine Bonnefoy,et al. STRASS: A Light and Effective Method for Extractive Summarization Based on Sentence Embeddings , 2019, ACL.
[58] Noah A. Smith,et al. Evaluating Gender Bias in Machine Translation , 2019, ACL.
[59] Michael Elhadad,et al. Question Answering as an Automatic Evaluation Metric for News Article Summarization , 2019, NAACL.
[60] Noah A. Smith,et al. Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts , 2019, ACL.
[61] Rico Sennrich,et al. When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion , 2019, ACL.
[62] Xiaodong Liu,et al. Unified Language Model Pre-training for Natural Language Understanding and Generation , 2019, NeurIPS.
[63] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[64] Jiacheng Xu,et al. Neural Extractive Text Summarization with Syntactic Compression , 2019, EMNLP.
[65] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[66] Jackie Chi Kit Cheung,et al. BanditSum: Extractive Summarization as a Contextual Bandit , 2018, EMNLP.
[67] Mohit Bansal,et al. Closed-Book Training to Improve Summarization Encoder Memory , 2018, EMNLP.
[68] Andy Way,et al. Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation , 2018, WMT.
[69] Mirella Lapata,et al. Neural Latent Extractive Document Summarization , 2018, EMNLP.
[70] C. Laymon. A. study , 2018, Predication and Ontology.
[71] Alexander M. Rush,et al. Bottom-Up Abstractive Summarization , 2018, EMNLP.
[72] Richard Socher,et al. Improving Abstraction in Text Summarization , 2018, EMNLP.
[73] Tiejun Zhao,et al. Neural Document Summarization by Jointly Learning to Score and Select Sentences , 2018, ACL.
[74] Yen-Chun Chen,et al. Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting , 2018, ACL.
[75] Ramakanth Pasunuru,et al. Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation , 2018, ACL.
[76] Min Sun,et al. A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss , 2018, ACL.
[77] Matt Post,et al. A Call for Clarity in Reporting BLEU Scores , 2018, WMT.
[78] Yuxiang Wu,et al. Learning to Extract Coherent Summary via Deep Reinforcement Learning , 2018, AAAI.
[79] Ramakanth Pasunuru,et al. Multi-Reward Reinforced Summarization with Saliency and Entailment , 2018, NAACL.
[80] Lijun Wu,et al. Achieving Human Parity on Automatic Chinese to English News Translation , 2018, ArXiv.
[81] Mirella Lapata,et al. Ranking Sentences for Extractive Summarization with Reinforcement Learning , 2018, NAACL.
[82] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[83] The Evaluation Machine , 2017 .
[84] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[85] Christopher D. Manning,et al. Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.
[86] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] Maja Popovic,et al. chrF++: words helping character n-grams , 2017, WMT.
[88] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[89] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[90] Bowen Zhou,et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.
[91] Rico Sennrich,et al. Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.
[92] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[93] Hermann Ney,et al. CharacTer: Translation Edit Rate on Character Level , 2016, WMT.
[94] Maja Popovic,et al. chrF: character n-gram F-score for automatic MT evaluation , 2015, WMT@EMNLP.
[95] Phil Blunsom,et al. Teaching Machines to Read and Comprehend , 2015, NIPS.
[96] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[97] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[98] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[99] D. Bates,et al. Fitting Linear Mixed-Effects Models Using lme4 , 2014, arXiv:1406.5823.
[100] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[101] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[102] Ondrej Bojar,et al. Results of the WMT14 Metrics Shared Task , 2013 .
[103] Yang Liu,et al. Non-Expert Evaluation of Summarization Systems is Risky , 2010, Mturk@HLT-NAACL.
[104] Ian S. Dunn,et al. Exploring the Limits , 2009 .
[105] Ewan Klein,et al. Natural Language Processing with Python , 2009 .
[106] Dou Shen,et al. TEXT SUMMARIZATION , 2022, YMER Digital.
[107] Philipp Koehn,et al. Further Meta-Evaluation of Machine Translation , 2008, WMT@ACL.
[108] Feifan Liu,et al. Correlation between ROUGE and Human Evaluation of Extractive Meeting Summaries , 2008, ACL.
[109] Philipp Koehn,et al. (Meta-) Evaluation of Machine Translation , 2007, WMT@ACL.
[110] B. MacWhinney. A UNIFIED MODEL , 2007 .
[111] Philipp Koehn,et al. Re-evaluating the Role of Bleu in Machine Translation Research , 2006, EACL.
[112] Matthew G. Snover,et al. A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.
[113] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[114] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[115] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[116] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .