Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary
[1] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[2] Fei Liu,et al. MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance , 2019, EMNLP.
[3] Kilian Q. Weinberger,et al. BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.
[4] Ani Nenkova,et al. Automatically Assessing Machine Summary Content Without a Gold Standard , 2013, CL.
[5] Mirella Lapata,et al. Ranking Sentences for Extractive Summarization with Reinforcement Learning , 2018, NAACL.
[6] Eduard H. Hovy,et al. Summarization Evaluation Using Transformed Basic Elements , 2008, TAC.
[7] Percy Liang,et al. Transforming Question Answering Datasets Into Natural Language Inference Datasets , 2018, ArXiv.
[8] Ani Nenkova,et al. Evaluating Content Selection in Summarization: The Pyramid Method , 2004, NAACL.
[9] Mirella Lapata,et al. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization , 2018, EMNLP.
[10] Sylvain Lamprier,et al. Answers Unite! Unsupervised Metrics for Reinforced Summarization Models , 2019, EMNLP.
[11] Iryna Gurevych,et al. Learning to Score System Summaries for Better Content Selection Evaluation , 2017, NFiS@EMNLP.
[12] Bowen Zhou,et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.
[13] Percy Liang,et al. Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.
[14] Mona T. Diab,et al. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization , 2020, ACL.
[15] Chen Sun,et al. Automated Pyramid Summarization Evaluation , 2019, CoNLL.
[16] Jian Zhang,et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.
[17] Qian Yang,et al. PEAK: Pyramid Evaluation via Automated Knowledge Extraction , 2016, AAAI.
[18] Jun-ichi Fukumoto,et al. Automated Summarization Evaluation with Basic Elements , 2006, LREC.
[19] George A. Vouros,et al. Summarization system evaluation revisited: N-gram graphs , 2008, TSLP.
[20] Thibault Sellam,et al. BLEURT: Learning Robust Metrics for Text Generation , 2020, ACL.
[21] Matt Gardner,et al. MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics , 2020, EMNLP.
[22] John M. Conroy,et al. Mind the Gap: Dangers of Divorcing Evaluations of Summary Content from Linguistic Quality , 2008, COLING.
[23] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[24] Dan Roth,et al. A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods , 2021, TACL.
[25] Graham Neubig,et al. Re-evaluating Evaluation in Text Summarization , 2020, EMNLP.
[26] Hoa Trang Dang,et al. Overview of the TAC 2008 Update Summarization Task , 2008, TAC.
[27] Ido Dagan,et al. Crowdsourcing Lightweight Pyramids for Manual Summary Evaluation , 2019, NAACL.
[28] Alex Wang,et al. Asking and Answering Questions to Evaluate the Factual Consistency of Summaries , 2020, ACL.
[29] Dan Roth,et al. Understanding the Extent to which Summarization Evaluation Metrics Measure the Information Quality of Summaries , 2020, ArXiv.
[30] Michael Elhadad,et al. Question Answering as an Automatic Evaluation Metric for News Article Summarization , 2019, NAACL.
[31] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[32] Quoc V. Le,et al. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators , 2020, ICLR.
[33] Richard Socher,et al. SummEval: Re-evaluating Summarization Evaluation , 2020, ArXiv.