QuestEval: Summarization Asks for Fact-based Evaluation
Sylvain Lamprier | Patrick Gallinari | Jacopo Staiano | Paul-Alexis Dray | Thomas Scialom | Benjamin Piwowarski | Alex Wang
[1] Trevor Darrell, et al. Object Hallucination in Image Captioning, 2018, EMNLP.
[2] Richard Socher, et al. Neural Text Summarization: A Critical Evaluation, 2019, EMNLP.
[3] Mirella Lapata, et al. Discourse Constraints for Document Compression, 2010, CL.
[4] Bowen Zhou, et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, 2016, CoNLL.
[5] Verena Rieser, et al. Fact-based Content Weighting for Evaluating Abstractive Summarisation, 2020, ACL.
[6] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments, 2007, WMT@ACL.
[7] Ming Zhou, et al. Neural Question Generation from Text: A Preliminary Study, 2017, NLPCC.
[8] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[9] Michael Elhadad, et al. Question Answering as an Automatic Evaluation Metric for News Article Summarization, 2019, NAACL.
[10] Manik Bhandari, et al. Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics, 2020, COLING.
[11] Yao Zhao, et al. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, 2020, ICML.
[12] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL.
[13] Shay B. Cohen, et al. Reducing the Frequency of Hallucinated Quantities in Abstractive Summaries, 2020, Findings.
[14] Verena Rieser, et al. Why We Need New Evaluation Metrics for NLG, 2017, EMNLP.
[15] Ani Nenkova, et al. Automatically Assessing Machine Summary Content Without a Gold Standard, 2013, CL.
[16] Kilian Q. Weinberger, et al. BERTScore: Evaluating Text Generation with BERT, 2019, ICLR.
[17] C. Lawrence Zitnick, et al. CIDEr: Consensus-based Image Description Evaluation, 2015, CVPR.
[18] Sylvain Lamprier, et al. Data-QuestEval: A Referenceless Metric for Data-to-Text Semantic Evaluation, 2021, EMNLP.
[19] Mirella Lapata, et al. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization, 2018, EMNLP.
[20] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[21] Ido Dagan, et al. Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference, 2019, ACL.
[22] Fei Wu, et al. A Semantic QA-Based Approach for Text Summarization Evaluation, 2017, AAAI.
[23] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[24] Sylvain Lamprier, et al. Answers Unite! Unsupervised Metrics for Reinforced Summarization Models, 2019, EMNLP.
[25] Philip Bachman, et al. NewsQA: A Machine Comprehension Dataset, 2016, Rep4NLP@ACL.
[26] Richard Socher, et al. Evaluating the Factual Consistency of Abstractive Text Summarization, 2019, EMNLP.
[27] Alex Wang, et al. Asking and Answering Questions to Evaluate the Factual Consistency of Summaries, 2020, ACL.
[28] Benoît Sagot, et al. Rethinking Automatic Evaluation in Sentence Simplification, 2021, arXiv.
[29] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[30] Ryan McDonald, et al. On Faithfulness and Factuality in Abstractive Summarization, 2020, ACL.
[31] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[32] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[33] Xinyan Xiao, et al. Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling, 2018, EMNLP.
[34] Kyomin Jung, et al. QACE: Asking Questions to Evaluate an Image Caption, 2021, EMNLP.
[35] Maxime Peyrard, et al. Studying Summarization Evaluation Metrics in the Appropriate Scoring Range, 2019, ACL.
[36] Percy Liang, et al. Know What You Don't Know: Unanswerable Questions for SQuAD, 2018, ACL.
[37] Mona T. Diab, et al. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization, 2020, ACL.
[38] Richard Socher, et al. A Deep Reinforced Model for Abstractive Summarization, 2017, ICLR.
[39] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.