MQAG: Multiple-choice Question Answering and Generation for Assessing Information Consistency in Summarization
[1] M. Gales, et al. Podcast Summary Assessment: A Resource for Evaluating Summary Assessment Methods, 2022, ArXiv.
[2] D. Roth, et al. Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics, 2022, FINDINGS.
[3] Pascale Fung, et al. Survey of Hallucination in Natural Language Generation, 2022, ACM Comput. Surv.
[4] Richard Yuanzhe Pang, et al. QuALITY: Question Answering with Long Input Texts, Yes!, 2021, NAACL.
[5] Yinfei Yang, et al. SueNes: A Weakly Supervised Approach to Evaluating Single-Document Summarization via Negative Sampling, 2020, NAACL.
[6] Ramesh Nallapati, et al. Improving Factual Consistency of Abstractive Summarization via Question Answering, 2021, ACL.
[7] Bing Qin, et al. The Factual Inconsistency Problem in Abstractive Text Summarization: A Survey, 2021, ArXiv.
[8] D. Roth, et al. Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary, 2020, Transactions of the Association for Computational Linguistics.
[9] Dragomir R. Radev, et al. SummEval: Re-evaluating Summarization Evaluation, 2020, Transactions of the Association for Computational Linguistics.
[10] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[11] Mona T. Diab, et al. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization, 2020, ACL.
[12] Ryan McDonald, et al. On Faithfulness and Factuality in Abstractive Summarization, 2020, ACL.
[13] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[14] Alex Wang, et al. Asking and Answering Questions to Evaluate the Factual Consistency of Summaries, 2020, ACL.
[15] Richard Socher, et al. Evaluating the Factual Consistency of Abstractive Text Summarization, 2019, EMNLP.
[16] Richard Socher, et al. Neural Text Summarization: A Critical Evaluation, 2019, EMNLP.
[17] Ben Goodrich, et al. Assessing The Factual Accuracy of Generated Text, 2019, KDD.
[18] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[19] Guokun Lai, et al. RACE: Large-scale ReAding Comprehension Dataset From Examinations, 2017, EMNLP.
[20] Ian S. Dunn, et al. Exploring the Limits, 2009.
[21] Oren Etzioni, et al. Open Information Extraction from the Web, 2007, CACM.
[22] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.
[23] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.