GO FIGURE: A Meta Evaluation of Factuality in Summarization
[1] Shay B. Cohen, et al. Reducing the Frequency of Hallucinated Quantities in Abstractive Summaries, 2020, Findings of EMNLP.
[2] Dragomir R. Radev, et al. SummEval: Re-evaluating Summarization Evaluation, 2020, Transactions of the Association for Computational Linguistics.
[3] Elizabeth Clark, et al. Evaluation of Text Generation: A Survey, 2020, ArXiv.
[4] Mona T. Diab, et al. FEQA: A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization, 2020, ACL.
[5] Lingfei Wu, et al. Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward, 2020, ACL.
[6] Ryan McDonald, et al. On Faithfulness and Factuality in Abstractive Summarization, 2020, ACL.
[7] Ronan Le Bras, et al. Unsupervised Commonsense Question Answering with Self-Talk, 2020, EMNLP.
[8] Thibault Sellam, et al. BLEURT: Learning Robust Metrics for Text Generation, 2020, ACL.
[9] Alex Wang, et al. Asking and Answering Questions to Evaluate the Factual Consistency of Summaries, 2020, ACL.
[10] Chenguang Zhu, et al. Boosting Factual Correctness of Abstractive Summarization, 2020.
[11] Xuedong Huang, et al. Boosting Factual Correctness of Abstractive Summarization with Knowledge Graph, 2020, ArXiv.
[12] John Bohannon, et al. Fill in the BLANC: Human-free Quality Estimation of Document Summaries, 2020, Eval4NLP.
[13] Aleksander Wawer, et al. SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization, 2019, EMNLP.
[14] Fabio Petroni, et al. How Decoding Strategies Affect the Verifiability of Generated Text, 2019, Findings of EMNLP.
[15] Richard Socher, et al. Evaluating the Factual Consistency of Abstractive Text Summarization, 2019, EMNLP.
[16] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, Journal of Machine Learning Research.
[17] Sylvain Lamprier, et al. Answers Unite! Unsupervised Metrics for Reinforced Summarization Models, 2019, EMNLP.
[18] Jiawei Han, et al. Facet-Aware Evaluation for Extractive Summarization, 2019, ACL.
[19] Aliaksei Severyn, et al. Leveraging Pre-trained Checkpoints for Sequence Generation Tasks, 2019, Transactions of the Association for Computational Linguistics.
[20] Michael Elhadad, et al. Question Answering as an Automatic Evaluation Metric for News Article Summarization, 2019, NAACL.
[21] Ben Goodrich, et al. Assessing the Factual Accuracy of Generated Text, 2019, KDD.
[22] Mirella Lapata, et al. Hierarchical Transformers for Multi-Document Summarization, 2019, ACL.
[23] Ali Farhadi, et al. Defending Against Neural Fake News, 2019, NeurIPS.
[24] Ido Dagan, et al. Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference, 2019, ACL.
[25] Noah A. Smith, et al. Sentence Mover's Similarity: Automatic Evaluation for Multi-Sentence Texts, 2019, ACL.
[26] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[27] Kilian Q. Weinberger, et al. BERTScore: Evaluating Text Generation with BERT, 2019, ICLR.
[28] Percy Liang, et al. Unifying Human and Statistical Evaluation for Natural Language Generation, 2019, NAACL.
[29] Mirella Lapata, et al. Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization, 2018, EMNLP.
[30] Arun Tejasvi Chaganty, et al. The Price of Debiasing Automatic Metrics in Natural Language Evaluation, 2018, ACL.
[31] Ramakanth Pasunuru, et al. Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation, 2018, ACL.
[32] Yann Dauphin, et al. Hierarchical Neural Story Generation, 2018, ACL.
[33] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.
[34] Furu Wei, et al. Faithful to the Original: Fact Aware Neural Abstractive Summarization, 2017, AAAI.
[35] Verena Rieser, et al. Referenceless Quality Estimation for Natural Language Generation, 2017, ArXiv.
[36] Verena Rieser, et al. Why We Need New Evaluation Metrics for NLG, 2017, EMNLP.
[37] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[38] Bowen Zhou, et al. Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond, 2016, CoNLL.
[39] Yvette Graham, et al. Re-evaluating Automatic Summarization with BLEU and 192 Shades of ROUGE, 2015, EMNLP.
[40] Francis M. Tyers, et al. Evaluating Machine Translation for Assimilation via a Gap-Filling Task, 2015, EAMT.
[41] John M. Conroy, et al. A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art, 2013, ACL.
[42] Anja Belz, et al. An Investigation into the Validity of Some Metrics for Automatically Evaluating Natural Language Generation Systems, 2009, Computational Linguistics.
[43] Philipp Koehn, et al. (Meta-) Evaluation of Machine Translation, 2007, WMT@ACL.
[44] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.
[45] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.
[46] J. R. Landis, et al. The Measurement of Observer Agreement for Categorical Data, 1977, Biometrics.