
Assessing The Factual Accuracy of Generated Text

We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-to-end models are shown to be able to extract complete sets of facts from datasets with full pages of text. We then analyse multiple models that estimate factual accuracy on a Wikipedia text summarization task, and show their efficacy compared to ROUGE and other model-free variants by conducting a human evaluation study.
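The abstract does not give the scoring formula, so the following is only a minimal sketch of one way a fact-based accuracy score could be computed. It assumes that facts have already been extracted from both the generated text and the reference text as (subject, relation, object) triples, and it scores the generated text by the fraction of its facts that also appear in the reference. The name `fact_precision`, the triple representation, and the example facts are all hypothetical and are not taken from the paper; the extraction model itself is not shown.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a fact is a (subject, relation, object) triple.
Fact = Tuple[str, str, str]


def fact_precision(generated: Iterable[Fact], reference: Iterable[Fact]) -> float:
    """Fraction of generated facts that also occur in the reference facts.

    This is an illustrative assumption, not the paper's exact metric.
    Returns 0.0 when no facts were extracted from the generated text.
    """
    gen: Set[Fact] = set(generated)
    ref: Set[Fact] = set(reference)
    if not gen:
        return 0.0
    return len(gen & ref) / len(gen)


# Toy example: one of the two generated facts is supported by the reference.
reference_facts = {
    ("Barack Obama", "born_in", "Honolulu"),
    ("Barack Obama", "occupation", "politician"),
}
generated_facts = {
    ("Barack Obama", "born_in", "Honolulu"),
    ("Barack Obama", "born_in", "Chicago"),
}

score = fact_precision(generated_facts, reference_facts)
print(score)  # 0.5
```

Unlike n-gram overlap, a score of this kind penalizes a fluent summary that states a wrong object for a relation, which is the sense in which a fact-based metric can complement ROUGE and BLEU.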

Ben Goodrich | Peter J. Liu | Vinay Rao | Mohammad Saleh
