GenWiki: A Dataset of 1.3 Million Content-Sharing Text and Graphs for Unsupervised Graph-to-Text Generation

Data collection for knowledge graph-to-text generation is expensive. As a result, research on unsupervised models has recently emerged as an active field. However, most unsupervised models have to use non-parallel versions of existing small supervised datasets, which largely constrains their potential. In this paper, we propose a large-scale, general-domain dataset, GenWiki. Our unsupervised dataset has 1.3M text and graph examples, respectively. With a human-annotated test set, we provide this new benchmark dataset for future research on unsupervised text generation from knowledge graphs.

Zheng Zhang | Xipeng Qiu | Zhijing Jin | Qipeng Guo
