An Unsupervised Joint System for Text Generation from Knowledge Graphs and Semantic Parsing

Knowledge graph (KG) schemas can vary greatly from one domain to another. Therefore, supervised approaches to graph-to-text generation and text-to-graph knowledge extraction (semantic parsing) will always suffer from a shortage of domain-specific parallel graph-text data, while adapting a model trained on a different domain is often impossible because entities and relations overlap little or not at all. This situation calls for an approach that (1) does not need large amounts of annotated data and (2) is easy to adapt to new KG schemas. To this end, we present the first approach to fully unsupervised text generation from KGs and KG generation from text. Inspired by recent work on unsupervised machine translation, we serialize a KG as a sequence of facts and frame both tasks as sequence translation. By means of a shared sequence encoder and decoder, our model learns to map both graphs and texts into a joint semantic space and thus generalizes over different surface representations with the same meaning. We evaluate our approach on WebNLG v2.1 and a new benchmark leveraging scene graphs from Visual Genome. Our system outperforms strong baselines for both text$\leftrightarrow$graph tasks without any manual adaptation from one dataset to the other. In additional experiments, we investigate the impact of using different unsupervised objectives.
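
To make the idea of serializing a KG as a sequence of facts concrete, the following is a minimal sketch and not the authors' implementation. The delimiter tokens `<H>`, `<R>`, `<T>`, the function names `serialize` and `deserialize`, and the example facts are all assumptions introduced for illustration; the abstract does not specify the actual serialization format. The sketch only shows that a graph can be flattened into a token sequence and recovered from it, which is what allows both directions to be treated as sequence translation.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical delimiter tokens (an assumption; the source does not name any).
HEAD, REL, TAIL = "<H>", "<R>", "<T>"


def serialize(graph: List[Triple]) -> str:
    """Flatten (subject, relation, object) facts into one token sequence."""
    return " ".join(f"{HEAD} {s} {REL} {r} {TAIL} {o}" for s, r, o in graph)


def deserialize(sequence: str) -> List[Triple]:
    """Recover facts from a serialized sequence, skipping malformed chunks."""
    triples: List[Triple] = []
    for chunk in sequence.split(HEAD):
        if REL not in chunk:
            continue  # empty or malformed chunk: no relation marker
        subj, rest = chunk.split(REL, 1)
        if TAIL not in rest:
            continue  # malformed chunk: no object marker after the relation
        rel, obj = rest.split(TAIL, 1)
        triples.append((subj.strip(), rel.strip(), obj.strip()))
    return triples


# Invented scene-graph-style facts, used only to exercise the round trip.
graph = [("man", "wearing", "hat"), ("hat", "has color", "red")]
sequence = serialize(graph)
recovered = deserialize(sequence)
```

In this sketch, a sequence model would see `sequence` as the graph-side "language", and `deserialize` would turn a decoded sequence back into facts. Skipping malformed chunks is one possible design choice for handling imperfect model output; a stricter parser could reject the whole sequence instead.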
