Surface Realisation from Knowledge-Bases

Natural Language Generation (NLG) is the task of automatically producing natural language text to describe information present in non-linguistic data. It involves three main subtasks: (i) selecting the relevant portion of input data; (ii) determining the words that will be used to verbalise the selected data; and (iii) mapping these words into natural language text. The last of these subtasks is known as Surface Realisation (SR). In my thesis, I study the SR task in the context of input data coming from Knowledge Bases (KBs). I present two novel approaches to surface realisation from knowledge bases: a supervised approach and a weakly supervised approach. In the supervised approach, I present a corpus-based method for inducing a Feature-Based Lexicalized Tree Adjoining Grammar (FB-LTAG) from a parallel corpus of text and data. The resulting grammar includes a unification-based semantics and can be used by an existing surface realiser to generate sentences from test data. I show that the induced grammar is compact and generalises well over the test data, yielding results that are close to those produced by a handcrafted symbolic approach and that outperform an alternative statistical approach. In the weakly supervised approach, I explore a method for surface realisation from KB data which uses a supplied lexicon but does not require a parallel corpus. Instead, I build a corpus from heterogeneous sources of domain-related text and use it to identify possible lexicalisations of KB symbols (classes and relations) and their verbalisation patterns (frames). Based on these observations, I build several probabilistic models which are used to select appropriate frames and to link syntax and semantics when verbalising KB inputs. I evaluate the output sentences and analyse the issues relevant to learning from non-parallel corpora. In both approaches, I use data derived from an existing biomedical ontology as reference input.
The proposed methods are generic and can be easily adapted to input from other ontologies for which parallel or non-parallel corpora exist.
