Template-Based Multi-solution Approach for Data-to-Text Generation

Data-to-text generation is usually divided into two subtasks: planning how to order and structure the information, and generating fluent, grammatically correct text that is faithful to the facts described in the input knowledge base. A typical knowledge base consists of Resource Description Framework (RDF) triples that describe entities and their relations. Many end-to-end solutions have been proposed to generate natural language descriptions from RDF; however, they require large and noise-free training datasets, offer little control over how the text is generated, and provide no guarantee that the generated text verbalizes all and only the input. We address these problems with a modular solution that uses templates and generates multiple candidate texts across the data-to-text generation phases, returning the best one. Our experiments on a real-world dataset demonstrate that our approach generates higher-quality texts and outperforms several baseline models in terms of BLEU, METEOR, and TER.
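To make the idea concrete, below is a minimal sketch (not the authors' implementation) of a template-based, multi-candidate pipeline over RDF triples: each triple is lexicalized with a per-predicate template, several candidate texts are produced by varying the sentence order, and the best candidate is returned according to a scoring function. The templates, the candidate space, and the coverage-based scoring heuristic are all illustrative assumptions standing in for the paper's planning and selection components.

# Sketch only: templates, candidate enumeration, and scoring are assumptions.
from itertools import permutations

# Hypothetical per-predicate templates mapping a triple to a sentence.
TEMPLATES = {
    "birthPlace": "{s} was born in {o}.",
    "occupation": "{s} works as a {o}.",
}

def lexicalize(triple):
    s, p, o = triple
    template = TEMPLATES.get(p, "{s} {p} {o}.")  # fallback template
    return template.format(s=s, p=p, o=o)

def score(text, triples):
    # Stand-in ranking criterion: reward verbalizing every subject and
    # object in the input, with a small penalty for longer texts.
    coverage = sum(1 for s, _, o in triples if s in text and o in text)
    return coverage - 0.01 * len(text.split())

def generate(triples, max_candidates=24):
    # Enumerate a few sentence orderings as candidate texts; keep the best.
    candidates = []
    for order in list(permutations(triples))[:max_candidates]:
        candidates.append(" ".join(lexicalize(t) for t in order))
    return max(candidates, key=lambda c: score(c, triples))

if __name__ == "__main__":
    triples = [("Alan Bean", "birthPlace", "Wheeler, Texas"),
               ("Alan Bean", "occupation", "test pilot")]
    print(generate(triples))

In a fuller system the scoring step could instead use a language model or a faithfulness check, but the structure above (lexicalize, generate multiple candidates, rank, return the best) reflects the modular design described in the abstract.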
