A Hierarchical Model for Data-to-Text Generation

Transcribing structured data into natural language descriptions has emerged as a challenging task, referred to as “data-to-text”. These structures generally group multiple elements together with their attributes. Most approaches rely on translation-style encoder-decoder methods, which linearize the elements into a single sequence. This, however, loses most of the structure contained in the data. In this work, we propose to overcome this limitation with a hierarchical model that encodes the data structure at both the element level and the structure level. Evaluations on RotoWire show the effectiveness of our model with respect to both qualitative and quantitative metrics.
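The contrast between linearizing a table and encoding it at two levels can be illustrated with a minimal sketch. This is not the model proposed here: the abstract does not specify the encoder, so mean pooling and the `toy_embed` function are hypothetical stand-ins for learned components, and the player records are made-up illustrative values rather than RotoWire data.

```python
from typing import List, Tuple

Record = Tuple[str, str]   # (attribute name, value)
Entity = List[Record]      # one element, described by its attribute records


def linearize(entities: List[Entity]) -> List[str]:
    """Flatten all records into one token sequence; entity boundaries are lost."""
    return [tok for entity in entities for key, value in entity for tok in (key, value)]


def toy_embed(token: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a learned embedding (deterministic character features)."""
    return [sum(ord(c) for c in token[i::dim]) / 100.0 for i in range(dim)]


def mean(vectors: List[List[float]]) -> List[float]:
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def encode_entity(entity: Entity) -> List[float]:
    """Element level: pool the records belonging to a single entity."""
    return mean([mean([toy_embed(key), toy_embed(value)]) for key, value in entity])


def encode_structure(entities: List[Entity]) -> Tuple[List[List[float]], List[float]]:
    """Structure level: keep one vector per entity and pool them into a table vector."""
    entity_vectors = [encode_entity(entity) for entity in entities]
    return entity_vectors, mean(entity_vectors)


# Illustrative, made-up input: two entities with two attributes each.
table = [
    [("name", "Player A"), ("points", "25")],
    [("name", "Player B"), ("points", "30")],
]

flat_tokens = linearize(table)
entity_vectors, table_vector = encode_structure(table)
```

In the flat token list nothing marks where one entity ends and the next begins, whereas the two-level encoding keeps one representation per entity alongside a representation of the whole structure.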
