Generating Text from Structured Data with Application to the Biography Domain

This paper introduces a neural model for concept-to-text generation that scales to large, rich domains. We experiment with a new dataset of biographies from Wikipedia that is an order of magnitude larger than existing resources, with over 700k samples. The dataset is also vastly more diverse, with a 400k vocabulary, compared to a few hundred words for WeatherGov or RoboCup. Our model builds upon recent work on conditional neural language models for text generation. To deal with the large vocabulary, we extend these models to mix a fixed vocabulary with copy actions that transfer sample-specific words from the input database to the generated output sentence. Our neural model significantly outperforms a classical Kneser-Ney language model adapted to this task by nearly 15 BLEU points.
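
The idea of mixing a fixed vocabulary with copy actions can be illustrated with a small sketch. It assumes (an illustrative assumption, not necessarily the paper's exact formulation) that the model produces one score per fixed-vocabulary word and one score per word in the sample's input table, and that the next-word distribution is a single softmax over the union of both sets, so a word reachable through both routes receives the sum of its exponentiated scores. The function name `mixed_copy_distribution` and all example scores are hypothetical; in a real model the scores would come from the trained network.

```python
import math


def mixed_copy_distribution(vocab_scores, copy_scores):
    """Softmax over the union of fixed-vocabulary words and copyable table words.

    vocab_scores: dict mapping fixed-vocabulary words to unnormalized scores.
    copy_scores: dict mapping words from this sample's input table to copy scores.
    Both inputs are hypothetical stand-ins for what a trained network would produce.
    A word present in both dicts gets exp(vocab score) + exp(copy score) before
    normalization, so the two routes to the same word add up (assumed scheme).
    """
    all_scores = list(vocab_scores.values()) + list(copy_scores.values())
    if not all_scores:
        raise ValueError("need at least one candidate word")
    # Shift by the max score for numerical stability; the shift cancels on normalization.
    shift = max(all_scores)
    unnormalized = {}
    for source in (vocab_scores, copy_scores):
        for word, score in source.items():
            unnormalized[word] = unnormalized.get(word, 0.0) + math.exp(score - shift)
    total = sum(unnormalized.values())
    return {word: mass / total for word, mass in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical scores for the next word of a biography sentence.
    fixed_vocab = {"was": 2.0, "born": 1.5, "in": 1.0, "<unk>": 0.5}
    # Hypothetical words from this sample's infobox; "Kempen" is not in the fixed vocabulary.
    table_words = {"Kempen": 1.8, "1965": 1.2, "in": 0.3}
    probs = mixed_copy_distribution(fixed_vocab, table_words)
    for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{word:>8s} {p:.3f}")
```

Because the copy candidates are rebuilt for every sample, a rare table word such as a place name can receive probability mass without ever entering the fixed vocabulary, which is how, in this sketch, the output layer avoids scoring all 400k word types at every step.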