Recursive Top-Down Production for Sentence Generation with Latent Trees

We model the recursive production property of context-free grammars for natural and synthetic languages. To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with $N$ leaves, allowing us to compute the likelihood of a sequence of $N$ tokens under a latent tree model; we maximise this likelihood to train a recursive neural function. We evaluate the model on two synthetic tasks: SCAN (Lake and Baroni, 2017), where it outperforms previous models on the LENGTH split, and English question formation (McCoy et al., 2020), where it performs comparably to decoders given the ground-truth tree structure. We also report results on German-English translation on the Multi30k dataset (Elliott et al., 2016), and qualitatively analyse the tree structures the model induces for the SCAN task and the German-English translation task.
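The abstract's exact recurrences are not reproduced here, but as an illustration of the kind of inside-style dynamic program it describes, the sketch below marginalises over all latent binary trees with $N$ leaves in $O(N^3)$ time. The names `leaf_logp` and `split_logp` are hypothetical placeholders standing in for whatever scores a recursive neural function would produce; they are not the paper's actual parameterisation.

```python
import numpy as np

def marginal_log_likelihood(leaf_logp, split_logp):
    """Inside-style DP marginalising over all binary trees with N leaves.

    leaf_logp[i]        -- log-probability of generating token i at a leaf
    split_logp[i, j, k] -- log-probability of splitting span [i, j) at k
    Runs in O(N^3) time and O(N^2) space.
    """
    n = len(leaf_logp)
    inside = np.full((n, n + 1), -np.inf)  # inside[i, j] = log p(span [i, j))
    for i in range(n):                     # width-1 spans are leaves
        inside[i, i + 1] = leaf_logp[i]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            # Sum (in log space) over every split point k of span [i, j).
            scores = [split_logp[i, j, k] + inside[i, k] + inside[k, j]
                      for k in range(i + 1, j)]
            inside[i, j] = np.logaddexp.reduce(scores)
    return inside[0, n]  # log-likelihood of the whole token sequence

# Toy usage: four tokens with uniform (illustrative) leaf and split scores.
n = 4
leaf_logp = np.log(np.full(n, 0.1))
split_logp = np.log(np.full((n, n + 1, n + 1), 0.5))
print(marginal_log_likelihood(leaf_logp, split_logp))
```

The point of the shared `inside` table is that the number of binary trees over $N$ leaves grows with the Catalan numbers, so summing over trees explicitly is intractable; reusing each span's marginal makes the marginalisation polynomial, after which the resulting log-likelihood can be maximised directly by gradient ascent.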

References

[1] Michael I. Jordan, et al. Tree-Structured Stick Breaking for Hierarchical Data. NIPS, 2010.

[2] Yoshua Bengio, et al. Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach. ACL, 2020.

[3] Alexander M. Rush, et al. OpenNMT: Open-Source Toolkit for Neural Machine Translation. ACL, 2017.

[4] Dan Klein, et al. Natural Language Grammar Induction Using a Constituent-Context Model. NIPS, 2001.

[5] Jakob Uszkoreit, et al. Insertion Transformer: Flexible Sequence Generation via Insertion Operations. ICML, 2019.

[6] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML, 2006.

[7] Qi Liu, et al. Insertion-based Decoding with Automatically Inferred Generation Order. TACL, 2019.

[8] Kyunghyun Cho, et al. Non-Monotonic Sequential Text Generation. ICML, 2019.

[9] Liang Zhao, et al. Compositional Generalization for Primitive Substitutions. EMNLP, 2019.

[10] Christopher D. Manning, et al. Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks. 2010.

[11] Marco Baroni, et al. Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. ICML, 2017.

[12] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS, 2019.

[13] Yoshua Bengio, et al. Compositional generalization in a deep seq2seq model by separating syntax and semantics. arXiv, 2019.

[14] Yoshua Bengio, et al. Learning Hierarchical Structures On-The-Fly with a Recurrent-Recursive Model for Sequences. Rep4NLP@ACL, 2018.

[15] Noah A. Smith, et al. Recurrent Neural Network Grammars. NAACL, 2016.

[16] Jason Weston, et al. Jump to better conclusions: SCAN both left and right. BlackboxNLP@EMNLP, 2018.

[17] Eiichiro Sumita, et al. The Infinite Markov Model. NIPS, 2007.

[18] R. Thomas McCoy, et al. Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks. TACL, 2020.

[19] L. Baum, et al. Growth transformations for functions on manifolds. 1968.

[20] Geoffrey E. Hinton, et al. Layer Normalization. arXiv, 2016.

[21] Thomas L. Griffiths, et al. Hierarchical Topic Models and the Nested Chinese Restaurant Process. NIPS, 2003.

[22] Alexander M. Rush, et al. Unsupervised Recurrent Neural Network Grammars. NAACL, 2019.

[23] Liang Lu, et al. Top-down Tree Long Short-Term Memory Networks. NAACL, 2015.

[24] Lukasz Kaiser, et al. Attention is All you Need. NIPS, 2017.

[25] Aaron C. Courville, et al. Neural Language Modeling by Jointly Learning Syntax and Lexicon. ICLR, 2017.

[26] J. Sethuraman. A Constructive Definition of Dirichlet Priors. 1991.

[27] Anoop Sarkar, et al. Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing. EMNLP, 2018.

[28] Andrew J. Viterbi, et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, 1967.

[29] Jascha Sohl-Dickstein, et al. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. NIPS, 2017.

[30] Yoshua Bengio, et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches. SSST@EMNLP, 2014.

[31] Rens Bod, et al. An All-Subtrees Approach to Unsupervised Parsing. ACL, 2006.

[32] Aaron C. Courville, et al. Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks. ICLR, 2018.

[33] Yoshua Bengio, et al. Hierarchical Multiscale Recurrent Neural Networks. ICLR, 2016.

[34] Christopher Potts, et al. A large annotated corpus for learning natural language inference. EMNLP, 2015.

[35] Kyunghyun Cho, et al. Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference using a Delta Posterior. AAAI, 2019.

[36] L. Baum, et al. An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. 1967.

[37] Samuel R. Bowman, et al. Do latent tree learning models identify meaningful structure in sentences? TACL, 2017.

[38] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 2004.

[39] J. Baker. Trainable grammars for speech recognition. 1979.

[40] Dawn Xiaodong Song, et al. Tree-to-tree Neural Networks for Program Translation. NeurIPS, 2018.

[41] Alan W. Black, et al. Top-Down Structurally-Constrained Neural Response Generation with Lexicalized Probabilistic Context-Free Grammar. NAACL-HLT, 2019.

[42] Dan Klein, et al. Natural language grammar induction with a generative constituent-context model. Pattern Recognition, 2005.

[43] Liang Lu, et al. Tree Recurrent Neural Networks with Application to Language Modeling. arXiv, 2015.

[44] Max Welling, et al. Auto-Encoding Variational Bayes. ICLR, 2013.

[45] Alexander M. Rush, et al. Sequence-Level Knowledge Distillation. EMNLP, 2016.

[46] Tommi S. Jaakkola, et al. Tree-structured decoding with doubly-recurrent neural networks. ICLR, 2016.

[47] Aaron C. Courville, et al. Ordered Memory. NeurIPS, 2019.

[48] Jürgen Schmidhuber, et al. Long Short-Term Memory. Neural Computation, 1997.

[49] Alexander M. Rush, et al. Compound Probabilistic Context-Free Grammars for Grammar Induction. ACL, 2019.

[50] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP, 2013.

[51] Yoshimasa Tsuruoka, et al. Tree-to-Sequence Attentional Neural Machine Translation. ACL, 2016.