Latent Template Induction with Gumbel-CRFs

Learning to control the structure of sentences is a challenging problem in text generation. Existing work relies either on simple deterministic approaches or on RL-based hard structures. We explore the use of structured variational autoencoders to infer latent templates for sentence generation, using a soft, continuous relaxation so that training can use reparameterization. Specifically, we propose the Gumbel-CRF, a continuous relaxation of the CRF sampling algorithm based on a relaxed Forward-Filtering Backward-Sampling (FFBS) approach. As a reparameterized gradient estimator, the Gumbel-CRF gives more stable gradients than score-function-based estimators. As a structured inference network, it learns interpretable templates during training, which allows us to control the decoder at test time. We demonstrate the effectiveness of our methods with experiments on data-to-text generation and unsupervised paraphrase generation.
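The relaxed FFBS idea can be illustrated with a minimal NumPy sketch for a linear-chain CRF. It is a sketch under stated assumptions, not the paper's implementation: the function names (`forward_log_alphas`, `relaxed_ffbs`), the log-potential arrays `emit` and `trans`, the temperature `tau`, and the choice to condition each backward step on the argmax of the relaxed sample are all assumptions made for illustration. The forward pass computes log forward messages; the backward pass replaces each categorical draw with a Gumbel-softmax sample.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def forward_log_alphas(emit, trans):
    """Forward filtering: emit is (T, K), trans is (K, K), both log-potentials."""
    T, K = emit.shape
    alpha = np.empty((T, K))
    alpha[0] = emit[0]
    for t in range(1, T):
        # alpha[t, j] = emit[t, j] + log sum_i exp(alpha[t-1, i] + trans[i, j])
        alpha[t] = emit[t] + logsumexp(alpha[t - 1][:, None] + trans, axis=0)
    return alpha

def relaxed_ffbs(emit, trans, tau=1.0, rng=None):
    """Backward sampling with Gumbel-softmax in place of categorical draws.

    Returns (relaxed, hard): relaxed is (T, K) with rows on the simplex,
    hard is (T,) with the argmax state at each step.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, K = emit.shape
    alpha = forward_log_alphas(emit, trans)
    relaxed = np.empty((T, K))
    hard = np.empty(T, dtype=int)
    logits = alpha[-1]  # unnormalized log p(z_T | x)
    for t in range(T - 1, -1, -1):
        log_p = logits - logsumexp(logits, axis=0)
        g = rng.gumbel(size=K)  # i.i.d. Gumbel(0, 1) noise
        y = (log_p + g) / tau
        relaxed[t] = np.exp(y - logsumexp(y, axis=0))  # softmax with temperature
        hard[t] = int(np.argmax(relaxed[t]))  # same as argmax(log_p + g)
        if t > 0:
            # Assumption of this sketch: condition on the hard state at step t.
            logits = alpha[t - 1] + trans[:, hard[t]]
    return relaxed, hard
```

Because the argmax of the relaxed sample equals the Gumbel-max draw, `hard` is an ordinary FFBS sample from the CRF, while `relaxed` is a continuous surrogate that approaches one-hot vectors as `tau` decreases. In an autodiff framework the same computation would let gradients flow through `relaxed` into the potentials.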
