A Tutorial on Deep Latent Variable Models of Natural Language

There has been much recent, exciting work on combining the complementary strengths of latent variable models and deep learning. Latent variable modeling makes it easy to explicitly specify model constraints through conditional independence properties, while deep learning makes it possible to parameterize these conditional likelihoods with powerful function approximators. While these "deep latent variable" models provide a rich, flexible framework for modeling many real-world phenomena, difficulties exist: deep parameterizations of conditional likelihoods usually make posterior inference intractable, and latent variable objectives often complicate backpropagation by introducing points of non-differentiability. This tutorial explores these issues in depth through the lens of variational inference.
