Deep Latent Variable Models of Natural Language

The proposed tutorial will cover deep latent variable models, both in the case where exact inference over the latent variables is tractable and in the case where it is not. The former case includes neural extensions of unsupervised tagging and parsing models. Our discussion of the latter case, where inference cannot be performed tractably, will focus on continuous latent variables. In particular, we will discuss recent developments in neural variational inference (e.g., relating to variational autoencoders) and in implicit density modeling (e.g., relating to generative adversarial networks). We will highlight the challenges of applying these families of methods to NLP problems, and discuss recent successes and best practices.
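To make the intractable-inference setting concrete, the sketch below shows a Monte Carlo estimate of the evidence lower bound (ELBO) that neural variational inference optimizes, using the reparameterization trick with a Gaussian approximate posterior and a Bernoulli decoder. This is a minimal illustration only: the dimensions, the random linear "encoder" and "decoder" maps, and the names (`W_mu`, `W_logvar`, `W_dec`) are hypothetical stand-ins for trained networks, not any specific model from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): a binary observation x and a low-dimensional latent z.
x_dim, z_dim = 8, 2
x = rng.integers(0, 2, size=x_dim).astype(float)

# Hypothetical encoder/decoder parameters; in practice these are neural networks.
W_mu = rng.normal(0.0, 0.1, (z_dim, x_dim))
W_logvar = rng.normal(0.0, 0.1, (z_dim, x_dim))
W_dec = rng.normal(0.0, 0.1, (x_dim, z_dim))

def elbo(x, n_samples=100):
    """Monte Carlo ELBO estimate for a VAE with prior N(0, I),
    Gaussian q(z|x), and a Bernoulli decoder."""
    mu = W_mu @ x
    logvar = W_logvar @ x
    sigma = np.exp(0.5 * logvar)
    # KL( q(z|x) || N(0, I) ) has a closed form for diagonal Gaussians.
    kl = 0.5 * np.sum(mu**2 + sigma**2 - logvar - 1.0)
    # Reparameterization: z = mu + sigma * eps, so samples are
    # differentiable functions of the variational parameters.
    eps = rng.normal(size=(n_samples, z_dim))
    z = mu + sigma * eps
    probs = 1.0 / (1.0 + np.exp(-(z @ W_dec.T)))  # decoder Bernoulli means
    recon = np.mean(
        np.sum(x * np.log(probs + 1e-9) + (1 - x) * np.log(1 - probs + 1e-9), axis=1)
    )
    return recon - kl, kl

bound, kl = elbo(x)
```

In a real model one would backpropagate through `bound` to update both the encoder and decoder parameters jointly; the reparameterized sampling step is what makes that gradient estimator low-variance compared to score-function methods.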
