Deep Latent Variable Models of Natural Language

The proposed tutorial will cover deep latent variable models, both in the case where exact inference over the latent variables is tractable and in the case where it is not. The former case includes neural extensions of unsupervised tagging and parsing models. Our discussion of the latter case, where inference cannot be performed tractably, will focus on continuous latent variables. In particular, we will discuss recent developments in neural variational inference (e.g., relating to variational autoencoders) and in implicit density modeling (e.g., relating to generative adversarial networks). We will highlight the challenges of applying these families of methods to NLP problems, and discuss recent successes and best practices.
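To make the intractable-inference setting concrete, the sketch below shows a Monte Carlo estimate of the evidence lower bound (ELBO) that neural variational inference optimizes, using the reparameterization trick with a Gaussian approximate posterior and a Bernoulli decoder. This is a minimal illustration only: the dimensions, the random linear "encoder" and "decoder" maps, and the names (`W_mu`, `W_logvar`, `W_dec`) are hypothetical stand-ins for trained networks, not any specific model from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative): a binary observation x and a low-dimensional latent z.
x_dim, z_dim = 8, 2
x = rng.integers(0, 2, size=x_dim).astype(float)

# Hypothetical encoder/decoder parameters; in practice these are neural networks.
W_mu = rng.normal(0.0, 0.1, (z_dim, x_dim))
W_logvar = rng.normal(0.0, 0.1, (z_dim, x_dim))
W_dec = rng.normal(0.0, 0.1, (x_dim, z_dim))

def elbo(x, n_samples=100):
    """Monte Carlo ELBO estimate for a VAE with prior N(0, I),
    Gaussian q(z|x), and a Bernoulli decoder."""
    mu = W_mu @ x
    logvar = W_logvar @ x
    sigma = np.exp(0.5 * logvar)
    # KL( q(z|x) || N(0, I) ) has a closed form for diagonal Gaussians.
    kl = 0.5 * np.sum(mu**2 + sigma**2 - logvar - 1.0)
    # Reparameterization: z = mu + sigma * eps, so samples are
    # differentiable functions of the variational parameters.
    eps = rng.normal(size=(n_samples, z_dim))
    z = mu + sigma * eps
    probs = 1.0 / (1.0 + np.exp(-(z @ W_dec.T)))  # decoder Bernoulli means
    recon = np.mean(
        np.sum(x * np.log(probs + 1e-9) + (1 - x) * np.log(1 - probs + 1e-9), axis=1)
    )
    return recon - kl, kl

bound, kl = elbo(x)
```

In a real model one would backpropagate through `bound` to update both the encoder and decoder parameters jointly; the reparameterized sampling step is what makes that gradient estimator low-variance compared to score-function methods.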
