Deep Latent Variable Models of Natural Language
[1] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[2] Glenn Carroll,et al. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora , 1992 .
[3] Tommi S. Jaakkola,et al. Sequence to Better Sequence: Continuous Revision of Combinatorial Structures , 2017, ICML.
[4] Dan Klein,et al. A Minimal Span-Based Neural Constituency Parser , 2017, ACL.
[5] Tommi S. Jaakkola,et al. A causal framework for explaining the predictions of black-box sequence-to-sequence models , 2017, EMNLP.
[6] Michael Collins,et al. Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.
[7] Tom Minka,et al. A* Sampling , 2014, NIPS.
[8] Alexander M. Rush,et al. Avoiding Latent Variable Collapse With Generative Skip Models , 2018, AISTATS.
[9] David Vázquez,et al. PixelVAE: A Latent Variable Model for Natural Images , 2016, ICLR.
[10] Yoav Goldberg,et al. Towards String-To-Tree Neural Machine Translation , 2017, ACL.
[11] Ruslan Salakhutdinov,et al. Importance Weighted Autoencoders , 2015, ICLR.
[12] Alexander M. Rush,et al. Latent Alignment and Variational Attention , 2018, NeurIPS.
[13] Roman Novak,et al. Iterative Refinement for Machine Translation , 2016, ArXiv.
[14] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[16] Eduard H. Hovy,et al. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF , 2016, ACL.
[17] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[18] Max Welling,et al. VAE with a VampPrior , 2017, AISTATS.
[19] Percy Liang,et al. Generating Sentences by Editing Prototypes , 2017, TACL.
[20] Alexander M. Rush,et al. Character-Aware Neural Language Models , 2015, AAAI.
[21] David M. Blei,et al. Variational Inference: A Review for Statisticians , 2016, ArXiv.
[22] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..
[23] Alexander M. Rush,et al. Image-to-Markup Generation with Coarse-to-Fine Attention , 2016, ICML.
[24] Claire Cardie,et al. SparseMAP: Differentiable Sparse Structured Inference , 2018, ICML.
[25] Yiming Yang,et al. A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text , 2019, EMNLP.
[26] Kewei Tu,et al. Structured Attentions for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Ahmad Emami,et al. A Neural Syntactic Language Model , 2005, Machine Learning.
[28] Shakir Mohamed,et al. Variational Inference with Normalizing Flows , 2015, ICML.
[29] Charu C. Aggarwal,et al. A Survey of Text Clustering Algorithms , 2012, Mining Text Data.
[30] Barak A. Pearlmutter,et al. Automatic Learning Rate Maximization by On-Line Estimation of the Hessian's Eigenvectors , 1992, NIPS 1992.
[31] Noah A. Smith,et al. Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.
[32] Yang Liu,et al. Learning Structured Text Representations , 2017, TACL.
[33] Eric P. Xing,et al. Nonparametric Variational Auto-Encoders for Hierarchical Representation Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[34] Ryan Cotterell,et al. Hard Non-Monotonic Attention for Character-Level Transduction , 2018, EMNLP.
[35] Armand Joulin,et al. Cooperative Learning of Disjoint Syntax and Semantics , 2019, NAACL.
[36] Alfred V. Aho,et al. Indexed Grammars—An Extension of Context-Free Grammars , 1967, SWAT.
[37] Robert L. Mercer,et al. The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.
[38] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[39] Christopher D. Manning,et al. A Structural Probe for Finding Syntax in Word Representations , 2019, NAACL.
[40] Samuel R. Bowman,et al. Do latent tree learning models identify meaningful structure in sentences? , 2017, TACL.
[41] Alex Graves,et al. Recurrent Models of Visual Attention , 2014, NIPS.
[42] Alex Graves,et al. Decoupled Neural Interfaces using Synthetic Gradients , 2016, ICML.
[43] Gourab Kundu,et al. On Amortizing Inference Cost for Structured Prediction , 2012, EMNLP.
[44] Ryan P. Adams,et al. Composing graphical models with neural networks for structured representations and fast inference , 2016, NIPS.
[45] Mark Johnson,et al. Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing , 2009, NAACL.
[46] A. P. Dempster,N. M. Laird,D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm , 1977, Journal of the Royal Statistical Society, Series B.
[47] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[48] Brendan J. Frey,et al. Learning Wake-Sleep Recurrent Attention Models , 2015, NIPS.
[49] Tommi S. Jaakkola,et al. On the Partition Function and Random Maximum A-Posteriori Perturbations , 2012, ICML.
[50] Danqi Chen,et al. A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.
[51] Yang Liu,et al. Structured Alignment Networks for Matching Sentences , 2018, EMNLP.
[52] Jihun Choi,et al. Learning to Compose Task-Specific Tree Structures , 2017, AAAI.
[53] Francis R. Bach,et al. Online Learning for Latent Dirichlet Allocation , 2010, NIPS.
[54] Tal Linzen,et al. Targeted Syntactic Evaluation of Language Models , 2018, EMNLP.
[55] Karol Gregor,et al. Neural Variational Inference and Learning in Belief Networks , 2014, ICML.
[56] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.
[57] Yoshua Bengio,et al. Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes , 2016, ArXiv.
[58] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[59] Samy Bengio,et al. Generating Sentences from a Continuous Space , 2015, CoNLL.
[60] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[61] Baobao Chang,et al. Graph-based Dependency Parsing with Bidirectional LSTM , 2016, ACL.
[62] Graham Neubig,et al. On-the-fly Operation Batching in Dynamic Computation Graphs , 2017, NIPS.
[63] Richard Socher,et al. Regularizing and Optimizing LSTM Language Models , 2017, ICLR.
[64] Gholamreza Haffari,et al. Incorporating Structural Alignment Biases into an Attentional Neural Translation Model , 2016, NAACL.
[65] Dan Klein,et al. Constituency Parsing with a Self-Attentive Encoder , 2018, ACL.
[66] Yisong Yue,et al. Iterative Amortized Inference , 2018, ICML.
[67] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[68] Peter Willett,et al. Recent trends in hierarchic document clustering: A critical review , 1988, Inf. Process. Manag..
[69] Chris Dyer,et al. Unsupervised Word Discovery with Segmental Neural Language Models , 2018, ArXiv.
[70] Noah A. Smith,et al. Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.
[71] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[72] Valentin I. Spitkovsky,et al. Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction , 2013, EMNLP.
[73] Luke S. Zettlemoyer,et al. Deep Semantic Role Labeling: What Works and What’s Next , 2017, ACL.
[74] George Papandreou,et al. Perturb-and-MAP random fields: Using discrete optimization to learn and sample from energy models , 2011, 2011 International Conference on Computer Vision.
[75] Ole Winther,et al. Sequential Neural Models with Stochastic Layers , 2016, NIPS.
[76] Aravind K. Joshi,et al. Tree Adjunct Grammars , 1975, J. Comput. Syst. Sci..
[77] Jörg Bornschein,et al. Variational Memory Addressing in Generative Models , 2017, NIPS.
[78] Vladimir Solmon,et al. The estimation of stochastic context-free grammars using the Inside-Outside algorithm , 2003 .
[79] Christopher D. Manning,et al. Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.
[80] Frank Keller,et al. An Imitation Learning Approach to Unsupervised Parsing , 2019, ACL.
[81] P. Glynn. Likelihood Ratio Gradient Estimation: An Overview , 1987, Winter Simulation Conference.
[82] Ole Winther,et al. Ladder Variational Autoencoders , 2016, NIPS.
[83] Christof Monz,et al. Ensemble Learning for Multi-Source Neural Machine Translation , 2016, COLING.
[84] Ari Rappoport,et al. Improved Fully Unsupervised Parsing with Zoomed Learning , 2010, EMNLP.
[85] Mathias Creutz,et al. Unsupervised Discovery of Morphemes , 2002, SIGMORPHON.
[86] Noah A. Smith,et al. A Simple, Fast, and Effective Reparameterization of IBM Model 2 , 2013, NAACL.
[87] Marcin Andrychowicz,et al. Learning to learn by gradient descent by gradient descent , 2016, NIPS.
[88] James Jay Horning,et al. A study of grammatical inference , 1969 .
[89] Phil Blunsom,et al. Neural Syntactic Generative Models with Exact Marginalization , 2018, NAACL.
[90] Pieter Abbeel,et al. Variational Lossy Autoencoder , 2016, ICLR.
[91] Ariel D. Procaccia,et al. Variational Dropout and the Local Reparameterization Trick , 2015, NIPS.
[92] Guillaume Lample,et al. Neural Architectures for Named Entity Recognition , 2016, NAACL.
[93] Noah A. Smith,et al. Backpropagating through Structured Argmax using a SPIGOT , 2018, ACL.
[94] Eugene Charniak,et al. Statistical language learning , 1997 .
[95] Carl Jesse Pollard,et al. Generalized phrase structure grammars, head grammars, and natural language , 1984 .
[96] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[97] Ali Farhadi,et al. Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.
[98] Bowen Zhou,et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.
[99] Wojciech Zaremba,et al. Recurrent Neural Network Regularization , 2014, ArXiv.
[100] Alexander M. Rush,et al. Structured Attention Networks , 2017, ICLR.
[101] Eric P. Xing,et al. Spectral Unsupervised Parsing with Additive Tree Metrics , 2014, ACL.
[102] Yonatan Bisk,et al. Inducing Grammars with and for Neural Machine Translation , 2018, NMT@ACL.
[103] Andrew Y. Ng,et al. Parsing with Compositional Vector Grammars , 2013, ACL.
[104] Bowen Zhou,et al. Pointing the Unknown Words , 2016, ACL.
[105] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[106] J. Baker. Trainable grammars for speech recognition , 1979 .
[107] Lawrence Carin,et al. Deconvolutional Latent-Variable Model for Text Sequence Matching , 2017, AAAI.
[108] Phil Blunsom,et al. Language as a Latent Variable: Discrete Generative Models for Sentence Compression , 2016, EMNLP.
[109] Joshua Goodman,et al. Parsing Inside-Out , 1998, ArXiv.
[110] Sanjeev Arora,et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization , 2018, ICML.
[111] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[112] David J. Weir,et al. The equivalence of four extensions of context-free grammars , 1994, Mathematical systems theory.
[113] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[114] Kevin P. Murphy,et al. Machine learning - a probabilistic perspective , 2012, Adaptive computation and machine learning series.
[115] Michael I. Jordan,et al. Factorial Hidden Markov Models , 1995, Machine Learning.
[116] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian , 1994, Neural Computation.
[117] Lifu Tu,et al. Learning Approximate Inference Networks for Structured Prediction , 2018, ICLR.
[118] Graham Neubig,et al. StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing , 2018, ACL.
[119] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[120] J. Zico Kolter,et al. OptNet: Differentiable Optimization as a Layer in Neural Networks , 2017, ICML.
[121] Barnabás Póczos,et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks , 2018, ICLR.
[122] Yonatan Belinkov,et al. Linguistic Knowledge and Transferability of Contextual Representations , 2019, NAACL.
[123] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[124] Phil Blunsom,et al. Collapsed Variational Bayesian Inference for PCFGs , 2013, CoNLL.
[125] Noah A. Smith,et al. Annealing Techniques For Unsupervised Statistical Language Learning , 2004, ACL.
[126] Jiatao Gu,Hang Li,et al. Incorporating Copying Mechanism in Sequence-to-Sequence Learning , 2016, ACL.
[127] Samuel R. Bowman,et al. Grammar Induction with Neural Language Models: An Unusual Replication , 2018, EMNLP.
[128] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[129] Kewei Tu,et al. CRF Autoencoder for Unsupervised Dependency Parsing , 2017, EMNLP.
[130] Yoshua Bengio,et al. Z-Forcing: Training Stochastic Recurrent Networks , 2017, NIPS.
[131] Wilker Aziz,et al. A Stochastic Decoder for Neural Machine Translation , 2018, ACL.
[132] Zhe Gan,et al. VAE Learning via Stein Variational Gradient Descent , 2017, NIPS.
[133] Jason Weston,et al. Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.
[134] Mark W. Schmidt,et al. Minimizing finite sums with the stochastic average gradient , 2013, Mathematical Programming.
[135] Graham Neubig,et al. Unsupervised Learning of Syntactic Structure with Invertible Neural Projections , 2018, EMNLP.
[136] Michael I. Jordan,et al. An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.
[137] Phil Blunsom,et al. Discovering Discrete Latent Topics with Neural Variational Inference , 2017, ICML.
[138] Yuval Pinter,et al. Attention is not not Explanation , 2019, EMNLP.
[139] Zoubin Ghahramani,et al. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.
[140] Daniel Marcu,et al. Unsupervised Neural Hidden Markov Models , 2016, SPNLP@EMNLP.
[141] Robert L. Mercer,et al. Class-Based n-gram Models of Natural Language , 1992, CL.
[142] Pascal Poupart,et al. Variational Attention for Sequence-to-Sequence Models , 2017, COLING.
[143] Claire Cardie,et al. Towards Dynamic Computation Graphs via Sparse Latent Structure , 2018, EMNLP.
[144] Ruslan Salakhutdinov,et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model , 2017, ICLR.
[145] Noah A. Smith,et al. Is Attention Interpretable? , 2019, ACL.
[146] Kevin Gimpel,et al. A Multi-Task Approach for Disentangling Syntax and Semantics in Sentence Representations , 2019, NAACL.
[147] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[148] Kevin Gimpel,et al. Visually Grounded Neural Syntax Acquisition , 2019, ACL.
[149] Dan Klein,et al. Neural CRF Parsing , 2015, ACL.
[150] Hermann Ney,et al. HMM-Based Word Alignment in Statistical Translation , 1996, COLING.
[151] Juha Karhunen,et al. How to Pretrain Deep Boltzmann Machines in Two Stages , 2015 .
[152] Bernard Mérialdo,et al. Tagging English Text with a Probabilistic Model , 1994, CL.
[153] Stephen Clark,et al. Scalable Syntax-Aware Language Models Using Knowledge Distillation , 2019, ACL.
[154] Colin Raffel,et al. Learning Hard Alignments with Variational Inference , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[155] Alexander M. Rush,et al. Learning Neural Templates for Text Generation , 2018, EMNLP.
[156] Rebecca Hwa. Supervised Grammar Induction using Training Data with Limited Constituent Information , 1999, ACL.
[157] Marti A. Hearst. Text Tiling: Segmenting Text into Multi-paragraph Subtopic Passages , 1997, CL.
[158] Veselin Stoyanov,et al. Empirical Risk Minimization of Graphical Model Parameters Given Approximate Inference, Decoding, and Model Structure , 2011, AISTATS.
[159] Wang Ling,et al. Learning to Compose Words into Sentences with Reinforcement Learning , 2016, ICLR.
[160] Rico Sennrich,et al. Proceedings of the Second Conference on Machine Translation, Volume 1: Research Papers , 2017 .
[161] Arian Maleki,et al. Benefits of over-parameterization with EM , 2018, NeurIPS.
[162] Noah A. Smith,et al. Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction , 2008, NIPS.
[163] Zhifei Li,et al. First- and Second-Order Expectation Semirings with Applications to Minimum-Risk Training on Translation Forests , 2009, EMNLP.
[164] Yoshua Bengio,et al. A Recurrent Latent Variable Model for Sequential Data , 2015, NIPS.
[165] Beatrice Santorini,et al. Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.
[166] Fernando Pereira,et al. Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, HLT.
[167] Andrew Y. Ng,et al. Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines , 2006, EMNLP.
[168] Yaoliang Yu,et al. Dropout with Expectation-linear Regularization , 2016, ICLR.
[169] Hongyu Guo,et al. Long Short-Term Memory Over Tree Structures , 2015, ArXiv.
[170] Andrew McCallum,et al. End-to-End Learning for Structured Prediction Energy Networks , 2017, ICML.
[171] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[172] Phil Blunsom,et al. Neural Variational Inference for Text Processing , 2015, ICML.
[173] Eric P. Xing,et al. Toward Controlled Generation of Text , 2017, ICML.
[174] Geoffrey E. Hinton,et al. Grammar as a Foreign Language , 2014, NIPS.
[175] Lei Li,et al. On Tree-Based Neural Sentence Modeling , 2018, EMNLP.
[176] Rens Bod,et al. An All-Subtrees Approach to Unsupervised Parsing , 2006, ACL.
[177] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[178] Jason Eisner,et al. Inside-Outside and Forward-Backward Algorithms Are Just Backprop (tutorial paper) , 2016, SPNLP@EMNLP.
[179] Andriy Mnih,et al. Variational Inference for Monte Carlo Objectives , 2016, ICML.
[180] Alexander Yates,et al. Factorial Hidden Markov Models for Learning Representations of Natural Language , 2013, ICLR.
[181] Noah A. Smith,et al. Recurrent Neural Network Grammars , 2016, NAACL.
[182] Yee Whye Teh,et al. The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables , 2016, ICLR.
[183] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[184] Kevin Gimpel,et al. Controllable Paraphrase Generation with a Syntactic Exemplar , 2019, ACL.
[185] Yang Liu,et al. Dependency Grammar Induction with a Neural Variational Transition-based Parser , 2018, AAAI.
[186] Jiacheng Xu,et al. Spherical Latent Spaces for Stable Variational Autoencoders , 2018, EMNLP.
[187] Anoop Sarkar,et al. Top-down Tree Structured Decoding with Syntactic Connections for Neural Machine Translation and Parsing , 2018, EMNLP.
[188] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[189] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[190] Dan Klein,et al. Abstract Syntax Networks for Code Generation and Semantic Parsing , 2017, ACL.
[191] Mark Johnson,et al. Using Left-corner Parsing to Encode Universal Structural Constraints in Grammar Induction , 2016, EMNLP.
[192] Rebecca Hwa,et al. Sample Selection for Statistical Grammar Induction , 2000, EMNLP.
[193] Noah A. Smith,et al. What Do Recurrent Neural Network Grammars Learn About Syntax? , 2016, EACL.
[194] Ryan P. Adams,et al. Gradient-based Hyperparameter Optimization through Reversible Learning , 2015, ICML.
[195] Graham Neubig,et al. A Tree-based Decoder for Neural Machine Translation , 2018, EMNLP.
[196] Stephen Clark,et al. Latent Tree Learning with Differentiable Parsers: Shift-Reduce Parsing and Chart Parsing , 2018, ArXiv.
[197] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[198] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[199] Hugo Larochelle,et al. Efficient Learning of Deep Boltzmann Machines , 2010, AISTATS.
[200] Michael Cogswell,et al. Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles , 2016, NIPS.
[201] Alexander M. Rush,et al. Unsupervised Recurrent Neural Network Grammars , 2019, NAACL.
[202] Chong Wang,et al. TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency , 2016, ICLR.
[203] Jason Weston,et al. A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.
[204] Marc'Aurelio Ranzato,et al. Classical Structured Prediction Losses for Sequence to Sequence Learning , 2017, NAACL.
[205] Dan Klein,et al. A Generative Constituent-Context Model for Improved Grammar Induction , 2002, ACL.
[206] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[207] John Hale,et al. Finding syntax in human encephalography with beam search , 2018, ACL.
[208] Yoav Seginer,et al. Fast Unsupervised Incremental Parsing , 2007, ACL.
[209] Noah A. Smith,et al. Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction , 2009, NAACL.
[210] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[211] Chong Wang,et al. Stochastic variational inference , 2012, J. Mach. Learn. Res..
[212] Sunita Sarawagi,et al. Surprisingly Easy Hard-Attention for Sequence to Sequence Learning , 2018, EMNLP.
[213] Stuart M. Shieber,et al. Evidence against the context-freeness of natural language , 1985 .
[214] Chris Dyer,et al. Pushing the bounds of dropout , 2018, ArXiv.
[215] Marcello Federico,et al. Report on the 10th IWSLT evaluation campaign , 2013, IWSLT.
[216] Juha Karhunen,et al. A Two-Stage Pretraining Algorithm for Deep Boltzmann Machines , 2013, ICANN.
[217] Frank D. Wood,et al. Inference Networks for Sequential Monte Carlo in Graphical Models , 2016, ICML.
[218] Mark Johnson,et al. PCFG Models of Linguistic Tree Representations , 1998, CL.
[219] Richard Socher,et al. Towards Neural Machine Translation with Latent Tree Attention , 2017, SPNLP@EMNLP.
[220] Christopher M. Bishop. Pattern Recognition and Machine Learning , 2006, Springer.
[221] Dan Klein,et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.
[222] Phil Blunsom,et al. Generative Incremental Dependency Parsing with Neural Networks , 2015, ACL.
[223] Kenichi Kurihara,et al. Variational Bayesian Grammar Induction for Natural Language , 2006, ICGI.
[224] Ben Poole,et al. Categorical Reparameterization with Gumbel-Softmax , 2016, ICLR.
[225] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[226] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[227] Aaron C. Courville,et al. Neural Language Modeling by Jointly Learning Syntax and Lexicon , 2017, ICLR.
[228] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[229] Max Welling,et al. Markov Chain Monte Carlo and Variational Inference: Bridging the Gap , 2014, ICML.
[230] Mark Steedman,et al. Combinatory grammars and parasitic gaps , 1987 .
[231] Benjamin Schrauwen,et al. Training energy-based models for time-series imputation , 2013, J. Mach. Learn. Res..
[232] Tommi S. Jaakkola,et al. Tree-structured decoding with doubly-recurrent neural networks , 2016, ICLR.
[233] Geoffrey E. Hinton,et al. Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.
[234] Ramón Fernández Astudillo,et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification , 2016, ICML.
[235] Lane Schwartz,et al. Unsupervised Grammar Induction with Depth-bounded PCFG , 2018, TACL.
[236] Xiaodong Liu,et al. Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing , 2019, NAACL.
[237] Justin Domke,et al. Generic Methods for Optimization-Based Modeling , 2012, AISTATS.
[238] Karl Stratos,et al. Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction , 2018, NAACL.
[239] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[240] Karl Stratos,et al. Spectral Learning of Latent-Variable PCFGs , 2012, ACL.
[241] Dan Klein,et al. The Infinite PCFG Using Hierarchical Dirichlet Processes , 2007, EMNLP.
[242] Thomas L. Griffiths,et al. Bayesian Inference for PCFGs via Markov Chain Monte Carlo , 2007, NAACL.
[243] Christopher D. Manning,et al. Efficient, Feature-based, Conditional Random Field Parsing , 2008, ACL.
[244] R. Lee Humphreys,et al. The linguistics of punctuation , 2004, Machine Translation.
[245] Byron C. Wallace,et al. Attention is not Explanation , 2019, NAACL.
[246] Lane Schwartz,et al. Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction , 2018, EMNLP.
[247] Mark Johnson,et al. Using Universal Linguistic Knowledge to Guide Grammar Induction , 2010, EMNLP.
[248] Alexander Clark. Unsupervised induction of stochastic context-free grammars using distributional clustering , 2001, CoNLL.
[249] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[250] Graham Neubig,et al. Lagging Inference Networks and Posterior Collapse in Variational Autoencoders , 2019, ICLR.
[251] Dan Klein,et al. Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.
[252] Arto Klami,et al. Importance Sampled Stochastic Optimization for Variational Inference , 2017, UAI.
[253] Martha Palmer,et al. The Penn Chinese TreeBank: Phrase structure annotation of a large corpus , 2005, Natural Language Engineering.
[254] Christopher D. Manning,et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.
[255] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..
[256] Matthew D. Hoffman,et al. On the challenges of learning with inference networks on sparse, high-dimensional data , 2017, AISTATS.
[257] Zhiting Hu,et al. Improved Variational Autoencoders for Text Modeling using Dilated Convolutions , 2017, ICML.
[258] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[259] Ivan Titov,et al. Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder , 2018, ICLR.
[260] Shan Wu,et al. Variational Recurrent Neural Machine Translation , 2018, AAAI.
[261] Shuohang Wang,et al. Machine Comprehension Using Match-LSTM and Answer Pointer , 2016, ICLR.
[262] H. Robbins. An Empirical Bayes Approach to Statistics , 1956 .
[263] Zhe Gan,et al. Topic Compositional Neural Language Model , 2017, AISTATS.
[264] Erhardt Barth,et al. A Hybrid Convolutional Variational Autoencoder for Text Generation , 2017, EMNLP.
[265] Manaal Faruqui,et al. Text Generation with Exemplar-based Adaptive Decoding , 2019, NAACL.
[266] Jun'ichi Tsujii,et al. Probabilistic CFG with Latent Annotations , 2005, ACL.
[267] Min Zhang,et al. Improved Constituent Context Model with Features , 2012, PACLIC.
[268] Koray Kavukcuoglu,et al. Multiple Object Recognition with Visual Attention , 2014, ICLR.
[269] Aaron C. Courville,et al. Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks , 2018, ICLR.
[270] John DeNero,et al. A Feature-Rich Constituent Context Model for Grammar Induction , 2012, ACL.
[271] Quoc V. Le,et al. Distributed Representations of Sentences and Documents , 2014, ICML.
[272] Vlad Niculae,et al. A Regularized Framework for Sparse and Structured Neural Attention , 2017, NIPS.
[273] Cícero Nogueira dos Santos,et al. Learning Character-level Representations for Part-of-Speech Tagging , 2014, ICML.
[274] David Pfau,et al. Unrolled Generative Adversarial Networks , 2016, ICLR.
[275] Zoubin Ghahramani,et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.
[276] Dan Klein,et al. Accurate Unlexicalized Parsing , 2003, ACL.
[277] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[278] E. Mark Gold,et al. Language Identification in the Limit , 1967, Inf. Control..
[279] Regina Barzilay,et al. Unsupervised Multilingual Grammar Induction , 2009, ACL.
[280] Mohit Yadav,et al. Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Auto-Encoders , 2019, NAACL.
[281] Alexander M. Rush,et al. Semi-Amortized Variational Autoencoders , 2018, ICML.
[282] Nebojsa Jojic,et al. Iterative Refinement of the Approximate Posterior for Directed Belief Networks , 2015, NIPS.
[283] Daan Wierstra,et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.
[284] Bonggun Shin,et al. Classification of radiology reports using neural attention models , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).
[285] Geoffrey E. Hinton,et al. A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.
[286] Yang Liu,et al. Modeling Coverage for Neural Machine Translation , 2016, ACL.
[287] Min Zhang,et al. Variational Neural Machine Translation , 2016, EMNLP.
[288] Alexander M. Rush,et al. Coarse-to-Fine Attention Models for Document Summarization , 2017, NFiS@EMNLP.
[289] Chris Dyer,et al. Unsupervised POS Induction with Word Embeddings , 2015, NAACL.
[290] Jason Lee,et al. Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement , 2018, EMNLP.
[291] Michael Figurnov,et al. Monte Carlo Gradient Estimation in Machine Learning , 2019, J. Mach. Learn. Res..
[292] Sam Wiseman,et al. Amortized Bethe Free Energy Minimization for Learning MRFs , 2019, NeurIPS.
[293] Valentin I. Spitkovsky,et al. Three Dependency-and-Boundary Models for Grammar Induction , 2012, EMNLP.
[294] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[295] Richard Socher,et al. Dynamic Coattention Networks For Question Answering , 2016, ICLR.
[296] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[297] Ben Taskar,et al. Posterior Regularization for Structured Latent Variable Models , 2010, J. Mach. Learn. Res..
[298] Alexander M. Rush,et al. Compound Probabilistic Context-Free Grammars for Grammar Induction , 2019, ACL.
[299] Arthur Mensch,et al. Differentiable Dynamic Programming for Structured Prediction and Attention , 2018, ICML.
[300] Uri Shalit,et al. Structured Inference Networks for Nonlinear State Space Models , 2016, AAAI.
[301] Mirella Lapata,et al. Neural Summarization by Extracting Sentences and Words , 2016, ACL.
[302] H. Robbins. Asymptotically Subminimax Solutions of Compound Statistical Decision Problems , 1951, Second Berkeley Symposium on Mathematical Statistics and Probability.
[303] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[304] Kewei Tu,et al. Gaussian Mixture Latent Vector Grammars , 2018, ACL.
[305] Joelle Pineau,et al. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues , 2016, AAAI.
[306] Philipp Koehn,et al. Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.
[307] David Duvenaud,et al. Inference Suboptimality in Variational Autoencoders , 2018, ICML.
[308] John Hale,et al. LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better , 2018, ACL.
[309] Noah A. Smith,et al. Conditional Random Field Autoencoders for Unsupervised Structured Prediction , 2014, NIPS.
[310] Phil Blunsom,et al. A Convolutional Neural Network for Modelling Sentences , 2014, ACL.
[311] Yoshimasa Tsuruoka,et al. Learning to Parse and Translate Improves Neural Machine Translation , 2017, ACL.
[312] Yoshua Bengio,et al. Hierarchical Multiscale Recurrent Neural Networks , 2016, ICLR.