Bayesian Analysis in Natural Language Processing, Second Edition

Abstract: Natural language processing (NLP) went through a profound transformation in the mid-1980s when it shifted to make heavy use of corpora and data-driven techniques to analyze language. Since...
