Paper Information - Bayesian Analysis in Natural Language Processing, Second Edition

Abstract: Natural language processing (NLP) went through a profound transformation in the mid-1980s when it shifted to make heavy use of corpora and data-driven techniques to analyze language. Since...

Shay B. Cohen

[1] Hal Daumé,et al. Non-Parametric Bayesian Areal Linguistics , 2009, HLT-NAACL.

[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[3] Cosma Rohilla Shalizi,et al. Philosophy and the practice of Bayesian statistics , 2010, The British Journal of Mathematical and Statistical Psychology.

[4] Nikolaos V. Sahinidis,et al. Derivative-free optimization: a review of algorithms and comparison of software implementations , 2013, J. Glob. Optim.

[5] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[6] Samy Bengio,et al. Torch: a modular machine learning software library , 2002 .

[7] T. Griffiths,et al. A Bayesian framework for word segmentation: Exploring the effects of context , 2009, Cognition.

[8] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci.

[9] Mirella Lapata,et al. Vector-based Models of Semantic Composition , 2008, ACL.

[10] Gholamreza Haffari,et al. Structured Prediction of Sequences and Trees Using Infinite Contexts , 2015, ECML/PKDD.

[11] Tadao Kasami,et al. An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages , 1965 .

[12] Patrick Pantel,et al. From Frequency to Meaning: Vector Space Models of Semantics , 2010, J. Artif. Intell. Res.

[13] Alex Graves,et al. Supervised Sequence Labelling , 2012 .

[14] O. Cappé,et al. On‐line expectation–maximization algorithm for latent data models , 2009 .

[15] Geoffrey E. Hinton,et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[16] David J. Weir,et al. Characterizing Structural Descriptions Produced by Various Grammatical Formalisms , 1987, ACL.

[17] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[18] Ken-ichi Funahashi,et al. On the approximate realization of continuous mappings by neural networks , 1989, Neural Networks.

[19] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.

[20] Christoph Goller,et al. Learning task-dependent distributed representations by backpropagation through structure , 1996, Proceedings of International Conference on Neural Networks (ICNN'96).

[21] Ben O'Neill,et al. Exchangeability, Correlation, and Bayes' Effect , 2009 .

[22] Thomas L. Griffiths,et al. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies , 2007, JACM.

[23] Thomas L. Griffiths,et al. Infinite latent feature models and the Indian buffet process , 2005, NIPS.

[24] Michael I. Jordan,et al. Hierarchical Dirichlet Processes , 2006 .

[25] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res.

[26] Matt Post,et al. Bayesian Learning of a Tree Substitution Grammar , 2009, ACL.

[27] Noah A. Smith,et al. Compiling Comp Ling: Weighted Dynamic Programming and the Dyna Language , 2005, HLT.

[28] M. Escobar,et al. Bayesian Density Estimation and Inference Using Mixtures , 1995 .

[29] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.

[30] Yee Whye Teh,et al. Beam sampling for the infinite hidden Markov model , 2008, ICML '08.

[31] Jay Earley,et al. An efficient context-free parsing algorithm , 1970, Commun. ACM.

[32] M. Steedman,et al. Combinatory Categorial Grammar , 2011 .

[33] Thomas Hofmann,et al. Gaussian process classification for segmenting and annotating sequences , 2004, ICML.

[34] Regina Barzilay,et al. Bayesian Unsupervised Topic Segmentation , 2008, EMNLP.

[35] Joshua Goodman,et al. Parsing Algorithms and Metrics , 1996, ACL.

[36] Christopher D. Manning,et al. Hierarchical Bayesian Domain Adaptation , 2009, NAACL.

[37] R. Rosenfeld,et al. Two decades of statistical language modeling: where do we go from here? , 2000, Proceedings of the IEEE.

[38] Regina Barzilay,et al. Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach , 2009, NAACL.

[39] Jun'ichi Tsujii,et al. Probabilistic CFG with Latent Annotations , 2005, ACL.

[40] Dan Klein,et al. Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[41] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res.

[42] Shin Ishii,et al. On-line EM Algorithm for the Normalized Gaussian Network , 2000, Neural Computation.

[43] Yonatan Bisk,et al. An HDP Model for Inducing Combinatory Categorial Grammars , 2013, TACL.

[44] Dan Roth,et al. Integer linear programming inference for conditional random fields , 2005, ICML.

[45] Noah A. Smith,et al. Shared Logistic Normal Distributions for Soft Parameter Tying in Unsupervised Grammar Induction , 2009, NAACL.

[46] S. Fienberg. Bayesian Models and Methods in Public Policy and Government Settings , 2011, arXiv:1108.2177.

[47] Lukás Burget,et al. Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[48] Charles Kemp,et al. How to Grow a Mind: Statistics, Structure, and Abstraction , 2011, Science.

[49] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.

[50] David R. Karger,et al. Content Modeling Using Latent Permutations , 2009, J. Artif. Intell. Res.

[51] Geoffrey Zweig,et al. Context dependent recurrent neural network language model , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).

[52] Detlef Prescher,et al. Head-Driven PCFGs with Latent-Head Statistics , 2005, IWPT.

[53] Noah A. Smith,et al. Parsing with Soft and Hard Constraints on Dependency Length , 2005 .

[54] Shay B. Cohen,et al. Online Adaptor Grammars with Hybrid Inference , 2014, Transactions of the Association for Computational Linguistics.

[55] John DeNero,et al. Sampling Alignment Structure under a Bayesian Translation Model , 2008, EMNLP.

[56] Charles Kemp,et al. Bayesian models of cognition , 2008 .

[57] Fernando Pereira,et al. Relating Probabilistic Grammars and Automata , 1999, ACL.

[58] B. de Finetti,et al. Foresight: Its Logical Laws, Its Subjective Sources , 1992 .

[59] Markus Dreyer,et al. Better Informed Training of Latent Syntactic Features , 2006, EMNLP.

[60] Dan Klein,et al. Online EM for Unsupervised Models , 2009, NAACL.

[61] Hanna M. Wallach,et al. Topic modeling: beyond bag-of-words , 2006, ICML.

[62] Jason Eisner,et al. Transformational Priors Over Grammars , 2002, EMNLP.

[63] Chris Dyer,et al. A Gibbs Sampler for Phrasal Synchronous Grammar Induction , 2009, ACL.

[64] Regina Barzilay,et al. Unsupervised Multilingual Grammar Induction , 2009, ACL.

[65] N. Metropolis,et al. Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[66] Mark Johnson,et al. Exploring the Role of Stress in Bayesian Word Segmentation using Adaptor Grammars , 2014, TACL.

[67] Ralph Grishman,et al. A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars , 1991, HLT.

[68] James Henderson,et al. Inducing History Representations for Broad Coverage Statistical Parsing , 2003, NAACL.

[69] Andreas Stolcke,et al. Inducing Probabilistic Grammars by Bayesian Model Merging , 1994, ICGI.

[70] Daniel H. Younger,et al. Recognition and Parsing of Context-Free Languages in Time n^3 , 1967, Inf. Control.

[71] Yee Whye Teh,et al. A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes , 2006, ACL.

[72] Yoshua Bengio,et al. Neural Probabilistic Language Models , 2006 .

[73] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[74] Hermann Ney,et al. A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[75] Shankar Kumar,et al. Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2004, NAACL.

[76] Shankar Kumar,et al. Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation , 2008, EMNLP.

[77] Thomas L. Griffiths,et al. Contextual Dependencies in Unsupervised Word Segmentation , 2006, ACL.

[78] Thomas L. Griffiths,et al. Probabilistic Topic Models , 2007 .

[79] J. Tenenbaum,et al. A tutorial introduction to Bayesian models of cognitive development , 2011, Cognition.

[80] J. Tenenbaum,et al. Probabilistic models of cognition: exploring representations and inductive biases , 2010, Trends in Cognitive Sciences.

[81] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.

[82] Regina Barzilay,et al. Unsupervised Multilingual Learning for POS Tagging , 2008, EMNLP.

[83] Matt Post,et al. Bayesian Tree Substitution Grammars as a Usage-based Approach , 2013, Language and Speech.

[84] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological Review.

[85] Hermann Ney,et al. Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[86] Mikio Yamamoto,et al. Dirichlet mixtures in text modeling , 2005 .

[87] Dan Klein,et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.

[88] Laura Kallmeyer,et al. Data-Driven Parsing with Probabilistic Linear Context-Free Rewriting Systems , 2010, COLING.

[89] Jeffrey L. Elman,et al. Distributed Representations, Simple Recurrent Networks, and Grammatical Structure , 1991, Mach. Learn.

[90] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[91] John Darlington,et al. A Transformation System for Developing Recursive Programs , 1977, J. ACM.

[92] Catherine L. Harris,et al. Connectionism and Cognitive Linguistics , 1990 .

[93] Yee Whye Teh,et al. A stochastic memoizer for sequence data , 2009, ICML '09.

[94] Donald Geman,et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[95] Zellig S. Harris,et al. Distributional Structure , 1954 .

[96] R. T. Cox. Probability, frequency and reasonable expectation , 1990 .

[97] Jordan B. Pollack,et al. Recursive Distributed Representations , 1990, Artif. Intell.

[98] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process.

[99] Jianfeng Gao,et al. A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers , 2008, EMNLP.

[100] Yee Whye Teh,et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics , 2014, J. Mach. Learn. Res.

[101] Aravind K. Joshi,et al. Tree-Adjoining Grammars , 1997, Handbook of Formal Languages.

[102] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1992, Math. Control. Signals Syst.

[103] Mark Johnson,et al. Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars , 2009, NAACL.

[104] David J. C. MacKay,et al. A Practical Bayesian Framework for Backpropagation Networks , 1992, Neural Computation.

[105] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[106] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[107] Mikel L. Forcada,et al. Asynchronous translations with recurrent neural nets , 1997, Proceedings of International Conference on Neural Networks (ICNN'97).

[108] Graeme Hirst,et al. Bayesian Analysis in Natural Language Processing , 2016, Computational Linguistics.

[109] Michael I. Jordan,et al. Variational methods for the Dirichlet process , 2004, ICML.