Paper information - Bayesian Analysis in Natural Language Processing

Abstract: Natural language processing (NLP) went through a profound transformation in the mid-1980s, when it shifted to make heavy use of corpora and data-driven techniques to analyze language. Since then, the use of statistical techniques in NLP has evolved in several ways. One such development took place in the late 1990s or early 2000s, when full-fledged Bayesian machinery was introduced to NLP. This Bayesian approach to NLP has come to compensate for various shortcomings of the frequentist approach and to enrich it, especially in the unsupervised setting, where statistical learning is done without target prediction examples. We cover the methods and algorithms that are needed to fluently read Bayesian learning papers in NLP and to do research in the area. These methods and algorithms are partially borrowed from both machine learning and statistics and are partially developed "in-house" in NLP. We cover inference techniques such as Markov chain Monte Carlo sampling and variational inference, Ba...

Graeme Hirst | Kevin Duh | Shay B. Cohen
