A Primer on Probabilistic Inference

In this chapter, we introduce some of the tools that can be used to address these challenges. By considering how probabilistic models can be defined and used, we aim to provide some of the background relevant to the other chapters in this volume. The plan of the chapter is as follows. First, we outline the fundamentals of Bayesian inference, which is at the heart of many probabilistic models. We then discuss how to define probabilistic models that use richly structured probability distributions, introducing some of the key ideas behind graphical models, which can be used to represent the dependencies among a set of variables. Finally, we discuss two of the main algorithms that are used to evaluate the predictions of probabilistic models – the Expectation-Maximization (EM) algorithm and Markov chain Monte Carlo (MCMC) – and some sophisticated probabilistic models that exploit these algorithms. Several books provide a more detailed discussion of these topics in the context of statistics (e.g., Berger, 1993; Bernardo & Smith, 1994; Gelman, Carlin, Stern, & Rubin, 1995), machine learning (e.g., Bishop, 2006; Duda, Hart, & Stork, 2000; Hastie, Tibshirani, & Friedman, 2001; MacKay, 2003), and artificial intelligence (e.g., Korb & Nicholson, 2003; Pearl, 1988; Russell & Norvig, 2002). Griffiths, Kemp, and Tenenbaum (in press) provide further information on some of the methods touched on in this chapter, together with examples of applications of these methods in cognitive science.

[1] Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[2] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[4] I. J. Myung, et al. Applying Occam’s razor in modeling cognition: A Bayesian approach, 1997.

[5] James O. Berger, et al. Ockham's Razor and Bayesian Analysis, 1992.

[6] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[7] Joshua B. Tenenbaum, et al. Theory-based causal induction, 2009, Psychological review.

[8] Eugene Charniak, et al. Equations for Part-of-Speech Tagging, 1993, AAAI.

[9] J. Tenenbaum, et al. Theory-based Bayesian models of inductive learning and reasoning, 2006, Trends in Cognitive Sciences.

[10] Thomas L. Griffiths, et al. Infinite latent feature models and the Indian buffet process, 2005, NIPS.

[11] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[12] John R. Anderson, et al. The Adaptive Character of Thought, 1990.

[13] A. M. Turing, et al. Studies in the History of Probability and Statistics. XXXVII A. M. Turing's statistical work in World War II, 1979.

[14] David J. C. MacKay, et al. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.

[15] Judea Pearl, et al. Probabilistic reasoning in intelligent systems, 1988.

[16] P. Spirtes, et al. Causation, prediction, and search, 1993.

[17] Charles Kemp, et al. Bayesian models of cognition, 2008.

[18] David B. Dunson, et al. Bayesian Data Analysis, 2010.

[19] D. Vere-Jones. Markov Chains, 1972, Nature.

[20] Gerard T. Barkema, et al. Monte Carlo Methods in Statistical Physics, 1999.

[21] L. Wasserman, et al. Computing Bayes Factors by Combining Simulation and Asymptotic Approximations, 1997.

[22] J. Berger. Statistical Decision Theory and Bayesian Analysis, 1988.

[23] C. Lawrence, et al. RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble, 2005, RNA.

[24] Daniel Kahneman, et al. Probabilistic reasoning, 1993.

[25] Mtw, et al. Computation, causation, and discovery, 2000.

[26] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[27] I. J. Myung, et al. Guest Editors' Introduction: Special Issue on Model Selection, 2000.

[28] Thomas L. Griffiths, et al. Discovering Latent Classes in Relational Data, 2004.

[29] David Heckerman, et al. A Tutorial on Learning with Bayesian Networks, 1998, Learning in Graphical Models.

[30] Daniel Jurafsky, et al. A Probabilistic Model of Lexical and Syntactic Access and Disambiguation, 1996, Cogn. Sci.

[31] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[32] J. Pearl. Causality: Models, Reasoning and Inference, 2000.

[33] Aarnout Brombacher, et al. Probability..., 2009, Qual. Reliab. Eng. Int.

[34] Kevin B. Korb, et al. Bayesian Artificial Intelligence, 2004, Computer science and data analysis series.

[35] J. Tenenbaum, et al. Structure and strength in causal induction, 2005, Cognitive Psychology.

[36] Mark Johnson, et al. Probability and statistics in computational linguistics, a brief review, 2004.

[37] James H. Martin, et al. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition, 2000, Prentice Hall series in artificial intelligence.

[38] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[39] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[40] P. Laplace. A Philosophical Essay On Probabilities, 1902.

[41] Frank Keller, et al. Modeling Attachment Decisions with a Probabilistic Parser: The Case of Head Final Structures, 2015, Proceedings of the Annual Meeting of the Cognitive Science Society.

[42] C. Glymour. The Mind's Arrows: Bayes Nets and Graphical Causal Models in Psychology, 2000.

[43] James Jay Horning, et al. A study of grammatical inference, 1969.

[44] Ross D. Shachter. Bayes-Ball: The Rational Pastime (for Determining Irrelevance and Requisite Information in Belief Networks and Influence Diagrams), 1998, UAI.

[45] Radford M. Neal. Markov Chain Sampling Methods for Dirichlet Process Mixture Models, 2000.

[46] J. Rice. Mathematical Statistics and Data Analysis, 1988.

[47] Thomas L. Griffiths, et al. Bayesian Inference for PCFGs via Markov Chain Monte Carlo, 2007, NAACL.

[48] Lise Getoor, et al. Learning Probabilistic Relational Models, 1999, IJCAI.

[49] D. Heckerman, et al. ,81. Introduction, 2022.

[50] Geoffrey E. Hinton, et al. Learning and relearning in Boltzmann machines, 1986.

[51] S. Sloman. Causal Models: How People Think about the World and Its Alternatives, 2005.

[52] L. Baum, et al. Statistical Inference for Probabilistic Functions of Finite State Markov Chains, 1966.

[53] Refractor Vision, 2000, The Lancet.

[54] R. T. Cox. The Algebra of Probable Inference, 1962.

[55] T. Bayes. An essay towards solving a problem in the doctrine of chances, 2003.

[56] Steve Young, et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm, 1990.

[57] Christopher D. Manning, et al. Probabilistic models of language processing and acquisition, 2006, Trends in Cognitive Sciences.

[58] Virginia Teller. Review of Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition by Daniel Jurafsky and James H. Martin. Prentice Hall 2000, 2000.

[59] J. Baker. Trainable grammars for speech recognition, 1979.

[60] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.

[61] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.

[62] John E. Hershey, et al. Computation, 1991, Digit. Signal Process.

[63] Eugene Charniak, et al. Statistical language learning, 1997.

[64] David G. Stork, et al. Pattern classification, 2nd Edition, 2000.

[65] Michael I. Jordan, et al. An Introduction to Variational Methods for Graphical Models, 1999, Machine Learning.

[66] L. Goddard. Information Theory, 1962, Nature.

[67] Nasser M. Nasrabadi, et al. Pattern Recognition and Machine Learning, 2006, Technometrics.

[68] Sylvia Richardson, et al. Markov Chain Monte Carlo in Practice, 1997.

[69] Nick Chater, et al. Distributional Information: A Powerful Cue for Acquiring Syntactic Categories, 1998, Cogn. Sci.

[70] E. Jaynes. Probability theory: the logic of science, 2003.

[71] P. Donnelly, et al. Inference of population structure using multilocus genotype data, 2000, Genetics.

[72] David G. Stork, et al. Pattern Classification, 1973.