Bayesian non-parametrics and the probabilistic approach to modelling

Modelling is fundamental to many fields of science and engineering. A model can be thought of as a representation of possible data one could predict from a system. The probabilistic approach to modelling uses probability theory to express all aspects of uncertainty in the model. The probabilistic approach is synonymous with Bayesian modelling, which simply uses the rules of probability theory in order to make predictions, compare alternative models, and learn model parameters and structure from data. This simple and elegant framework is most powerful when coupled with flexible probabilistic models. Flexibility is achieved through the use of Bayesian non-parametrics. This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics. The survey covers the use of Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies, and covariance structure. More specifically, it gives brief non-technical overviews of Gaussian processes, Dirichlet processes, infinite hidden Markov models, Indian buffet processes, Kingman’s coalescent, Dirichlet diffusion trees and Wishart processes.
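The abstract says Bayesian modelling uses only the rules of probability to learn parameters, make predictions and compare models. The Python below is a minimal sketch of what that means in the simplest conjugate setting, a coin with unknown bias θ and a Beta(a, b) prior, where all three operations have closed forms. The model, the function names (`posterior`, `predictive_heads`, `log_evidence_unknown_bias`, `log_evidence_fair`) and the coin-flip data are illustrative assumptions and are not taken from the article. This model is also parametric. The non-parametric models surveyed here apply the same rules to infinite-dimensional objects such as functions or partitions, and often need approximate inference.

```python
import math
from typing import Sequence, Tuple

# Hypothetical illustration (not from the article): a coin with unknown bias
# theta and a Beta(a, b) prior. Data are 0/1 outcomes (1 = heads).


def log_beta(a: float, b: float) -> float:
    """log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior(data: Sequence[int], a: float = 1.0, b: float = 1.0) -> Tuple[float, float]:
    """Learning: Bayes' rule with a conjugate Beta(a, b) prior yields
    a Beta(a + heads, b + tails) posterior over theta."""
    heads = sum(data)
    tails = len(data) - heads
    return a + heads, b + tails


def predictive_heads(data: Sequence[int], a: float = 1.0, b: float = 1.0) -> float:
    """Prediction: the sum rule integrates theta out, so
    p(next = 1 | D) = E[theta | D] = a_post / (a_post + b_post)."""
    a_post, b_post = posterior(data, a, b)
    return a_post / (a_post + b_post)


def log_evidence_unknown_bias(data: Sequence[int], a: float = 1.0, b: float = 1.0) -> float:
    """Model comparison: log marginal likelihood of the observed sequence,
    log p(D | M) = log B(a + heads, b + tails) - log B(a, b)."""
    a_post, b_post = posterior(data, a, b)
    return log_beta(a_post, b_post) - log_beta(a, b)


def log_evidence_fair(data: Sequence[int]) -> float:
    """Log marginal likelihood of a rival model with theta fixed at 0.5."""
    return len(data) * math.log(0.5)


if __name__ == "__main__":
    flips = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # made-up data: 8 heads, 2 tails
    print(posterior(flips))         # (9.0, 3.0)
    print(predictive_heads(flips))  # 0.75
    log_bf = log_evidence_unknown_bias(flips) - log_evidence_fair(flips)
    print(math.exp(log_bf))         # Bayes factor ≈ 2.07, favouring the unknown-bias model
```

With five heads and five tails the verdict reverses. The Bayes factor of the unknown-bias model against the fair coin drops to about 0.37, because the more flexible model spreads its prior probability over many values of θ and so assigns less probability to data that a fair coin already explains well.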
