Bayesian Modelling in Machine Learning: A Tutorial Review

Many facets of Bayesian Modelling are firmly established in Machine Learning and give rise to state-of-the-art solutions to application problems. The sheer number of techniques, ideas and models that have been proposed, together with the associated terminology, can be bewildering. With this tutorial review, we aim to give a broad, high-level overview of this important field, concentrating on central ideas and methods and on their interconnections. The reader will gain a basic understanding of the topics and their relationships, armed with which she can branch out to the details that interest her, using the references to more specialized textbooks and reviews we provide here.
