Variational Message Passing and its Applications

This thesis is concerned with the development of Variational Message Passing (VMP), an algorithm for automatically performing variational inference in a probabilistic graphical model. VMP allows learning and reasoning about a system to proceed directly from a given probabilistic model of that system. The utility of VMP has been demonstrated by solving problems in the domains of machine vision and bioinformatics. VMP dramatically simplifies the construction and testing of new variational models and readily allows a range of alternative models to be tested on a given problem.

Chapter 1 introduces a probabilistic approach to automatic learning and reasoning. Belief propagation, an existing exact inference algorithm that uses message passing in a graphical model, is outlined, along with its limitations. These limitations lead to the need for approximate inference methods, including sampling methods and variational inference. Variational inference, which provides an analytical approximation to the posterior distribution, is described in detail.

Chapter 2 presents a novel framework for performing automatic variational inference in a wide range of probabilistic models. The core of the framework is the Variational Message Passing algorithm, an analogue of belief propagation that uses message passing within a graphical model to optimise an approximate variational distribution. A software package called VIBES (Variational Inference in BayESian networks) is presented as an implementation of the VMP framework. A tutorial is included which demonstrates the application of VIBES to a small data set.

In Chapter 3, the framework is applied to the problem of modelling non-linear image manifolds, such as those of face images and digit images. In Chapter 4, the problems of DNA microarray image analysis and gene expression modelling are addressed, again using the VMP framework.

Chapter 5 extends Variational Message Passing by allowing variational distributions which retain part of the dependency structure of the original model. The resulting Structured VMP algorithm is shown to improve the quality of the approximate inference and hence to widen the applicability of the framework. Conclusions and suggestions for future research directions are presented in Chapter 6.
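
As a brief illustration of what optimising an approximate variational distribution involves, the standard variational decomposition can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the thesis: $H$ denotes the hidden variables, $D$ the observed data, and $q$ the approximating distribution (sums become integrals for continuous variables).

```latex
% Decomposition of the log evidence into a lower bound and a KL divergence
\ln p(D) = \mathcal{L}(q) + \mathrm{KL}(q \,\|\, p),
\qquad
\mathcal{L}(q) = \sum_{H} q(H) \ln \frac{p(H, D)}{q(H)},
\qquad
\mathrm{KL}(q \,\|\, p) = -\sum_{H} q(H) \ln \frac{p(H \mid D)}{q(H)} .

% Fully factorised (mean-field) approximation over disjoint groups H_i
q(H) = \prod_{i} q_i(H_i)

% Optimal factor q_j with all other factors held fixed
\ln q_j^{*}(H_j) = \big\langle \ln p(H, D) \big\rangle_{\prod_{i \neq j} q_i} + \text{const.}
```

Because the KL divergence is non-negative, $\mathcal{L}(q)$ is a lower bound on $\ln p(D)$, and maximising it over $q$ is equivalent to minimising the divergence from the true posterior. In a graphical model, the expectation in the last line typically involves only the factors neighbouring $H_j$, which is the property that makes a local, message-passing implementation plausible.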
