Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks

In many multivariate domains, we are interested in analyzing the dependency structure of the underlying distribution, e.g., whether two variables are in direct interaction. We can represent dependency structures using Bayesian network models. To analyze a given data set, Bayesian model selection attempts to find the most likely (MAP) model, and uses its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed order over network variables. This allows us to compute, for a given order, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov Chain Monte Carlo (MCMC) method, but over orders rather than over network structures. The space of orders is smaller and more regular than the space of structures, and has much a smoother posterior “landscape”. We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.

[1]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[2]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[3]  Gregory F. Cooper,et al.  The ALARM Monitoring System: A Case Study with two Probabilistic Inference Techniques for Belief Networks , 1989, AIME.

[4]  Adrian F. M. Smith,et al.  Sampling-Based Approaches to Calculating Marginal Densities , 1990 .

[5]  Wray L. Buntine Theory Refinement on Bayesian Networks , 1991, UAI.

[6]  P. Spirtes,et al.  Causation, prediction, and search , 1993 .

[7]  Jun S. Liu,et al.  Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes , 1994 .

[8]  D. Madigan,et al.  Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window , 1994 .

[9]  J. York,et al.  Bayesian Graphical Models for Discrete Data , 1995 .

[10]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[11]  P. Green Reversible jump Markov chain Monte Carlo computation and Bayesian model determination , 1995 .

[12]  David Heckerman,et al.  Learning Bayesian Networks: A Unification for Discrete and Gaussian Domains , 1995, UAI.

[13]  Wray L. Buntine A Guide to the Literature on Learning Probabilistic Networks from Data , 1996, IEEE Trans. Knowl. Data Eng..

[14]  Pedro Larrañaga,et al.  Learning Bayesian network structures by searching for the best ordering with genetic algorithms , 1996, IEEE Trans. Syst. Man Cybern. Part A.

[15]  G. Casella,et al.  Rao-Blackwellisation of sampling schemes , 1996 .

[16]  Kevin B. Korb,et al.  Causal Discovery via MML , 1996, ICML.

[17]  D. Madigan,et al.  Bayesian model averaging and model selection for markov equivalence classes of acyclic digraphs , 1996 .

[18]  David Heckerman,et al.  A Tutorial on Learning with Bayesian Networks , 1998, Learning in Graphical Models.

[19]  Michael I. Jordan Learning in Graphical Models , 1999, NATO ASI Series.

[20]  Michael Ruogu Zhang,et al.  Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. , 1998, Molecular biology of the cell.

[21]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[22]  Nir Friedman,et al.  Data Analysis with Bayesian Networks: A Bootstrap Approach , 1999, UAI.

[23]  Nir Friedman,et al.  Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm , 1999, UAI.

[24]  P. Green,et al.  Decomposable graphical Gaussian model determination , 1999 .

[25]  E. Lander Array of hope , 1999, Nature Genetics.

[26]  Claudia Tarantola,et al.  Efficient Model Determination for Discrete Graphical Models , 2000 .

[27]  Michal Linial,et al.  Using Bayesian Networks to Analyze Expression Data , 2000, J. Comput. Biol..

[28]  Yoram Singer,et al.  An Efficient Extension to Mixture Techniques for Prediction and Decision Trees , 1997, COLT '97.

[29]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[30]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[31]  Gregory F. Cooper,et al.  A Bayesian method for the induction of probabilistic networks from data , 1992, Machine Learning.

[32]  D. Heckerman,et al.  A Bayesian Approach to Causal Discovery , 2006 .