Locally averaged Bayesian Dirichlet metrics for learning the structure and the parameters of Bayesian networks

The marginal likelihood of the data, computed with Bayesian score metrics, is at the core of score+search methods for learning Bayesian networks (BNs) from data. However, common formulations of these metrics rely on free parameters that are hard to assess. Recent theoretical and experimental work has also shown that the commonly used BDe score metric is strongly biased by the value assigned to its free parameter, the equivalent sample size. Because of this sensitivity, a poor choice of this parameter leads to inferred BN models whose structure and parameters do not properly represent the distribution generating the data, even for large sample sizes. In this paper we argue that the problem arises because the BDe metric rests on assumptions about the distribution of the BN model parameters that are too strict and do not hold in real settings. To overcome this issue, we introduce an approach that attempts to marginalize this meta-parameter locally, with the aim of embracing a wider set of assumptions about these parameters. Experiments show that this approach performs robustly, as well as the standard BDe metric with an optimal choice of its free parameter; consequently, it avoids the risk of choosing wrong settings for this widely applied Bayesian score metric.
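
To make the idea of local marginalization concrete, the following is a minimal sketch and not the paper's actual procedure, which the abstract does not specify. It assumes the standard BDeu form of the local (per-family) score and a uniform prior over a small, hypothetical grid of equivalent sample size values; the function names and the grid are illustrative assumptions.

```python
import math


def bdeu_local_log_score(counts, ess):
    """Log BDeu score of one family (a node and its parent set).

    counts[j][k] is the number of records in which the parents take their
    j-th configuration and the node takes its k-th state.
    ess is the equivalent sample size.
    """
    q = len(counts)        # number of parent configurations
    r = len(counts[0])     # number of states of the node
    a_j = ess / q          # Dirichlet mass per parent configuration
    a_jk = ess / (q * r)   # Dirichlet mass per (configuration, state) cell
    total = 0.0
    for row in counts:
        n_j = sum(row)
        total += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for n_jk in row:
            total += math.lgamma(a_jk + n_jk) - math.lgamma(a_jk)
    return total


def locally_averaged_log_score(counts, ess_grid):
    """Log of the family score averaged over ess_grid (assumed uniform prior).

    The average is taken on the probability scale, separately for each
    family, using log-sum-exp for numerical stability.
    """
    logs = [bdeu_local_log_score(counts, s) for s in ess_grid]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs) / len(logs))


# Hypothetical counts: binary node, one binary parent (2 configurations).
example_counts = [[30, 5], [4, 25]]
example_grid = [0.1, 1.0, 10.0, 100.0]   # illustrative values only
averaged = locally_averaged_log_score(example_counts, example_grid)
```

Because the average is computed per family rather than once for the whole network, the resulting score remains a sum of local terms in log space, so it can still be plugged into a score+search procedure that evaluates one family at a time.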
