Distribution-Free Learning of Bayesian Network Structure

We present an independence-based method for learning Bayesian network (BN) structure without making any assumptions about the probability distribution of the domain. This is particularly useful for continuous domains, and the method also handles mixed continuous-categorical domains and structures containing vector-valued variables. We address the problem by developing a non-parametric conditional independence test based on the so-called kernel dependence measure. The test can be readily used by any existing independence-based BN structure learning algorithm. We demonstrate structure learning of graphical models in continuous and mixed domains from real-world data without distributional assumptions. We also show experimentally that our test is a good alternative to existing tests, particularly for small sample sizes. Those tests can be used only in purely categorical or purely continuous domains.
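As a rough illustration of how a kernel dependence measure becomes a distribution-free test, the sketch below covers only the marginal case (empty conditioning set). It uses the Hilbert-Schmidt Independence Criterion (HSIC), one standard kernel dependence measure, together with a permutation test. It is an assumption-laden sketch and does not reproduce the conditional test described above. The Gaussian kernel, the median-heuristic bandwidth, the number of permutations, and all function names (`gaussian_gram`, `center`, `hsic`, `hsic_permutation_test`) are hypothetical choices made for illustration. The returned p-value is the kind of output an independence-based structure learning algorithm would compare against a significance level.

```python
import math
import random
import statistics


def gaussian_gram(values):
    """Gaussian-kernel Gram matrix for 1-D samples (assumed kernel choice).

    Bandwidth sigma is set by the median heuristic: the median pairwise distance.
    """
    n = len(values)
    dists = [abs(values[i] - values[j]) for i in range(n) for j in range(i + 1, n)]
    sigma = statistics.median(dists) or 1.0  # fall back to 1.0 if all points coincide
    return [[math.exp(-(values[i] - values[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def center(gram):
    """Double centering: return H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic(kc, gram_y, perm):
    """Biased HSIC estimate tr(Kc L_perm) / n^2, with the y samples re-indexed by perm."""
    n = len(kc)
    return sum(kc[i][j] * gram_y[perm[i]][perm[j]]
               for i in range(n) for j in range(n)) / n ** 2


def hsic_permutation_test(x, y, num_perms=200, seed=0):
    """Permutation p-value for H0: x and y are independent.

    Shuffling y breaks any dependence on x while keeping both marginals,
    so the permuted statistics approximate the null distribution.
    """
    n = len(x)
    kc = center(gaussian_gram(x))
    gram_y = gaussian_gram(y)
    identity = list(range(n))
    observed = hsic(kc, gram_y, identity)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(num_perms):
        perm = identity[:]
        rng.shuffle(perm)
        if hsic(kc, gram_y, perm) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (num_perms + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0, 1) for _ in range(60)]
    y_dep = [v ** 2 + 0.1 * rng.gauss(0, 1) for v in x]  # nonlinear, zero-correlation dependence
    y_ind = [rng.gauss(0, 1) for _ in range(60)]          # independent of x
    print("dependent pair p-value:", hsic_permutation_test(x, y_dep))
    print("independent pair p-value:", hsic_permutation_test(x, y_ind))
```

The quadratic example is chosen because its Pearson correlation is near zero, so a linear-Gaussian partial-correlation test would tend to miss the dependence, whereas a kernel measure can detect it without assuming a parametric form. Extending this to conditioning sets requires a conditional kernel measure and a permutation scheme that preserves the dependence on the conditioning variables, which this sketch leaves out.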
