Using Markov Blankets for Causal Structure Learning

We show how a generic feature-selection algorithm that returns the strongly relevant variables can be turned into a causal structure-learning algorithm. We prove this result under the Faithfulness assumption on the data distribution. In a causal graph, the strongly relevant variables for a node X are its parents, children, and children's parents (or spouses), also known as the Markov blanket of X. Identifying the spouses leads to the detection of V-structures and hence to causal orientations. Repeating the task for all variables yields a valid partially oriented causal graph. We first show an efficient way to identify the spouse links. We then perform several experiments in the continuous domain using Recursive Feature Elimination with Support Vector Regression and empirically verify the intuition behind this direct (but computationally expensive) approach. Within the same framework, we then devise a fast and consistent algorithm for Gaussian data, Total Conditioning (TC), together with a variant, TCbw, that uses an explicit backward feature-selection heuristic. After a series of comparative experiments on five artificial networks, we argue that Markov blanket algorithms such as TC/TCbw or Grow-Shrink scale better than the reference PC algorithm and provide higher structural accuracy.

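The core step for Gaussian data can be illustrated with a minimal sketch (not the authors' reference implementation): under Faithfulness, two variables belong to each other's Markov blanket exactly when their partial correlation given all remaining variables is nonzero, and these full-order partial correlations can be read off the inverse of the correlation matrix. The function name `markov_blankets`, the significance level `alpha`, and the use of a Fisher z-test are illustrative choices, not taken from the paper.

```python
# Sketch of the "total conditioning" step for Gaussian data, assuming
# Faithfulness: X_i and X_j are in each other's Markov blanket iff their
# partial correlation given all other variables is nonzero.  Full-order
# partial correlations are obtained from the precision (inverse correlation)
# matrix; a Fisher z-test (an assumed, standard choice) decides nonzeroness.
import numpy as np
from scipy import stats

def markov_blankets(data, alpha=0.05):
    """data: (n_samples, n_vars) array of observations assumed Gaussian.
    Returns a dict mapping each variable index to its estimated Markov blanket."""
    n, d = data.shape
    corr = np.corrcoef(data, rowvar=False)
    prec = np.linalg.pinv(corr)  # precision matrix
    # Partial correlation of (i, j) given all remaining variables:
    # rho_{ij.rest} = -P_ij / sqrt(P_ii * P_jj)
    pcorr = -prec / np.sqrt(np.outer(np.diag(prec), np.diag(prec)))
    blankets = {i: set() for i in range(d)}
    for i in range(d):
        for j in range(i + 1, d):
            # Fisher z-transform; d - 2 conditioning variables
            r = np.clip(pcorr[i, j], -0.999999, 0.999999)
            z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - d - 1)
            p = 2 * (1 - stats.norm.cdf(abs(z)))
            if p < alpha:  # dependent given the rest of the variables
                blankets[i].add(j)
                blankets[j].add(i)
    return blankets
```

The graph linking each node to its estimated blanket is the moral graph: it contains parent, child, and spouse links. As described in the abstract, the subsequent step removes the spouse links and uses them to orient V-structures, yielding a partially oriented causal graph.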