From dependency to causality: a machine learning approach

The relationship between statistical dependency and causality lies at the heart of all statistical approaches to causal inference. Recent results in the ChaLearn cause-effect pair challenge have shown that data-driven approaches can infer causal directionality with good accuracy even in Markov-indistinguishable configurations. This paper proposes a supervised machine learning approach to infer the existence of a directed causal link between two variables in multivariate settings with $n>2$ variables. The approach relies on the asymmetry of some conditional (in)dependence relations between the members of the Markov blankets of two causally connected variables. Our results show that supervised learning methods can successfully extract causal information from asymmetric statistical descriptors for $n$-variate distributions with $n>2$ as well.
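The kind of asymmetry the abstract refers to can be illustrated with a small simulation. The sketch below is not the paper's method or its descriptors: it assumes a hypothetical linear-Gaussian toy graph `a -> x -> y <- b` (all variable names and coefficients are illustrative) and uses partial correlation as a stand-in for a conditional dependence measure. In this graph the parent of the cause (`a`) should be independent of the effect (`y`) given the cause (`x`), whereas the parent of the effect (`b`) should become dependent on the cause (`x`) once the effect (`y`) is conditioned on.

```python
import numpy as np


def partial_corr(u, v, z):
    """Correlation between u and v after linearly regressing out z (with intercept)."""
    design = np.column_stack([np.ones_like(z), z])
    res_u = u - design @ np.linalg.lstsq(design, u, rcond=None)[0]
    res_v = v - design @ np.linalg.lstsq(design, v, rcond=None)[0]
    return float(np.corrcoef(res_u, res_v)[0, 1])


def simulate(n=20000, seed=0):
    """Sample from the assumed toy graph a -> x -> y <- b with unit Gaussian noise."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)          # parent of the cause
    b = rng.normal(size=n)          # second parent of the effect
    x = a + rng.normal(size=n)      # cause
    y = x + b + rng.normal(size=n)  # effect
    return a, b, x, y


a, b, x, y = simulate()

# Parent of the cause vs. effect, given the cause: expected to be close to zero.
pc_a_y_given_x = partial_corr(a, y, x)

# Parent of the effect vs. cause, given the effect: expected to be clearly
# non-zero, because conditioning on the collider y couples its parents.
pc_b_x_given_y = partial_corr(b, x, y)

print("pcor(a, y | x) =", round(pc_a_y_given_x, 3))
print("pcor(b, x | y) =", round(pc_b_x_given_y, 3))
```

Swapping the roles of `x` and `y` does not reproduce this pattern, which is why descriptors of this type could in principle serve as input features for a classifier that predicts the direction of a link. Partial correlation only captures linear dependence, so nonlinear settings would need a different dependence measure.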
