Replacing Causal Faithfulness with Algorithmic Independence of Conditionals

Independence of Conditionals (IC) has recently been proposed as a basic rule for causal structure learning: if a Bayesian network represents the causal structure, its Conditional Probability Distributions (CPDs) should be algorithmically independent. In this paper we compare IC with causal faithfulness (FF), which states that only those conditional independences implied by the causal Markov condition hold true. The latter is a basic postulate in common approaches to causal structure learning. The common spirit of FF and IC is to reject causal graphs for which the joint distribution looks ‘non-generic’. The difference lies in the notion of genericity: FF sometimes rejects models just because one of the CPDs is simple, for instance when the CPD describes a deterministic relation. IC does not behave in this undesirable way; it only rejects a model when there is a non-generic relation between different CPDs, even though each CPD looks generic when considered separately. Moreover, IC detects relations between CPDs that cannot be captured by conditional independences, and therefore helps in distinguishing causal graphs that induce the same conditional independences (i.e., graphs that belong to the same Markov equivalence class). The usual justification for FF implicitly assumes a prior given by a probability density on the parameter space. IC can instead be justified by Solomonoff’s universal prior, which assigns non-zero probability to those points in parameter space that have a finite description. In this way it favours simple CPDs and therefore respects Occam’s razor. Since Kolmogorov complexity is uncomputable, IC is not directly applicable in practice. We argue that it is nevertheless helpful, since it has already served as inspiration and justification for novel causal inference algorithms.
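
To make the postulate more concrete, the following is a rough sketch of how IC is commonly formalized, along the lines of the algorithmic Markov condition framework; the notation is illustrative rather than a verbatim reproduction of the paper's definitions:

% Illustrative statement of the IC postulate (assumed notation):
% K denotes Kolmogorov complexity of a (finite) description of a distribution,
% PA_j the parents of X_j in the causal DAG, and \stackrel{+}{=} equality up to
% a constant that does not depend on the distributions involved.
\[
  K\bigl(P(X_1,\dots,X_n)\bigr)
  \;\stackrel{+}{=}\;
  \sum_{j=1}^{n} K\bigl(P(X_j \mid PA_j)\bigr),
\]
or equivalently, the algorithmic mutual information between the CPDs vanishes:
\[
  I\bigl(P(X_1 \mid PA_1) : \cdots : P(X_n \mid PA_n)\bigr)
  \;\stackrel{+}{=}\; 0 .
\]

In words: knowing some of the CPDs provides no shorter description of the others, so the joint distribution is exactly as complex as the sum of its causal conditionals.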
