Causal Inference Using the Algorithmic Markov Condition

Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when the sample size is one. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov-equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also sketch some ideas on how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests based on implicit or explicit assumptions about the underlying distribution.
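The closing remark about replacing Kolmogorov complexity with decidable criteria is commonly realized in practice by compression-based proxies: since K(x) is uncomputable, one substitutes the length of a compressed encoding of x. A minimal sketch of approximating algorithmic mutual information I(x : y) ≈ K(x) + K(y) − K(x, y) this way, assuming zlib as the compressor (the choice of compressor, and this particular estimator, are illustrative assumptions, not a construction from the paper):

```python
import zlib

def compressed_length(s: bytes) -> int:
    """Compressed length as a crude, computable stand-in for K(s)."""
    return len(zlib.compress(s, 9))

def approx_mutual_information(x: bytes, y: bytes) -> int:
    """Approximate algorithmic mutual information I(x : y)
    via I(x : y) ~= K(x) + K(y) - K(x, y), with K replaced by
    compressed length and (x, y) encoded as concatenation."""
    return compressed_length(x) + compressed_length(y) - compressed_length(x + y)

# Two copies of the same text share essentially all their information;
# an unrelated byte string shares (almost) none with either.
x = b"the quick brown fox jumps over the lazy dog " * 20
x_copy = bytes(x)
z = bytes(range(256)) * 4
```

Here `approx_mutual_information(x, x_copy)` is large, while `approx_mutual_information(x, z)` is near zero, so thresholding this quantity plays the role of an (assumption-laden) independence test, analogous to the statistical tests mentioned in the abstract's final sentence.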
