Causal deconvolution by algorithmic generative models

Complex behaviour emerges from interactions between objects produced by different generating mechanisms. Yet decoding their causal origin(s) from observations remains one of the most fundamental challenges in science. Here we introduce a universal, unsupervised and parameter-free model-oriented approach, based on the seminal concept and first principles of algorithmic probability, to decompose an observation into its most likely algorithmic generative models. Our approach uses a perturbation-based causal calculus to infer model representations. We demonstrate its ability to deconvolve interacting mechanisms regardless of whether the resultant objects are bit strings, space–time evolution diagrams, images or networks. Although this is mostly a conceptual contribution and an algorithmic framework, we also provide numerical evidence evaluating the ability of our methods to extract models from data produced by discrete dynamical systems such as cellular automata and complex networks. We believe these separating techniques can contribute to tackling the challenge of causation, thus complementing statistically oriented approaches.

Most machine learning approaches extract statistical features from data rather than the underlying causal mechanisms. A different approach analyses information in a general way by extracting recursive patterns from data using generative models under the paradigm of computability and algorithmic information theory.
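The perturbation-based causal calculus can be illustrated with a minimal sketch: delete each element of an observation in turn and record how an estimate of its algorithmic complexity shifts; elements whose removal produces similar shifts are candidates for having been produced by the same generative mechanism. The sketch below is an assumption-laden toy, not the paper's method: it uses `zlib` compressed length as a crude, computable stand-in for algorithmic complexity (the actual estimators are grounded in algorithmic probability), and the function names `complexity` and `perturbation_contributions` are hypothetical.

```python
import zlib

def complexity(s: bytes) -> int:
    # Compressed length: a crude, computable upper bound standing in for
    # the (uncomputable) algorithmic complexity K(s).
    return len(zlib.compress(s, 9))

def perturbation_contributions(s: str) -> list:
    # Delete each element in turn and record how the complexity estimate
    # shifts relative to the unperturbed observation.
    base = complexity(s.encode())
    return [complexity((s[:i] + s[i + 1:]).encode()) - base
            for i in range(len(s))]

# Toy observation: a regular (low-complexity) segment followed by an
# irregular-looking one, mimicking two convolved generating mechanisms.
obs = "01" * 24 + "11010001011101001011"
contribs = perturbation_contributions(obs)
# Grouping indices by the sign and magnitude of their contribution gives a
# candidate deconvolution of obs into its constituent parts.
```

Compression is a poor proxy on short strings, which is precisely why the underlying framework relies on algorithmic-probability-based estimates instead; the sketch only conveys the shape of the perturbation analysis.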
