ASYMPTOTIC CAUSAL INFERENCE
A PREPRINT

We investigate causal inference in the asymptotic regime, as the number of variables n → ∞, using an information-theoretic framework. We define the structural entropy of a causal model in terms of its description complexity, given by the logarithmic growth rate, measured in bits, of the number of directed acyclic graphs (DAGs) on n variables, parameterized by the edge density d. Structural entropy yields non-intuitive predictions: if we randomly sample a DAG D from the space of all models over n variables then, as n → ∞ and for d ∈ (0, 1/8), D is almost surely a two-layer DAG. Semantic entropy quantifies the reduction in entropy when edges are removed by causal intervention. Semantic causal entropy is defined as the φ-divergence D_φ(P ‖ P_S) between the observational distribution P and the interventional distribution P_S, where a subset S of edges is intervened on to determine its causal influence. We compare the decomposability properties of semantic entropy for different choices of φ, including φ(t) = t log t (KL-divergence), φ(t) = (1/2)(√t − 1)² (squared Hellinger distance), and φ(t) = (1/2)|t − 1| (total variation distance). We apply our framework to generalize a recently popular bipartite experimental design for studying causal inference on large datasets, where interventions are carried out on one set of variables (e.g., power plants, or items in an online store), but outcomes are measured on a disjoint set of variables (e.g., residents near power plants, or shoppers). We generalize bipartite designs to k-partite designs, and describe an optimization framework for finding the optimal k-level DAG architecture for any value of d ∈ (0, 1/2). As d increases, a sequence of phase transitions occurs over disjoint intervals of d, with deeper DAG architectures emerging as d → 1/2. We also give a quantitative bound on the number of samples needed to reliably test for average causal influence in a k-partite design.
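The three choices of φ named in the abstract can be compared numerically with a minimal sketch like the one below. It assumes finite discrete distributions with strictly positive interventional probabilities and uses the standard form D_φ(P ‖ Q) = Σ_x q(x) φ(p(x)/q(x)). The function names and the two toy distributions `p_obs` and `p_int` are hypothetical illustrations, not taken from the paper, and the base-2 logarithm is chosen here so that the KL value is in bits.

```python
import math

def phi_divergence(p, q, phi):
    """D_phi(P || Q) = sum_x q(x) * phi(p(x) / q(x)); assumes every q(x) > 0."""
    return sum(qx * phi(px / qx) for px, qx in zip(p, q))

def phi_kl(t):
    # phi(t) = t log t, with the convention 0 log 0 = 0 (log base 2, so bits).
    return t * math.log2(t) if t > 0 else 0.0

def phi_hellinger(t):
    # phi(t) = (1/2)(sqrt(t) - 1)^2 gives the squared Hellinger distance.
    return 0.5 * (math.sqrt(t) - 1.0) ** 2

def phi_tv(t):
    # phi(t) = (1/2)|t - 1| gives the total variation distance.
    return 0.5 * abs(t - 1.0)

# Hypothetical toy distributions over three outcomes (illustration only):
# p_obs plays the role of the observational P, p_int the interventional P_S.
p_obs = [0.5, 0.3, 0.2]
p_int = [0.4, 0.4, 0.2]

kl = phi_divergence(p_obs, p_int, phi_kl)
h2 = phi_divergence(p_obs, p_int, phi_hellinger)
tv = phi_divergence(p_obs, p_int, phi_tv)

print(f"KL (bits)         = {kl:.6f}")
print(f"squared Hellinger = {h2:.6f}")
print(f"total variation   = {tv:.6f}")
```

Each φ satisfies φ(1) = 0, so all three divergences vanish when the two distributions coincide. On any pair of distributions the squared Hellinger value is at most the total variation value, which gives a quick sanity check on an implementation.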
