Causal Discovery via Reproducing Kernel Hilbert Space Embeddings

Causal discovery via the asymmetry between cause and effect has proved to be a promising way to infer the causal direction from observations. The basic idea is to assume that the mechanism generating the cause distribution p(x) and the mechanism generating the conditional distribution p(y|x) correspond to two independent natural processes, so that p(x) and p(y|x) fulfill some sort of independence condition. In many situations, however, this independence condition does not hold in the anticausal direction: if we consider p(x, y) as generated via p(y)p(x|y), then there are usually some contrived mutual adjustments between p(y) and p(x|y). This asymmetry can be exploited to identify the causal direction. Based on this postulate, in this letter we define an uncorrelatedness criterion between p(x) and p(y|x) and use it to show that the asymmetry between cause and effect appears as a certain complexity metric on p(x) and p(y|x) being smaller than the same metric on p(y) and p(x|y). We propose a Hilbert space embedding-based method, EMD (an abbreviation for EMbeDding), to calculate the complexity metric and show that this method preserves the relative magnitude of the complexity metric. Based on the complexity metric, we propose an efficient kernel-based algorithm for causal discovery. The contribution of this letter is threefold. First, the method allows a general transformation from the cause to the effect, including the effect of noise. Second, it is applicable to both one-dimensional and high-dimensional data. Third, it can be used to infer the causal ordering of multiple variables. Extensive experiments on simulated and real-world data show the effectiveness of the proposed method.
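To make the notion of a Hilbert space embedding concrete, the following is a minimal sketch of an empirical kernel mean embedding and the squared distance between two such embeddings (the maximum mean discrepancy). It is not the EMD complexity metric of this letter, which the abstract does not define; the Gaussian kernel, the bandwidth of 1.0, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth=1.0):
    """Gaussian-kernel Gram matrix between samples a and b (rows are points)."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def embedding_sq_norm(x, bandwidth=1.0):
    """Squared RKHS norm of the empirical mean embedding of sample x."""
    return gaussian_gram(x, x, bandwidth).mean()

def mmd_sq(x, y, bandwidth=1.0):
    """Biased estimate of the squared distance between two mean embeddings."""
    return (gaussian_gram(x, x, bandwidth).mean()
            + gaussian_gram(y, y, bandwidth).mean()
            - 2.0 * gaussian_gram(x, y, bandwidth).mean())

# Toy data (assumed for illustration): x is the cause, y a noisy nonlinear effect.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.tanh(2.0 * x) + 0.1 * rng.normal(size=200)

norm_x = embedding_sq_norm(x)      # lies in (0, 1] for a Gaussian kernel
dist_xy = mmd_sq(x, y)             # distance between embeddings of p(x) and p(y)
dist_shift = mmd_sq(x, x + 5.0)    # a shifted copy is far from the original
```

Because the embedding is a mean of kernel functions, every quantity above reduces to averages of Gram matrix entries, which is what makes kernel-based estimates of this kind inexpensive to compute from samples.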
