Gaussian Processes for Independence Tests with Non-iid Data in Causal Inference

In applied fields, practitioners hoping to apply causal structure learning or causal orientation algorithms face an important question: which independence test is appropriate for my data? For real-valued iid data with linear dependencies and Gaussian error terms, partial correlation is sufficient. Once any of these assumptions is relaxed, the situation becomes more complex. In recent years, kernel-based tests of independence have gained popularity for handling nonlinear dependencies, but testing for conditional independence remains a challenging problem. We highlight the important issue of non-iid observations: when data are observed in space, time, or on a network, “nearby” observations are likely to be similar, and this similarity biases estimates of dependence between variables. Inspired by the success of Gaussian process regression in handling non-iid observations in a wide variety of areas, and by the usefulness of the Hilbert-Schmidt Independence Criterion (HSIC), a kernel-based independence test, we propose a simple framework to address all of these issues. First, use Gaussian process regression to control for certain variables and to obtain residuals. Second, use HSIC to test the residuals for independence. We illustrate this framework on two classic datasets, one spatial and one temporal, that are usually treated as iid, and show how properly accounting for spatial and temporal variation can lead to more reasonable causal graphs. We also show how highly structured data, such as images and text, can be used in a causal inference framework through a novel structured input/output Gaussian process formulation. We demonstrate this idea on a dataset of translated sentences, where the task is to predict the source language.
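The two-step recipe in the abstract (regress out the space or time index with a Gaussian process, then run HSIC on the residuals) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic data, the squared-exponential kernel with fixed `length_scale` and `noise_var` (in practice these would normally be learned), the median-heuristic bandwidth, the biased HSIC estimator, and the permutation-based p-value. The function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel matrix between 1-D arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_residuals(s, v, length_scale=1.0, noise_var=0.1):
    """Residuals of v after subtracting the in-sample GP posterior mean given s.
    Hyperparameters are fixed for brevity (an assumption of this sketch)."""
    v_centered = v - v.mean()
    K = rbf_kernel(s, s, length_scale)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(s)), v_centered)
    return v_centered - K @ alpha

def median_bandwidth(v):
    """Median heuristic: median of the nonzero pairwise distances."""
    d = np.abs(v[:, None] - v[None, :])
    return np.median(d[d > 0])

def hsic_permutation_test(x, y, n_perm=500, seed=0):
    """Biased HSIC statistic trace(K H L H) / n^2 with a permutation p-value."""
    rng = np.random.default_rng(seed)
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    Kc = H @ rbf_kernel(x, x, median_bandwidth(x)) @ H
    L = rbf_kernel(y, y, median_bandwidth(y))
    stat = np.sum(Kc * L) / n ** 2                # equals trace(K H L H) / n^2
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)                    # shuffle y relative to x
        if np.sum(Kc * L[np.ix_(p, p)]) / n ** 2 >= stat:
            exceed += 1
    return stat, (exceed + 1) / (n_perm + 1)

# Synthetic example: x and y share a smooth trend in "time" s but have
# independent noise, so they are dependent only through s.
rng = np.random.default_rng(42)
n = 100
s = np.sort(rng.uniform(0.0, 10.0, n))
x = np.sin(s) + 0.3 * rng.standard_normal(n)
y = np.exp(np.sin(s)) + 0.3 * rng.standard_normal(n)

# Test that ignores the temporal structure (treats the data as iid).
stat_raw, p_raw = hsic_permutation_test(x, y)

# Step 1: control for s with GP regression. Step 2: HSIC on the residuals.
rx = gp_residuals(s, x)
ry = gp_residuals(s, y)
stat_res, p_res = hsic_permutation_test(rx, ry)

print(f"raw data:     HSIC={stat_raw:.5f}, p={p_raw:.3f}")
print(f"GP residuals: HSIC={stat_res:.5f}, p={p_res:.3f}")
```

In this toy setup the raw test should flag a dependence that is explained entirely by the shared trend, while the residual test asks whether any dependence remains once that trend is controlled for. The residuals of a fitted GP are themselves only approximately independent, so the permutation p-value on residuals should be read as a rough guide rather than an exact test.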
