Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models

We propose a nonparametric method for detecting nonlinear causal relationships within a set of multidimensional discrete time series, using sparse additive models (SpAMs). We show that, when the input to the SpAM is a $\beta$-mixing time series, the model can be fitted by first approximating each unknown function with a linear combination of B-spline basis functions, and then solving a group-lasso-type optimization problem with nonconvex regularization. Theoretically, we characterize the oracle statistical properties of the proposed sparse estimator in function estimation and model selection. Numerically, we propose an efficient pathwise iterative shrinkage-thresholding algorithm (PISTA), which tames the nonconvexity and guarantees linear convergence towards the desired sparse estimator with high probability.
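The two-step fitting idea (basis expansion, then a group-sparse regression) can be illustrated with a minimal sketch. This is not the paper's PISTA algorithm. It assumes a single lag, uses a standardized polynomial basis as a stand-in for B-splines, and replaces the nonconvex penalty with the plain convex group-lasso penalty solved by fixed-step iterative shrinkage thresholding. All function names (`lagged_basis`, `group_soft_threshold`, `group_lasso_ista`) are hypothetical.

```python
import numpy as np


def lagged_basis(series, degree=3):
    """Expand the lag-1 values of each series into a small basis.

    Assumption: a standardized polynomial basis stands in for B-splines.
    series has shape (T, p). Returns (Phi, groups), where Phi has shape
    (T - 1, p * degree) and groups[j] indexes the columns of series j.
    """
    past = series[:-1]
    blocks, groups = [], []
    for j in range(past.shape[1]):
        cols = np.column_stack([past[:, j] ** k for k in range(1, degree + 1)])
        cols = (cols - cols.mean(axis=0)) / cols.std(axis=0)
        blocks.append(cols)
        groups.append(np.arange(j * degree, (j + 1) * degree))
    return np.hstack(blocks), groups


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole group toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso_ista(Phi, y, groups, lam, n_iter=500):
    """Minimize (1/2n)||y - Phi b||^2 + lam * sum_g ||b_g||_2 by ISTA."""
    n = Phi.shape[0]
    # Step size 1/L, where L = ||Phi||_2^2 / n is the gradient's Lipschitz constant.
    step = n / np.linalg.norm(Phi, 2) ** 2
    beta = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ beta - y) / n
        z = beta - step * grad
        for g in groups:
            beta[g] = group_soft_threshold(z[g], step * lam)
    return beta
```

To use the sketch, take one series as the response, for example `y = series[1:, j] - series[1:, j].mean()`, and fit `beta = group_lasso_ista(Phi, y, groups, lam)`. A group `g` with `np.linalg.norm(beta[g]) > 0` marks the corresponding series as a candidate cause of series `j` under these simplifying assumptions. The choice of `lam` is left open here.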
