High-Dimensional Multivariate Time Series With Additional Structure

ABSTRACT High-dimensional multivariate time series are challenging because the data are both dependent and high-dimensional, but in many applications there is additional structure that can be exploited to reduce both computing time and statistical error. We consider high-dimensional vector autoregressive processes with spatial structure, a simple and common form of additional structure. We propose novel high-dimensional methods that take advantage of such structure without making model assumptions about how distance affects dependence. We provide nonasymptotic bounds on the statistical error of parameter estimators in high-dimensional settings and show that the proposed approach reduces the statistical error. An application to air pollution data in the USA demonstrates that the estimation approach reduces both computing time and prediction error and, in contrast to high-dimensional methods that ignore spatial structure, yields results that are scientifically meaningful. In practice, these high-dimensional methods can be used to decompose high-dimensional multivariate time series into lower-dimensional multivariate time series that can be studied in more depth by other methods. Supplementary materials for this article are available online.
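To make the idea of exploiting spatial structure concrete, the following is a minimal sketch and not the estimator proposed in the article. It assumes a first-order vector autoregression, locations on a line, and a known cutoff radius beyond which series do not affect each other directly; the article itself does not assume the radius is known. The names `simulate_var1`, `fit_var1`, `radius`, and all numerical settings are hypothetical choices for illustration. The sketch compares least-squares estimation restricted to spatial neighbors with estimation that ignores the spatial structure.

```python
import numpy as np


def simulate_var1(A, T, rng):
    """Simulate T steps of X_t = A X_{t-1} + e_t with standard normal noise."""
    p = A.shape[0]
    X = np.zeros((T, p))
    for t in range(1, T):
        X[t] = A @ X[t - 1] + rng.standard_normal(p)
    return X


def fit_var1(X, mask):
    """Row-wise least squares for a VAR(1) transition matrix.

    mask[i, j] is True if series j is allowed to predict series i.
    Coefficients outside the mask are fixed at zero.
    """
    Y, Z = X[1:], X[:-1]  # responses and lagged predictors
    p = X.shape[1]
    A_hat = np.zeros((p, p))
    for i in range(p):
        idx = np.flatnonzero(mask[i])
        coef, *_ = np.linalg.lstsq(Z[:, idx], Y[:, i], rcond=None)
        A_hat[i, idx] = coef
    return A_hat


rng = np.random.default_rng(0)
p, T, radius = 30, 60, 1.0  # hypothetical sizes; radius is assumed known here

# Hypothetical spatial layout: p locations equally spaced on a line.
coords = np.arange(p, dtype=float)
dist = np.abs(coords[:, None] - coords[None, :])
neighbors = dist <= radius

# True transition matrix: only series within the radius interact.
A = np.where(dist == 0, 0.4, 0.0) + np.where(dist == 1, 0.2, 0.0)

X = simulate_var1(A, T, rng)

A_local = fit_var1(X, neighbors)                     # uses spatial structure
A_full = fit_var1(X, np.ones((p, p), dtype=bool))    # ignores it

err_local = np.linalg.norm(A_local - A)  # Frobenius norm of the error
err_full = np.linalg.norm(A_full - A)
print(f"error with spatial restriction:    {err_local:.3f}")
print(f"error without spatial restriction: {err_full:.3f}")
```

In this toy setting the restricted fit solves p small regressions with at most three predictors each, rather than p regressions with p predictors, which is the sense in which structure can lower both computing time and statistical error.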
