Multivariate spatio-temporal models for high-dimensional areal data with application to Longitudinal Employer-Household Dynamics

Many data sources report related variables of interest that are also referenced over geographic regions and time; however, relatively few general, readily usable statistical methods incorporate these multivariate spatio-temporal dependencies. Additionally, many multivariate spatio-temporal areal data sets are extremely high dimensional, which leads to practical issues when formulating statistical models. For example, we analyze Quarterly Workforce Indicators (QWIs) published by the US Census Bureau's Longitudinal Employer-Household Dynamics (LEHD) program. QWIs are available by different variables, regions, and time points, resulting in millions of tabulations. Although the QWIs already have expansive coverage, adopting a fully Bayesian framework makes it possible to extend their scope to provide estimates of missing values along with associated measures of uncertainty. Motivated by the LEHD, and other applications in federal statistics, we introduce the multivariate spatio-temporal mixed effects model (MSTM), which can be used to efficiently model high-dimensional multivariate spatio-temporal areal data sets. The proposed MSTM extends the notion of Moran's I basis functions to the multivariate spatio-temporal setting. This extension leads to several methodological contributions, including extremely effective dimension reduction, a dynamic linear model for multivariate spatio-temporal areal processes, and the reduction of a high-dimensional parameter space using a novel parameter model.
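For readers unfamiliar with Moran's I basis functions, the following is a minimal sketch of the standard spatial-only construction that the MSTM is said to extend. It is not the multivariate spatio-temporal version proposed here. It assumes a design matrix X and a binary adjacency matrix A for the areal units; the helper names `rook_adjacency` and `moran_basis`, the 3 x 3 lattice, and the intercept-only design are all hypothetical choices made for illustration. The basis functions are taken as the leading eigenvectors of the Moran operator, that is, the adjacency matrix projected onto the orthogonal complement of the column space of X.

```python
import numpy as np


def rook_adjacency(nrow, ncol):
    """Binary adjacency matrix for a regular lattice (hypothetical toy geography).

    Two cells are neighbours when they share an edge.
    """
    n = nrow * ncol
    A = np.zeros((n, n))
    for i in range(nrow):
        for j in range(ncol):
            k = i * ncol + j
            if j + 1 < ncol:  # neighbour to the right
                A[k, k + 1] = A[k + 1, k] = 1.0
            if i + 1 < nrow:  # neighbour below
                A[k, k + ncol] = A[k + ncol, k] = 1.0
    return A


def moran_basis(X, A, r):
    """Return the r leading eigenvectors of the Moran operator and their eigenvalues.

    The Moran operator is P A P, where P = I - X (X'X)^{-1} X' projects onto
    the orthogonal complement of the column space of X.
    """
    n = X.shape[0]
    P = np.eye(n) - X @ np.linalg.pinv(X)
    M = P @ A @ P
    M = (M + M.T) / 2.0  # remove round-off asymmetry before eigh
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:r]  # indices of the r largest
    return eigvecs[:, order], eigvals[order]


# Toy example: 9 areal units on a 3 x 3 lattice, intercept-only design.
A = rook_adjacency(3, 3)
X = np.ones((9, 1))
S, lam = moran_basis(X, A, r=2)
```

In this sketch, keeping only r columns is what provides the dimension reduction: a random effect over n regions is replaced by r coefficients on the columns of `S`. The retained columns are orthogonal to the covariates in `X` whenever their eigenvalues are nonzero, which is the case for the two leading eigenvectors in this toy lattice.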
