Eigenvector Spatial Filtering for Large Data Sets: Fixed and Random Effects Approaches

Eigenvector spatial filtering (ESF) is a spatial modeling approach that has been applied in urban and regional studies, ecological studies, and other fields. However, it is computationally demanding and may be unsuitable for modeling large data sets. The objective of this study is to develop fast ESF and fast random effects ESF (RE-ESF) that can handle very large samples. To this end, we accelerate the two steps that make ESF and RE-ESF slow: eigen-decomposition and parameter estimation. The former is accelerated using the Nyström extension, and the latter using small matrix tricks. The resulting fast ESF and fast RE-ESF are compared with non-approximated ESF and RE-ESF in Monte Carlo simulation experiments. The results show that, whereas ESF and RE-ESF are slow for samples of several thousand, fast ESF and fast RE-ESF require only a few seconds for samples of that size. The results also suggest that the proposed approaches effectively remove positive spatial dependence in the residuals, with very small approximation errors, when 200 or more eigenvectors are considered. However, these approaches cannot handle negative spatial dependence. The proposed approaches are implemented in the R package "spmoran."
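As a rough illustration of the two accelerations, the Python sketch below shows (1) a Nyström-style approximation of the leading eigenvectors of a doubly centred spatial kernel, where only an L x L matrix built on a small set of knots is eigen-decomposed and the result is extended to all n sites, and (2) a small-matrix idea for estimation, where the cross-products Z'Z, Z'y and y'y are formed once so that later refits never touch the n rows again. This is not the spmoran implementation. The exponential kernel exp(-d/h), the bandwidth h = 0.1, randomly drawn knots, the kernel-PCA style centring, and the function names `exp_kernel`, `nystrom_esf_basis` and `fit_from_crossproducts` are all assumptions made for illustration; only plain OLS is shown, not the variance-parameter estimation of fast RE-ESF.

```python
import numpy as np


def exp_kernel(a, b, h):
    """Exponential kernel exp(-d / h) between the rows of a and the rows of b."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return np.exp(-d / h)


def nystrom_esf_basis(coords, knots, h, k):
    """Approximate the k leading eigenvectors of the doubly centred n x n kernel
    using only an L x L eigen-decomposition (Nystrom extension; a sketch)."""
    C_LL = exp_kernel(knots, knots, h)   # L x L: knots vs knots
    C_nL = exp_kernel(coords, knots, h)  # n x L: sites vs knots
    col_mean, grand = C_LL.mean(axis=0), C_LL.mean()
    # Double-centre the small kernel; centre the cross-kernel consistently
    # (the out-of-sample centring used in kernel PCA; an assumption here).
    C_LL_c = C_LL - col_mean[None, :] - col_mean[:, None] + grand
    C_nL_c = C_nL - C_nL.mean(axis=1, keepdims=True) - col_mean[None, :] + grand
    vals, vecs = np.linalg.eigh(C_LL_c)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]             # largest = positive dependence
    order = order[vals[order] > 1e-10]             # drop the null (constant) direction
    vals, vecs = vals[order], vecs[:, order]
    E = C_nL_c @ vecs / vals                       # Nystrom extension to all n sites
    E = E - E.mean(axis=0)                         # keep columns orthogonal to the constant
    return E / np.linalg.norm(E, axis=0)           # unit-length columns


def fit_from_crossproducts(ZtZ, Zty, yty, cols):
    """OLS coefficients and residual sum of squares for the regressors in
    `cols`, computed from precomputed cross-products (cost independent of n)."""
    A = ZtZ[np.ix_(cols, cols)]
    b = np.linalg.solve(A, Zty[cols])
    sse = yty - Zty[cols] @ b                      # y'y - b'Z'y at the OLS solution
    return b, sse


# Usage on simulated data (all settings here are illustrative assumptions).
rng = np.random.default_rng(0)
n, L, k, h = 2000, 100, 20, 0.1
coords = rng.uniform(size=(n, 2))
knots = coords[rng.choice(n, size=L, replace=False)]  # random knots (hypothetical choice)
E = nystrom_esf_basis(coords, knots, h, k)

X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ [1.0, 2.0] + np.sqrt(n) * E[:, :3] @ [1.0, -0.5, 0.5] + rng.normal(scale=0.5, size=n)
Z = np.hstack([X, E])
ZtZ, Zty, yty = Z.T @ Z, Z.T @ y, y @ y            # the only O(n) step

# Full model (intercept, x, all k eigenvectors) and a smaller candidate model
# (intercept, x, first three eigenvectors), both fitted from the same small matrices.
b_all, sse_all = fit_from_crossproducts(ZtZ, Zty, yty, list(range(Z.shape[1])))
b_sub, sse_sub = fit_from_crossproducts(ZtZ, Zty, yty, [0, 1, 2, 3, 4])
```

For a fixed number of knots L, the cost of the eigenvector step grows only linearly in n, because the n x L cross-kernel is multiplied by L x k eigenvectors rather than decomposing an n x n matrix. Likewise, once the (K + k) x (K + k) cross-products are available, comparing eigenvector subsets or repeatedly re-estimating parameters works entirely on small matrices.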
