Spatial Lasso With Applications to GIS Model Selection

Geographic information systems (GIS) organize spatial data in multiple two-dimensional arrays called layers. In many applications, a response of interest is observed at a set of sites in the landscape, and the goal is to build a regression model from the GIS layers to predict the response at unsampled sites. Model selection in this context consists not only of selecting appropriate layers but also of choosing appropriate neighborhoods within those layers. We formalize this problem as a linear model and propose using the Lasso to simultaneously select variables, choose neighborhoods, and estimate parameters. Spatially dependent errors are accounted for using generalized least squares, and spatial smoothness in the selected coefficients is incorporated through an a priori spatial covariance structure. This leads to a modification of the Lasso procedure called the spatial Lasso. The spatial Lasso can be implemented by a fast algorithm, and it performs well in numerical examples, including an application to the prediction of soil moisture. The methodology is also extended to generalized linear models. Supplemental materials, including R code and the data analyzed in this article, are available online.
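The combination of generalized least squares with an L1 penalty can be illustrated with a minimal Python sketch. It is not the authors' algorithm or their R code: it assumes a known exponential error covariance (the function `exp_covariance` and its parameters `range_`, `sill`, and `nugget` are hypothetical), whitens the data with a Cholesky factor, and solves the resulting Lasso problem by plain coordinate descent. The spatial smoothness of the coefficients described in the abstract is not modeled here, and the simulated data are for illustration only.

```python
import numpy as np


def exp_covariance(coords, range_=1.0, sill=1.0, nugget=1e-6):
    """Exponential covariance between sites (an assumed error model)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-d / range_) + nugget * np.eye(len(coords))


def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def gls_lasso(X, y, Sigma, lam, n_iter=1000, tol=1e-10):
    """Minimize 0.5 * (y - Xb)' Sigma^{-1} (y - Xb) + lam * ||b||_1.

    The data are whitened with the Cholesky factor of Sigma, which turns
    the generalized least squares criterion into an ordinary one.
    """
    L = np.linalg.cholesky(Sigma)
    Xw = np.linalg.solve(L, X)  # whitened design matrix
    yw = np.linalg.solve(L, y)  # whitened response
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = (Xw ** 2).sum(axis=0)
    resid = yw.copy()  # residual of the whitened problem at beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual.
            rho = Xw[:, j] @ resid + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != beta[j]:
                resid -= Xw[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta


# Simulated example: 40 sites, 5 candidate covariates, 2 truly active.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(40, 2))
X = rng.normal(size=(40, 5))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
Sigma = exp_covariance(coords)
errors = 0.3 * (np.linalg.cholesky(Sigma) @ rng.normal(size=40))
y = X @ beta_true + errors

beta_hat = gls_lasso(X, y, Sigma, lam=5.0)
print(beta_hat)
```

With `lam=0` the routine reduces to the ordinary generalized least squares estimate, and a sufficiently large `lam` shrinks every coefficient to zero. In practice the penalty level would be chosen by a data-driven criterion rather than fixed by hand.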
