A Geostatistical Linear Regression Model for Small Area Data

We present a new linear regression model for use with aggregated, small area data that are spatially autocorrelated. Because these data are aggregates of individual-level data, we choose to model the spatial autocorrelation using a geostatistical model specified at the scale of the individual. The autocovariance of observed small area data is determined via the natural aggregation over the population. Unlike lattice-based autoregressive approaches, the geostatistical approach is invariant to the scale of data aggregation. We establish that this geostatistical approach is also a valid autoregressive model; thus, we call this approach the geostatistical autoregressive (GAR) model. An asymptotically consistent and efficient maximum likelihood estimator is derived for the GAR model. Finite sample evidence from simulation experiments demonstrates the relative efficiency properties of the GAR model. Furthermore, while aggregation results in less efficient estimates than disaggregated data, the GAR model provides the most efficient estimates from the data that are available. These results suggest that the GAR model should be considered as part of a spatial analyst's toolbox when aggregated, small area data are analyzed. More importantly, we believe that the GAR model's attention to the individual-level scale allows for a more flexible and theory-informed specification than the existing autoregressive approaches based on an area-level spatial weights matrix. Because many spatial process models, both in geography and in other disciplines, are specified at the individual level, we hope that the GAR covariance specification will provide a vehicle for a better informed and more interdisciplinary use of spatial regression models with area-aggregated data.
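The aggregation step described in the abstract can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes an exponential covariance function at the individual level, area values formed as simple means of individual values, and covariance parameters that are known rather than estimated by maximum likelihood. All function names, the simulated layout, and the parameter values are hypothetical.

```python
import numpy as np

def exponential_cov(coords, sill=1.0, range_=1.0):
    """Individual-level covariance C(h) = sill * exp(-h / range_) (assumed form)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sill * np.exp(-dist / range_)

def aggregation_matrix(area_ids, n_areas):
    """Row a holds weight 1/n_a for each individual in area a (area means)."""
    A = np.zeros((n_areas, area_ids.size))
    A[area_ids, np.arange(area_ids.size)] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def gls(X, y, V):
    """Generalized least squares with a known error covariance V."""
    Vinv_X = np.linalg.solve(V, X)
    return np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)

rng = np.random.default_rng(0)

# Hypothetical layout: 16 areas on a 4 x 4 grid, 10 individuals scattered in each.
n_areas, per_area = 16, 10
centers = 3.0 * np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
area_ids = np.repeat(np.arange(n_areas), per_area)
n = area_ids.size
coords = centers[area_ids] + rng.uniform(-1.0, 1.0, size=(n, 2))

# Individual-level model: y = X beta + e, with e ~ N(0, Sigma).
Sigma = exponential_cov(coords, sill=1.0, range_=2.0)
X_ind = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
chol = np.linalg.cholesky(Sigma + 1e-10 * np.eye(n))  # small jitter for stability
y_ind = X_ind @ beta_true + chol @ rng.normal(size=n)

# Only area aggregates are observed; their covariance follows from aggregation.
A = aggregation_matrix(area_ids, n_areas)
X_area, y_area = A @ X_ind, A @ y_ind
V_area = A @ Sigma @ A.T

beta_hat = gls(X_area, y_area, V_area)
print("GLS estimate from area data:", beta_hat)
```

Because the area-level covariance is computed as `A @ Sigma @ A.T`, a different zoning of the same individuals changes only `A`, while the individual-level covariance parameters keep the same meaning. This is the sense in which a specification of this kind does not depend on the scale of aggregation.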
