Optimal Weights for Focused Tests of Clustering Using the Local Moran Statistic

Local spatial statistics measure and test for spatial association in one or more variables of interest within a geographic neighborhood surrounding a predefined location. Most applications adopt a single scale of analysis and give little attention to the scale of the process generating the data. Alternatively, when the researcher is uncertain about the process scale, local statistics may be evaluated at a number of scales. In these cases, it is important to correct for multiple testing when evaluating the statistical significance of each local statistic, something that is rarely done. Without such a correction, local statistics are more likely to identify significant relationships even when no meaningful spatial association exists. In this article, we develop a methodology for the local Moran statistic that provides both an empirical estimate of the spatial scale of association and an assessment of the significance of the statistic at that scale. The key idea is to test a number of possible choices for the statistic's weight matrix and then account for the multiple testing associated with the number of weight matrices examined. Unlike previous research, our statistic avoids the use of simulation to determine statistical significance in the presence of multiple testing. To test the validity of our approach, we constructed a numerical example to assess the statistic's performance and conducted an empirical study using leukemia data from central New York State. The developed statistic addresses the need for the empirical determination of weights and spatial scale. The test therefore addresses a common weakness of many applications, in which weights are defined exogenously, with little or no thought given to either the definition or its implications.
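The multi-scale idea in the abstract (evaluate the local Moran statistic under several candidate weight matrices, then adjust for having tried several) can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not give their correction, so a plain Bonferroni adjustment is swapped in here as a conservative stand-in. The function names (`normal_sf`, `local_moran_multiscale`), the use of row-standardized k-nearest-neighbor weights, and the assumption that the input z-scores are independent standard normal under the null (conditioning on the focal value) are all assumptions made for illustration.

```python
import math
import numpy as np

def normal_sf(x):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def local_moran_multiscale(coords, z, focal, ks):
    """Local Moran I_i at one focal location for several neighborhood sizes.

    coords : (n, d) array of locations
    z      : (n,) array of z-scores, ASSUMED iid N(0, 1) under the null
    focal  : index of the predefined location
    ks     : candidate numbers of nearest neighbors (the candidate scales)
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    if z[focal] == 0.0:
        raise ValueError("conditional test is undefined when z[focal] is 0")

    # Rank all other locations by distance from the focal location.
    dist = np.linalg.norm(coords - coords[focal], axis=1)
    order = np.argsort(dist, kind="stable")
    order = order[order != focal]

    results = []
    for k in ks:
        nbrs = order[:k]
        lag = z[nbrs].mean()              # row-standardized binary weights
        moran_i = z[focal] * lag          # local Moran statistic I_i
        # Conditional on z[focal], I_i ~ N(0, z[focal]^2 / k) under the null.
        stat = moran_i * math.sqrt(k) / abs(z[focal])
        p = normal_sf(stat)               # one-sided: positive association
        results.append((k, moran_i, stat, p))

    # Pick the scale with the smallest p-value, then apply Bonferroni
    # (an illustrative stand-in, NOT the correction used in the article).
    best = min(results, key=lambda r: r[3])
    p_adjusted = min(1.0, len(ks) * best[3])
    return results, best[0], p_adjusted

# Example: ten locations on a line, elevated values near location 0.
coords = [[float(i), 0.0] for i in range(10)]
z = [2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
results, best_k, p_adj = local_moran_multiscale(coords, z, focal=0, ks=[1, 2, 3, 4])
print(best_k, p_adj)
```

Because the statistics at nested scales are strongly correlated, Bonferroni is likely to be more conservative than a correction that accounts for that dependence; the sketch only shows where such a correction enters the calculation.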
