Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York

Background
Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR, however, is not an appropriate null hypothesis for highly complex and organized systems, such as those encountered in the environmental and health sciences, where underlying spatial pattern is present. This paper presents a geostatistical approach that filters the noise caused by spatially varying population size and generates spatially correlated neutral models that account for a regional background obtained by geostatistical smoothing of the observed mortality rates. These neutral models were used in conjunction with the local Moran statistic to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA.

Results
We developed a typology of neutral models that progressively relaxes the assumptions of the null hypothesis, allowing for spatial autocorrelation, non-uniform risk, and spatially heterogeneous population sizes. Incorporating spatial autocorrelation led to fewer significant ZIP codes than were found in previous studies, confirming earlier claims that CSR can lead to over-identification of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integrating the regional background into the neutral models yielded substantially different spatial clusters and outliers, identifying ZIP codes where standardized mortality ratio (SMR) values depart significantly from their regional background.

Conclusion
The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems.
In particular, this new methodology allows one to identify geographic patterns above and beyond background variation. Implementing this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, providing a rationale for targeted cancer control interventions, including the assessment of health service needs and the allocation of resources for screening and diagnostic testing. It will also allow researchers to evaluate systematically how sensitive their results are to the assumptions implicit in alternative null hypotheses.
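The abstract contrasts testing the local Moran statistic under CSR with testing it against spatially correlated neutral models. The following is a minimal sketch of that contrast in Python with NumPy, run on a hypothetical 10-by-10 grid of areal units rather than real ZIP codes. Several choices are assumptions and not the paper's procedure: rook contiguity weights, an exponential covariance with a chosen practical range, Cholesky-based Gaussian simulation whose values are rank-matched to the observed data so each realization reproduces the observed histogram, and the hypothetical helper names `rook_weights`, `local_moran`, `csr_null`, `neutral_null`, and `pseudo_p_values`. The optional `background` argument loosely mimics a regional-background neutral model by simulating only the departures from a supplied smoothed surface.

```python
import numpy as np


def rook_weights(nrow, ncol):
    # Row-standardized rook-contiguity weights for a regular grid (hypothetical units).
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrow and 0 <= cc < ncol:
                    W[r * ncol + c, rr * ncol + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def local_moran(values, W):
    # Local Moran's I_i = z_i * sum_j w_ij z_j, with z the standardized values.
    z = (values - values.mean()) / values.std()
    return z * (W @ z)


def pseudo_p_values(observed, simulated):
    # Folded two-sided pseudo p-values; `simulated` has shape (n_sims, n_units).
    n_sims = simulated.shape[0]
    upper = (simulated >= observed).sum(axis=0)
    lower = (simulated <= observed).sum(axis=0)
    return np.minimum(1.0, 2.0 * (np.minimum(upper, lower) + 1) / (n_sims + 1))


def csr_null(values, W, n_sims, rng):
    # CSR: every rearrangement of the observed values over the units is equally likely.
    return np.array([local_moran(rng.permutation(values), W) for _ in range(n_sims)])


def neutral_null(values, W, coords, practical_range, n_sims, rng, background=None):
    # Spatially correlated neutral model (assumed exponential covariance).
    # With `background`, only departures from that surface are simulated.
    target = values if background is None else values - background
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = np.exp(-3.0 * d / practical_range) + 1e-10 * np.eye(len(values))
    L = np.linalg.cholesky(cov)
    sorted_target = np.sort(target)
    sims = []
    for _ in range(n_sims):
        field = L @ rng.standard_normal(len(values))   # correlated Gaussian field
        sim = np.empty(len(values))
        sim[np.argsort(field)] = sorted_target          # reproduce the observed histogram
        if background is not None:
            sim = sim + background
        sims.append(local_moran(sim, W))
    return np.array(sims)


# Usage on hypothetical data: a trend along the grid rows plus noise.
rng = np.random.default_rng(0)
W = rook_weights(10, 10)
coords = np.array([(r, c) for r in range(10) for c in range(10)], dtype=float)
values = 0.3 * coords[:, 0] + rng.normal(0.0, 1.0, 100)
observed = local_moran(values, W)
p_csr = pseudo_p_values(observed, csr_null(values, W, 199, rng))
p_neutral = pseudo_p_values(observed, neutral_null(values, W, coords, 5.0, 199, rng))
print("significant at 0.05 under CSR:", int((p_csr < 0.05).sum()))
print("significant at 0.05 under neutral model:", int((p_neutral < 0.05).sum()))
```

Because neighboring values in the neutral realizations are already autocorrelated, one would expect large local Moran values to arise by chance more often than under permutation, which widens the reference distribution and is consistent with the smaller number of significant ZIP codes reported when spatial autocorrelation is incorporated.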

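The abstract also describes filtering the noise caused by spatially varying population size. The paper's filter is geostatistical and is not reproduced here. As a simpler, non-spatial stand-in, the sketch below applies global empirical Bayes shrinkage (method-of-moments) to SMRs, pulling ratios based on few expected deaths toward the overall SMR. The function name `eb_smoothed_smr` and the simulated counts are hypothetical; unlike a geostatistical filter, this estimator ignores spatial proximity and shrinks every unit toward the same global mean.

```python
import numpy as np


def eb_smoothed_smr(observed, expected):
    # Global empirical Bayes (method-of-moments) shrinkage of SMRs.
    # Assumes at least one observed death so the overall SMR is positive.
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    smr = observed / expected
    m = observed.sum() / expected.sum()                       # overall SMR
    s2 = (expected * (smr - m) ** 2).sum() / expected.sum()   # weighted variance of SMRs
    a = max(s2 - m / expected.mean(), 0.0)                    # between-area variance estimate
    weight = a / (a + m / expected)                           # small for small expected counts
    return m + weight * (smr - m)


# Hypothetical units with uniform true risk (SMR = 1) but very different sizes.
rng = np.random.default_rng(0)
expected = rng.uniform(0.5, 50.0, 200)
observed = rng.poisson(expected)
raw = observed / expected
smoothed = eb_smoothed_smr(observed, expected)
small = expected < 5.0
print("raw SMR spread, small units:", raw[small].std())
print("smoothed SMR spread, small units:", smoothed[small].std())
```

Under uniform risk every departure of a raw SMR from 1 is noise, so one would expect the smoothed values of units with fewer than five expected deaths to vary far less than their raw SMRs.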