Local indicators of geocoding accuracy (LIGA): theory and application

Background: Although sources of positional error in geographic locations (e.g. geocoding error) used for describing and modeling spatial patterns are widely acknowledged, research on how such error impacts the statistical results has been limited. In this paper we explore techniques for quantifying the perturbability of spatial weights to different specifications of positional error.

Results: We find that a family of curves describes the relationship between perturbability and positional error, and use these curves to evaluate sensitivity of alternative spatial weight specifications to positional error both globally (when all locations are considered simultaneously) and locally (to identify those locations that would benefit most from increased geocoding accuracy). We evaluate the approach in simulation studies, and demonstrate it using a case-control study of bladder cancer in south-eastern Michigan.

Conclusion: Three results are significant. First, the shape of the probability distributions of positional error (e.g. circular, elliptical, cross) has little impact on the perturbability of spatial weights, which instead depends on the mean positional error. Second, our methodology allows researchers to evaluate the sensitivity of spatial statistics to positional accuracy for specific geographies. This has substantial practical implications since it makes possible routine sensitivity analysis of spatial statistics to positional error arising in geocoded street addresses, global positioning systems, LIDAR and other geographic data. Third, those locations with high perturbability (most sensitive to positional error) and high leverage (that contribute the most to the spatial weight being considered) will benefit the most from increased positional accuracy. These are rapidly identified using a new visualization tool we call the LIGA scatterplot.

Herein lies a paradox for spatial analysis: for a given level of positional error, increasing sample density to more accurately follow the underlying population distribution increases perturbability and introduces error into the spatial weights matrix. In some studies positional error may not impact the statistical results, and in others it might invalidate the results. We therefore must understand the relationships between positional accuracy and the perturbability of the spatial weights in order to have confidence in a study's results.
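The idea of perturbability can be illustrated with a small simulation. The sketch below is not the paper's implementation: the abstract does not give the formal definition, so it assumes that perturbability means the expected fraction of a location's k-nearest-neighbor relationships that change when every location is displaced by random positional error. The function names (`knn_sets`, `local_perturbability`), the choice of k-nearest-neighbor weights, and the circular error model with exponentially distributed displacement are all assumptions made for illustration.

```python
import numpy as np


def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]


def local_perturbability(points, k, mean_error, n_sims=200, seed=0):
    """Estimate, per location, the mean fraction of k-NN links lost under error.

    Assumed error model (hypothetical): uniform random direction and an
    exponentially distributed displacement with mean `mean_error`.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    base = knn_sets(points, k)
    lost = np.zeros(n)
    for _ in range(n_sims):
        angle = rng.uniform(0.0, 2.0 * np.pi, n)
        radius = mean_error * rng.standard_exponential(n)
        shift = np.column_stack((radius * np.cos(angle), radius * np.sin(angle)))
        perturbed = knn_sets(points + shift, k)
        # Fraction of the original neighbors that are no longer neighbors.
        lost += np.array([len(b - p) / k for b, p in zip(base, perturbed)])
    return lost / n_sims


def global_curve(points, k, mean_errors, n_sims=200, seed=0):
    """Average local perturbability over all locations for each mean error."""
    return [
        float(local_perturbability(points, k, e, n_sims, seed).mean())
        for e in mean_errors
    ]
```

Calling `global_curve` over a grid of mean errors traces one perturbability curve for a given weight specification, and repeating it for different values of `k` gives a family of such curves. The per-location output of `local_perturbability` is the kind of quantity that could be plotted against a leverage measure in a LIGA-style scatterplot, although leverage is not defined in the abstract and is not computed here.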
