bench4gis: Benchmarking Privacy-aware Geocoding with Open Big Data

Geocoding, the process of translating addresses to geographic coordinates, is relatively straightforward and well studied, but privacy concerns may restrict how geographic data can be used. The impact of these restrictions is compounded by the scale of the data, which in turn limits the viable geocoding strategies. For example, healthcare data is protected by patient privacy laws and possibly by institutional regulations that restrict external transmission and sharing of data. As a result, organizations implement “in-house” geocoding solutions in which data is processed behind the organization’s firewall; quality assurance for these implementations is problematic because sensitive data cannot be used to validate results externally. In this paper, we present bench4gis, a software framework that benchmarks privacy-aware geocoding solutions by using open big data as surrogate data for quality assurance; the scale of open big address data sets can ensure that results are geographically meaningful for the locale of the implementing institution.
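The core idea of using open address data as a surrogate for sensitive records can be illustrated with a small sketch. This is not the bench4gis implementation or API. It assumes a hypothetical surrogate record type holding an address with known reference coordinates, and an in-house geocoder modeled as any function that returns a (lat, lon) pair or None. The benchmark reports match rate, median positional error (great-circle distance via the haversine formula), and the fraction of all records geocoded within a hypothetical distance tolerance.

```python
import math
from dataclasses import dataclass
from statistics import median
from typing import Callable, Optional, Sequence, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters

LatLon = Tuple[float, float]


@dataclass
class SurrogateRecord:
    """Hypothetical open-data address with known reference coordinates."""
    address: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def benchmark(
    geocode: Callable[[str], Optional[LatLon]],
    records: Sequence[SurrogateRecord],
    tolerance_m: float = 100.0,  # hypothetical acceptance threshold
) -> dict:
    """Score a geocoder against surrogate records; no sensitive data is involved."""
    errors = []
    for rec in records:
        result = geocode(rec.address)
        if result is None:  # geocoder could not match this address
            continue
        errors.append(haversine_m(rec.lat, rec.lon, result[0], result[1]))
    n = len(records)
    return {
        "match_rate": len(errors) / n if n else 0.0,
        "median_error_m": median(errors) if errors else float("nan"),
        # Fraction of ALL records (not just matched ones) within tolerance.
        "within_tolerance": sum(e <= tolerance_m for e in errors) / n if n else 0.0,
    }


if __name__ == "__main__":
    records = [
        SurrogateRecord("1 Main St", 38.0, -78.0),
        SurrogateRecord("2 Oak Ave", 38.001, -78.0),
    ]
    # Toy "in-house" geocoder: matches one address exactly, fails on the other.
    lookup = {"1 Main St": (38.0, -78.0)}
    print(benchmark(lookup.get, records))
```

Because the geocoder is passed in as a plain callable, the same scoring code can run behind a firewall against any in-house service, while the reference coordinates come only from open data.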
