Geospatial cryptography: enabling researchers to access private, spatially referenced, human subjects data for cancer control and prevention

Abstract

As the volume, accuracy and precision of digital geographic information have increased, concerns regarding individual privacy and confidentiality have come to the forefront. These concerns challenge a basic tenet of scientific advancement by posing substantial obstacles to the sharing of data needed to validate research results, and they can prevent certain research projects from being conducted in the first place. Geospatial cryptography involves the specification, design, implementation and application of cryptographic techniques to address privacy, confidentiality and security concerns for geographically referenced data. This article defines geospatial cryptography and demonstrates its application in cancer control and surveillance. Four use cases are considered: (1) national-level de-duplication among state- or province-based cancer registries; (2) sharing of confidential data across cancer registries to support case aggregation across administrative geographies; (3) secure data linkage; and (4) cancer cluster investigation and surveillance. A secure multi-party system for geospatial cryptography is developed. Solutions under geospatial cryptography are presented and computation time is calculated. De-duplication, case aggregation across administrative geographies and secure data linkage are services that cancer registries provide to the research community; they are often time-consuming and in some instances precluded by confidentiality and security concerns. Geospatial cryptography provides secure solutions that hold significant promise for addressing these concerns and for accelerating the pace of research with human subjects data residing in our nation’s cancer registries. Pursuit of the research directions posed herein conceivably would lead to a geospatially encrypted geographic information system (GEGIS) designed specifically to promote the sharing and spatial analysis of confidential data.
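To make the case-aggregation use case concrete, the following is a minimal sketch of how per-registry case counts for a shared geographic unit might be summed without any party seeing another's count. It assumes an additively homomorphic scheme of the Paillier type; the abstract does not state which cryptosystem the article's multi-party system uses, so this choice, the function names, the tiny demonstration primes and the example counts are all illustrative assumptions, not the authors' method. The key size here is far too small for real use.

```python
import math
import secrets

# Toy Paillier-style additively homomorphic encryption.
# ASSUMPTION: illustrative only; primes this small offer no security.

def keygen(p, q):
    """Build a key pair from two primes (hypothetical helper)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # standard simple choice of generator
    mu = pow(lam, -1, n)           # inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt a non-negative integer m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext integer from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts mod n."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Hypothetical scenario: three registries hold case counts for one
# census tract that straddles their administrative boundaries.
pub, priv = keygen(1009, 1013)
registry_counts = [12, 7, 30]              # made-up example values

ciphertexts = [encrypt(pub, k) for k in registry_counts]

# An aggregator combines ciphertexts without learning any single count.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(pub, encrypted_total, c)

# Only the private-key holder can read the aggregate.
total = decrypt(pub, priv, encrypted_total)
print(total)  # 49
```

In this sketch only the aggregate is ever decrypted; a production design would also need realistic key sizes, authenticated channels and a policy for who holds the private key.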
