The re-identification risk of Canadians from longitudinal demographics

Background: The public is less willing to allow their personal health information to be disclosed for research purposes if they do not trust researchers and how researchers manage their data. However, the public is more comfortable with their data being used for research if the risk of re-identification is low. There are few studies on the risk of re-identification of Canadians from their basic demographics, and no studies on their risk from their longitudinal data. Our objective was to estimate the risk of re-identification from the basic cross-sectional and longitudinal demographics of Canadians.

Methods: Uniqueness is a common measure of re-identification risk. Demographic data on a 25% random sample of the population of Montreal were analyzed to estimate population uniqueness on postal code, date of birth, and gender, as well as their generalizations, for periods ranging from 1 year to 11 years.

Results: Almost 98% of the population was unique on full postal code, date of birth, and gender: these three variables are effectively a unique identifier for Montrealers. Uniqueness increased for longitudinal data. Considerable generalization was required to reach acceptably low uniqueness levels, especially for longitudinal data. Detailed guidelines and disclosure policies on how to ensure that the re-identification risk is low are provided.

Conclusions: A large percentage of Montreal residents are unique on basic demographics. For non-longitudinal data sets, the three-character postal code, gender, and month/year of birth represent sufficiently low re-identification risk. Data custodians need to generalize their demographic information further for longitudinal data sets.
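To make the uniqueness measure concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the share of records whose combination of postal code history, date of birth, and gender occurs exactly once in a data set, at a chosen level of generalization. The record layout, the function names `generalize` and `uniqueness`, the parameters `postal_chars`, `dob_level`, and `years`, and the four toy records are all assumptions made for illustration. The sketch measures uniqueness within the data it is given; it does not reproduce the estimation of population uniqueness from a 25% sample.

```python
from collections import Counter
from datetime import date


def generalize(record, postal_chars=6, dob_level="day", years=None):
    """Build the quasi-identifier key for one record (hypothetical layout).

    postal_chars: leading postal code characters kept (6 = full, 3 = first three).
    dob_level:    "day", "month" (month/year), or "year".
    years:        number of yearly postal codes used (None = all available).
    """
    dob = record["date_of_birth"]
    if dob_level == "day":
        dob_key = (dob.year, dob.month, dob.day)
    elif dob_level == "month":
        dob_key = (dob.year, dob.month)
    elif dob_level == "year":
        dob_key = (dob.year,)
    else:
        raise ValueError(f"unknown dob_level: {dob_level!r}")

    # One postal code per year; keeping more years makes the key longitudinal.
    postal_key = tuple(
        code.replace(" ", "").upper()[:postal_chars]
        for code in record["postal_codes"][:years]
    )
    return (postal_key, dob_key, record["gender"])


def uniqueness(records, postal_chars=6, dob_level="day", years=None):
    """Fraction of records whose generalized key appears exactly once."""
    if not records:
        return 0.0
    keys = [generalize(r, postal_chars, dob_level, years) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Invented toy records: one postal code per year over a two-year period.
people = [
    {"postal_codes": ["H2X 1Y4", "H2X 1Y4"], "date_of_birth": date(1980, 5, 17), "gender": "F"},
    {"postal_codes": ["H2X 3K2", "H3A 0G4"], "date_of_birth": date(1980, 5, 2), "gender": "F"},
    {"postal_codes": ["H2X 1Y4", "H2X 2T5"], "date_of_birth": date(1975, 11, 30), "gender": "M"},
    {"postal_codes": ["H2X 9Z9", "H2X 9Z9"], "date_of_birth": date(1980, 5, 9), "gender": "F"},
]

full_detail = uniqueness(people, postal_chars=6, dob_level="day")
cross_sectional = uniqueness(people, postal_chars=3, dob_level="month", years=1)
longitudinal = uniqueness(people, postal_chars=3, dob_level="month", years=2)
print(full_detail, cross_sectional, longitudinal)
```

On this toy data every record is unique at full detail. Generalizing to a three-character postal code and month/year of birth for a single year leaves only one of the four records unique, and extending the same generalization to two years of postal codes raises that to two of four, because a change of address separates records that were otherwise indistinguishable. This is the same direction of effect the abstract reports for longitudinal data, although the numbers here carry no empirical meaning.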
