The 'Re-Identification' of Governor William Weld's Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now

The 1997 re-identification of Massachusetts Governor William Weld’s medical data within an insurance data set that had been stripped of direct identifiers had a profound impact on the development of the de-identification provisions within the 2003 Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Weld’s re-identification, purportedly achieved through the use of a voter registration list from Cambridge, MA, is frequently cited as evidence that computer scientists can re-identify individuals within de-identified data with “astonishing ease”. However, a careful re-examination of the population demographics in Cambridge indicates that Weld was most likely re-identifiable only because he was a public figure who had experienced a highly publicized hospitalization. The Cambridge voter data, which were missing records for a large proportion of the population, could not by themselves have established his re-identification with any certainty.
The complete story of Weld's re-identification exposes an important systemic barrier to accurate re-identification known as “the myth of the perfect population register”. The logic underlying re-identification depends critically on being able to demonstrate that a person within a health data set is the only person in the larger population who has a given combination of characteristics (known as “quasi-identifiers”) that could potentially re-identify them. Most re-identification attempts therefore face a strong challenge: creating a complete and accurate population register. This limitation not only underlies the entire set of famous Cambridge re-identification results but also affects much of the existing re-identification research cited by those who claim that re-identification is easy.
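The register argument can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the four toy records, the choice of ZIP code, birth date, and sex as quasi-identifiers, and the helper names `quasi_id` and `matching_names` are assumptions made for illustration, not data or methods from the paper. It shows only that a match which looks unique in an incomplete register need not be unique in the full population.

```python
# Hypothetical toy records: (name, zip_code, birth_date, sex). Not real data.
population = [
    ("A", "02138", "1950-01-01", "M"),
    ("B", "02138", "1950-01-01", "M"),  # same quasi-identifiers as A
    ("C", "02139", "1962-05-17", "F"),
    ("D", "02139", "1971-09-30", "M"),
]

# An incomplete register (for example, a voter list): person B is missing.
register = [person for person in population if person[0] != "B"]


def quasi_id(person):
    """Return the quasi-identifier tuple (zip_code, birth_date, sex)."""
    return person[1:]


def matching_names(records, target_qid):
    """Return the names of all records whose quasi-identifiers equal target_qid."""
    return [r[0] for r in records if quasi_id(r) == target_qid]


# Quasi-identifiers observed in a de-identified health record (hypothetical).
target = ("02138", "1950-01-01", "M")

in_register = matching_names(register, target)      # candidates in the register
in_population = matching_names(population, target)  # candidates in the population

print("Matches in register:  ", in_register)
print("Matches in population:", in_population)
```

In this toy case the register yields a single candidate while the full population yields two, so the apparent uniqueness is an artifact of the missing record rather than proof of identity.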
This paper critically examines the historic Weld re-identification and the dramatic (thousands-fold) reduction in re-identification risk for de-identified health data protected by the HIPAA Privacy Rule's de-identification provisions since 2003. The paper also recommends enhancements to existing HIPAA de-identification policy, discusses the critical advances in medical science and improvements to our healthcare system that are routinely made using de-identified data, and comments on the vital importance of properly balancing the competing goals of protecting patient privacy and preserving the accuracy of scientific research and statistical analyses conducted with de-identified data.
