De-identification of primary care electronic medical records free-text data in Ontario, Canada

Background
Electronic medical records (EMRs) are a potentially rich source of health information for research, but the free-text in EMRs often contains identifying information. While de-identification tools have been developed for free-text, none have been developed or tested for the full range of primary care EMR data.

Methods
We took deid, an open source de-identification program, and modified it for the Ontario context for use on primary care EMR data. We developed the modified program on a training set of 1000 free-text records from one group practice and then tested it on two validation sets: a random sample of 700 free-text EMR records from 17 physicians in 7 practices in 5 cities, and 500 free-text records from a group practice in a different city than the practice used for the training set. We measured the sensitivity/recall, precision, specificity, accuracy and F-measure of the modified tool, compared against manually tagged free-text records, in removing patient and physician names, locations, addresses, medical record numbers, health card numbers and telephone numbers.

Results
On the training set, the modified program performed with a sensitivity of 88.3%, specificity of 91.4%, precision of 91.3%, accuracy of 89.9% and F-measure of 0.90. The first and second validation sets had sensitivities of 86.7% and 80.2%, specificities of 91.4% and 87.7%, precisions of 91.1% and 87.4%, accuracies of 89.0% and 83.8% and F-measures of 0.89 and 0.84, respectively.

Conclusion
The deid program can be modified to de-identify free-text primary care EMR records with reasonable accuracy while preserving clinical content.
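The five performance measures named above follow from a count of true and false positives and negatives against the manual tagging. The sketch below shows the standard definitions; it is not the authors' evaluation code, and the `Counts` class, the function names and the example counts are hypothetical. The abstract does not say at what unit (token, phrase or record) items were counted, and the sketch assumes all denominators are non-zero.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts against manual tagging (hypothetical container)."""
    tp: int  # identifiers correctly removed
    fp: int  # non-identifiers wrongly removed
    tn: int  # non-identifiers correctly kept
    fn: int  # identifiers missed


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics(c: Counts) -> dict:
    """Standard definitions of the five measures reported in the abstract."""
    sensitivity = c.tp / (c.tp + c.fn)  # also called recall
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure(precision, sensitivity),
    }


# Illustrative counts only; these are not data from the study.
example = metrics(Counts(tp=80, fp=10, tn=90, fn=20))

# Consistency check using the reported training-set precision and sensitivity.
training_f = f_measure(0.913, 0.883)
```

Applying `f_measure` to the reported precision and sensitivity pairs (91.3% and 88.3%, 91.1% and 86.7%, 87.4% and 80.2%) gives values that round to the reported F-measures of 0.90, 0.89 and 0.84.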
