Toward a Fully De-identified Biomedical Information Warehouse

The Information Warehouse at the Ohio State University Medical Center is a comprehensive repository of business, clinical, and research data drawn from various source systems. The data collected there are a valuable resource for both translational research and personalized healthcare. Use of such data in research is governed by federal privacy regulations, with oversight by the Institutional Review Board (IRB). In 2006, the OSU IRB recognized the Information Warehouse as an "Honest Broker" of clinical data, providing investigators with de-identified or limited datasets under the stipulations of a signed data use agreement. To streamline this process further, the Information Warehouse is developing a de-identified data warehouse suitable for direct user access through a controlled query tool intended to support both research and education. In this paper we report findings from a performance evaluation of several de-identification schemes that may be used to ensure regulatory compliance while keeping database updates and queries practical. We also discuss how date-shifting during de-identification can affect other data elements, such as diagnosis and procedure codes, and consider a possible solution to those problems.
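As an illustration of the date-shifting idea mentioned above, here is a minimal sketch, not the scheme evaluated in the paper. It assumes a per-patient offset derived from a keyed hash of the patient identifier, so that every date for one patient moves by the same number of days and intervals between events are preserved. The names `patient_offset`, `shift_date`, `MAX_SHIFT_DAYS`, and the 365-day bound are hypothetical choices made for this example.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical bound on the shift; the paper does not specify one.
MAX_SHIFT_DAYS = 365


def patient_offset(patient_id: str, secret_key: bytes,
                   max_shift_days: int = MAX_SHIFT_DAYS) -> timedelta:
    """Derive a stable, nonzero day offset for one patient from a keyed hash."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"),
                      hashlib.sha256).digest()
    # Map the hash onto the 2 * max_shift_days nonzero offsets.
    n = int.from_bytes(digest[:8], "big") % (2 * max_shift_days)
    days = n - max_shift_days          # range: -max .. max - 1
    if days >= 0:
        days += 1                      # skip zero: -max .. -1, 1 .. max
    return timedelta(days=days)


def shift_date(original: date, patient_id: str, secret_key: bytes) -> date:
    """Shift a date by the patient's offset; same patient, same offset."""
    return original + patient_offset(patient_id, secret_key)


if __name__ == "__main__":
    key = b"example-key-kept-outside-the-warehouse"  # assumption: managed secret
    admit = shift_date(date(2008, 3, 1), "MRN001", key)
    discharge = shift_date(date(2008, 3, 8), "MRN001", key)
    # The length of stay survives the shift even though both dates change.
    print((discharge - admit).days)
```

Because the offset depends only on the key and the identifier, incremental loads can re-derive it without storing a lookup table. A shifted date can still land in a different calendar year than the original, which is one way date-dependent elements such as code-set versions could become inconsistent with the shifted record.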
