Attribute Utility Motivated k-anonymization of datasets to support the heterogeneous needs of biomedical researchers.

To support the growing need to share electronic health data for research, various privacy-preserving methods have been proposed, including k-anonymity. Many k-anonymity models apply the same level of anonymization regardless of how the data will actually be used, which may reduce the utility of the dataset for a particular research study. In this study, we explore extensions to k-anonymization that aim to satisfy the heterogeneous needs of different researchers while preserving both privacy and the utility of the dataset. The proposed algorithm, Attribute Utility Motivated k-anonymization (AUM), analyzes the characteristics of attributes and uses them to minimize information loss during anonymization. In comparisons with two existing algorithms, Mondrian and Incognito, preliminary results indicate that AUM may preserve more information from the original datasets, yielding higher-quality results with lower distortion.
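To make the k-anonymity requirement and the idea of attribute-dependent information loss concrete, here is a minimal Python sketch. It is not the AUM algorithm, whose details are not given here. It only checks whether an already generalized table is k-anonymous and scores it with a weighted normalized certainty penalty (NCP), a common utility measure for numeric generalization. The interval representation, the per-attribute weights, and the names `is_k_anonymous` and `weighted_ncp` are hypothetical, chosen to show one way a researcher's attribute priorities could enter a loss measure.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each quasi-identifier value has been
# generalized to a closed numeric interval (low, high).
Interval = Tuple[float, float]
Row = Dict[str, Interval]


def is_k_anonymous(rows: Sequence[Row], quasi_ids: List[str], k: int) -> bool:
    """True if every combination of generalized quasi-identifier values
    is shared by at least k rows (the k-anonymity property)."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(size >= k for size in groups.values())


def weighted_ncp(rows: Sequence[Row], quasi_ids: List[str],
                 domain: Dict[str, Interval],
                 weights: Dict[str, float]) -> float:
    """Average weighted normalized certainty penalty per row.

    For attribute q generalized to [lo, hi] over a domain [dmin, dmax],
    the penalty is (hi - lo) / (dmax - dmin). The weights are a
    hypothetical way to express how much a study cares about each
    attribute; larger weights make generalizing that attribute costlier.
    """
    total = 0.0
    for row in rows:
        for q in quasi_ids:
            lo, hi = row[q]
            dmin, dmax = domain[q]
            span = dmax - dmin
            # A single-valued domain carries no information to lose.
            total += weights[q] * ((hi - lo) / span if span > 0 else 0.0)
    return total / len(rows)


# Illustrative (made-up) generalized table with two equivalence classes.
rows = [
    {"age": (30, 39), "zip": (37200, 37299)},
    {"age": (30, 39), "zip": (37200, 37299)},
    {"age": (40, 49), "zip": (37200, 37299)},
    {"age": (40, 49), "zip": (37200, 37299)},
]
quasi_ids = ["age", "zip"]
domain = {"age": (0, 99), "zip": (37000, 37999)}
weights = {"age": 0.7, "zip": 0.3}  # hypothetical study priorities

if __name__ == "__main__":
    print(is_k_anonymous(rows, quasi_ids, 2))
    print(weighted_ncp(rows, quasi_ids, domain, weights))
```

In this toy table each equivalence class has two rows, so it satisfies k = 2 but not k = 3. Raising the weight on `age` makes wide age ranges costlier, so an anonymizer that minimized this score among k-anonymous candidates would tend to generalize `zip` before `age`.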
