Value Added Privacy Services for Healthcare Data

The widespread use of digital data storage and sharing for data mining has given data snoopers ample opportunity to collect and match records from multiple sources for identity theft and other privacy-invading activities. While most healthcare organizations protect the data in their own databases well, very few take sufficient precautions to protect data that is shared with third-party organizations. Such data is vulnerable to hackers, snoopers, and rogue employees who want to take advantage of the situation. Only recently have regulations such as HIPAA been tightened to enforce data and privacy protection. The goal of this project was to explore the use of value-added software services to counter this invasion of privacy when data is shared with an external organization for data mining, statistical analysis, or other purposes. Specifically, the service aims to protect data without removing any attributes, whether sensitive or non-sensitive. It uses sophisticated data masking algorithms that intelligently perturb and swap data fields, making it extremely difficult for data snoopers to reveal personal identities, even after linking records with other data sources. The service also supports value-added data analysis on the masked dataset: dataset-level properties and statistics remain approximately the same after masking, while individual record-level values are changed or perturbed to confuse data snoopers.
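To make the swap-and-perturb idea concrete, the following is a minimal sketch and not the masking algorithm used by the service, which the abstract does not specify. It assumes records are plain dictionaries; the function names (`swap_column`, `perturb_column`, `mask`), the field names, and the noise scale are all hypothetical choices made for illustration. Swapping permutes one field across records, so the set of values in that column is unchanged; perturbation adds zero-mean Gaussian noise and re-centers it, so the column mean is preserved while individual values change.

```python
import random
import statistics


def swap_column(records, field, rng):
    """Data swapping: randomly permute one field's values across records.

    The multiset of values in the column is unchanged, so any
    single-column statistic (counts, mean, histogram) is preserved.
    """
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]


def perturb_column(records, field, rng, scale=0.05):
    """Perturbation: add zero-mean Gaussian noise to a numeric field.

    Noise is scaled to a fraction (`scale`, an assumed parameter) of the
    column's standard deviation, then shifted so the column mean is kept.
    """
    values = [r[field] for r in records]
    sd = statistics.pstdev(values)
    noisy = [v + rng.gauss(0.0, scale * sd) for v in values]
    shift = statistics.fmean(values) - statistics.fmean(noisy)
    return [{**r, field: n + shift} for r, n in zip(records, noisy)]


def mask(records, swap_fields, perturb_fields, seed=0):
    """Apply swapping and perturbation without dropping any attribute."""
    rng = random.Random(seed)
    masked = [dict(r) for r in records]  # leave the input untouched
    for field in swap_fields:
        masked = swap_column(masked, field, rng)
    for field in perturb_fields:
        masked = perturb_column(masked, field, rng)
    return masked


# Hypothetical example records, invented for illustration only.
patients = [
    {"zip": "02139", "age": 34, "cholesterol": 182.0},
    {"zip": "02140", "age": 51, "cholesterol": 221.0},
    {"zip": "02141", "age": 47, "cholesterol": 198.0},
    {"zip": "02142", "age": 62, "cholesterol": 240.0},
]
masked_patients = mask(patients, swap_fields=["zip"], perturb_fields=["cholesterol"])
```

Independent swapping of a column breaks its relationship with the other columns, so a sketch like this preserves only per-column statistics. Preserving correlations between attributes would need a more constrained scheme, such as swapping only among records with similar values.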
