Preserving confidentiality when sharing medical database with the Cellsecu system

We propose a computer system called Cellsecu that maintains the anonymity and confidentiality of each cell containing sensitive information in a medical database. Cellsecu achieves this by automatically removing, generalizing, and expanding information. It is designed to strengthen data privacy protection so that a data warehouse can handle queries automatically. In most cases, health organizations collect medical data with explicit identifiers, such as name, address, and phone number. Simply removing all explicit identifiers before releasing the data is not enough to preserve confidentiality: the remaining data can be used to re-identify individuals by linking or matching it to other databases, or by looking at unique characteristics found in the database. A formal model based on modal logic is the theoretical foundation of Cellsecu. In addition, a new confidentiality criterion called "non-uniqueness" is defined and implemented. We believe that modeling this problem formally can clarify the issue and clearly identify the boundaries of current technology. Based on our preliminary performance evaluation, the confidentiality check module and the confidentiality enhancing module only slightly degrade system performance.
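The re-identification risk described above, and the way generalization reduces it, can be illustrated with a small sketch. This is not Cellsecu's algorithm, and it does not reproduce the paper's formal "non-uniqueness" definition, which is not given in this abstract. It only assumes the informal reading that no record should carry a combination of non-explicit attributes that appears exactly once. The field names, the sample records, and the helpers `unique_rows` and `generalize_age` are all hypothetical.

```python
from collections import Counter


def unique_rows(records, attributes):
    """Return indices of records whose combination of the given
    attributes occurs exactly once (and so could be singled out)."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] == 1]


def generalize_age(record, width=10):
    """Replace an exact age with a range such as '30-39'.
    Hypothetical generalization rule, chosen only for illustration."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Made-up records with explicit identifiers (name, address, phone) already removed.
records = [
    {"age": 34, "zip": "11529", "diagnosis": "asthma"},
    {"age": 37, "zip": "11529", "diagnosis": "diabetes"},
    {"age": 52, "zip": "10617", "diagnosis": "asthma"},
    {"age": 58, "zip": "10617", "diagnosis": "flu"},
]
attributes = ["age", "zip"]

# Every (age, zip) pair is unique, so each record could be linked to an outside source.
before = unique_rows(records, attributes)

# After generalizing age, each (age range, zip) pair is shared by two records.
generalized = [generalize_age(r) for r in records]
after = unique_rows(generalized, attributes)
```

In this toy data, removing explicit identifiers alone leaves every record unique on age and zip code, while a single generalization step removes all unique combinations. A real system would also need to decide which attributes to generalize and how far, which this sketch does not address.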
