The Security of Confidential Numerical Data in Databases

Organizations store large amounts of data in databases for data mining and other types of analysis. Some of this data is considered confidential and has to be protected from disclosure. Even when access to individual values of confidential numerical data in the database is prevented, disclosure may occur when a snooper uses linear models to predict individual values of confidential attributes from nonconfidential numerical and categorical attributes. Hence, it is important for the database administrator to be able to evaluate security against snoopers who use linear models. In this study we provide a methodology based on Canonical Correlation Analysis that is both appropriate and adequate for evaluating security. The methodology can also be used to evaluate the security provided by different security mechanisms, such as query restrictions and data perturbation. Where the level of security is inadequate, the methodology can also be used to select appropriate inference control mechanisms. The application of the methodology is illustrated using a simulated database.
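To make the idea concrete, the following is a minimal sketch, not the procedure of this study, of how canonical correlations between nonconfidential and confidential attributes might be computed on simulated data. The function name `canonical_correlations`, the simulated coefficients in `B`, and the use of the largest squared canonical correlation as a worst-case measure of linear predictability are illustrative assumptions. The sketch relies on the standard result that the largest squared canonical correlation bounds the R-squared a linear model can achieve for any linear combination of the confidential attributes.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return the sample canonical correlations between the columns of X and Y.

    Hypothetical helper for illustration: centers both blocks, takes an
    orthonormal basis of each via QR, and reads the canonical correlations
    off the singular values of the cross-product of the two bases.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding slightly above 1

def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit of y on X with intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

# Simulated database (assumed structure): 3 nonconfidential numerical
# attributes X and 2 confidential numerical attributes Y that depend on X.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
B = np.array([[1.0, 0.5],
              [0.0, 1.0],
              [0.5, 0.0]])
Y = X @ B + rng.normal(size=(n, 2))

rho = canonical_correlations(X, Y)
worst_case_r2 = rho[0] ** 2  # largest squared canonical correlation

# R^2 a snooper obtains by regressing each confidential attribute on X.
per_attribute_r2 = [r_squared(X, Y[:, j]) for j in range(Y.shape[1])]

print("canonical correlations:", rho)
print("worst-case R^2:", worst_case_r2)
print("per-attribute R^2:", per_attribute_r2)
```

In this sketch, a worst-case value close to 1 would suggest that some linear combination of the confidential attributes is almost fully predictable from the nonconfidential ones. Categorical attributes, which the study also considers, would first need a numerical encoding such as dummy variables before entering `X`.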
