On Inference Control in Semantic Data Models for Statistical Databases

Abstract. A statistical database (SDB) is a database that is used to provide simple summary statistics about the populations stored in it and that supports statistical data analysis. When SDB users infer protected information in the SDB from responses to queries, we say that the SDB is compromised. The security problem of an SDB is to provide simple summary statistics about protected information in the SDB while preventing compromise. In this paper, we consider the SDB security problem in the context of the Data Abstraction (D-A) model and investigate the effectiveness of rounding SUM and COUNT query responses for preventing compromise due to structural, dynamic, and pre-knowledge inferences in the generalization hierarchy of the D-A model. We first round only SUM query responses, permit updates in a single population, and investigate the effect of users' a priori knowledge of a single protected value. We show that compromise is possible and give a necessary and sufficient condition for compromise. We then round both SUM and COUNT query responses and introduce techniques for choosing rounding bases for populations in tree-organized hierarchies that either eliminate or restrict structural and dynamic inferences. Finally, we propose the range response technique, which eliminates structural and dynamic inferences in general generalization hierarchies.
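The following toy sketch illustrates why rounding SUM responses alone may fail to prevent compromise when updates are permitted and a user has a priori knowledge. It assumes responses are rounded to the nearest multiple of a fixed base (halves round up) and that the user knows every value in the population except one protected value x. The function names, the base of 10, and the data are hypothetical. They are not the paper's construction, and the sketch does not reproduce its necessary and sufficient condition. Each rounded response confines the true sum to an interval one base wide. Intersecting the intervals obtained before and after updates narrows the range of x.

```python
# Toy illustration (hypothetical names and data): dynamic inference against
# SUM responses rounded to the nearest multiple of a fixed base.

def round_to_base(value: int, base: int) -> int:
    """Round a non-negative integer to the nearest multiple of base (halves round up)."""
    return (value + base // 2) // base * base


def feasible_sums(rounded: int, base: int) -> tuple[int, int]:
    """Range [lo, hi] of integer true sums s with round_to_base(s, base) == rounded."""
    return rounded - base // 2, rounded + base - 1 - base // 2


def narrow(x_range: tuple[int, int], rounded: int, base: int,
           known_sum: int) -> tuple[int, int]:
    """Intersect the user's current range for x with the range implied by one
    rounded SUM response, where the known values in the population sum to known_sum."""
    lo, hi = feasible_sums(rounded, base)
    return max(x_range[0], lo - known_sum), min(x_range[1], hi - known_sum)


BASE = 10
protected = 30          # hidden value x; the user never sees it directly
known = [12, 7]         # other values in the population, known a priori
x_range = (0, 10**6)    # user's initial bound on x
history = []

# extra = value of an additional known record present at query time:
# 0 = original population, 6 = a record inserted, 5 = that record updated to 5.
for extra in (0, 6, 5):
    true_sum = sum(known) + extra + protected       # computed inside the SDB
    response = round_to_base(true_sum, BASE)        # what the user sees
    x_range = narrow(x_range, response, BASE, sum(known) + extra)
    history.append(x_range)
    print(f"extra={extra}: true SUM={true_sum}, response={response}, x in {x_range}")
```

In this run the responses are 50, 60, and 50, and the range for x shrinks from (26, 35) to (30, 35) and then to (30, 30). The protected value is disclosed exactly even though no individual response was exact. Whether a real user can obtain such a sequence depends on the update model, which is what the paper's condition characterizes.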