Auditing sum-queries to make a statistical database secure

In response to queries posed to a statistical database, the query system should avoid releasing summary statistics that could lead to the disclosure of confidential individual data. Attacks on the security of a statistical database may be direct or indirect and, to repel them, the query system should audit queries by controlling the amount of information released by their responses. This paper focuses on sum-queries with a response variable of nonnegative real type and proposes a compact representation of answered sum-queries, called an information model in “normal form,” which allows the query system to decide whether the value of a new sum-query can be safely answered. If it cannot, the query system issues the range of feasible values of the new sum-query consistent with previously answered sum-queries. Both the management of the information model and the answering procedure require solving linear-programming problems and, since standard linear-programming algorithms are not polynomially bounded (despite their good performance in practice), effective procedures that make parsimonious use of them are stated for the general case. Moreover, in the special case that the information model is “graphical,” it is shown that the answering procedure can be implemented in polynomial time.
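The range of feasible values mentioned above can be illustrated on a toy instance. The following is a minimal sketch, not the paper's procedure: it assumes each answered sum-query is given as a set of record indices together with its released sum, that the answered queries are linearly independent, and it swaps in brute-force enumeration of basic feasible solutions for a linear-programming solver, so it is only practical for a handful of records. The names `solve_square` and `feasible_range` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_square(M, rhs):
    """Solve M y = rhs exactly by Gauss-Jordan elimination; None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def feasible_range(answered, n_records, new_query):
    """Return (min, max) of the new sum-query over all nonnegative record values
    consistent with the answered queries; max is None when it is unbounded.

    Assumption: the answered queries are linearly independent, so every vertex
    of the feasible region comes from a square, nonsingular column subset.
    """
    A = [[1 if j in q else 0 for j in range(n_records)] for q, _ in answered]
    b = [v for _, v in answered]
    m = len(A)
    covered = set().union(*(q for q, _ in answered))
    values = []
    for cols in combinations(range(n_records), m):
        sub = [[A[i][j] for j in cols] for i in range(m)]
        sol = solve_square(sub, b)
        # Keep only basic solutions that respect nonnegativity.
        if sol is not None and all(v >= 0 for v in sol):
            values.append(sum(v for j, v in zip(cols, sol) if j in new_query))
    if not values:
        raise ValueError("answered queries admit no nonnegative solution")
    # A record covered by no answered query can grow without limit.
    upper = max(values) if set(new_query) <= covered else None
    return min(values), upper


# Toy instance: x0 + x1 = 10 and x1 + x2 = 7 have already been answered.
answered = [({0, 1}, 10), ({1, 2}, 7)]
low, high = feasible_range(answered, 3, {0, 2})   # range of x0 + x2
single = feasible_range(answered, 3, {0})         # range of the single record x0
```

In this toy instance the query on records 0 and 2 is confined to the interval from 3 to 17, and record 0 alone to the interval from 3 to 10. A query whose minimum and maximum coincide is already determined by earlier answers.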
