Cardinality-based inference control in data cubes

This paper addresses the inference problem in on-line analytical processing (OLAP) systems. The inference problem occurs when the exact values of sensitive attributes can be determined through answers to OLAP queries. Most existing inference control methods are computationally expensive for OLAP systems because they ignore the special structures of OLAP queries. By exploiting such structures, we derive cardinality-based sufficient conditions for safe OLAP data cubes. Specifically, data cubes are safe from inferences if their core cuboids are dense enough, in the sense that the number of known values is under a tight bound. We then apply the sufficient conditions on the basis of a three-tier inference control model. The model introduces an aggregation tier between data and queries. The aggregation tier represents a collection of safe data cubes that are pre-computed over a partition of the data using the proposed sufficient conditions. The aggregation tier is then used to provide users with inference-free queries. Our approach mitigates the performance penalty of inference control in three ways: partitioning the data yields smaller inputs to inference control algorithms, pre-computing the aggregation tier reduces on-line delay, and using cardinality-based conditions guarantees linear-time complexity.
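The cardinality-based check and the pre-computed aggregation tier described above can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not state the bound, so it is passed in through a hypothetical `bound_for` callback, and the names `count_known`, `is_safe_by_cardinality`, and `build_aggregation_tier` are assumptions made for this example. The sketch only shows why such a test runs in time linear in the number of cells: it counts known values in one pass and compares the count with the bound.

```python
from typing import Callable, Hashable, List, Sequence, Set, Tuple

# A cell of a core cuboid, identified by its coordinates (hypothetical representation).
Cell = Tuple[Hashable, ...]


def count_known(core_cells: Sequence[Cell], known: Set[Cell]) -> int:
    """Count the cells whose values are already known to the user (one linear pass)."""
    return sum(1 for cell in core_cells if cell in known)


def is_safe_by_cardinality(core_cells: Sequence[Cell], known: Set[Cell], bound: int) -> bool:
    """Sufficient-condition test: the number of known values must be under the bound.

    `bound` is a placeholder for the paper's tight bound, which the abstract does not give.
    """
    return count_known(core_cells, known) < bound


def build_aggregation_tier(
    blocks: Sequence[Sequence[Cell]],
    known: Set[Cell],
    bound_for: Callable[[Sequence[Cell]], int],
) -> List[Sequence[Cell]]:
    """Pre-compute the blocks of the data partition that pass the cardinality test.

    Only these blocks would back the data cubes offered to users at query time.
    """
    return [b for b in blocks if is_safe_by_cardinality(b, known, bound_for(b))]


if __name__ == "__main__":
    # Two blocks of a partition; the bound of 2 per block is purely illustrative.
    blocks = [
        [("a", 1), ("a", 2), ("b", 1), ("b", 2)],
        [("c", 1), ("c", 2)],
    ]
    known = {("a", 1), ("c", 1), ("c", 2)}
    tier = build_aggregation_tier(blocks, known, bound_for=lambda block: 2)
    print(len(tier))
```

Because the test is applied per block of the partition and off-line, the on-line cost of answering a query reduces to looking up a pre-computed safe cube, which is the performance argument the abstract makes.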
