A cubic-wise balance approach for privacy preservation in data cubes

A data warehouse stores current and historical records consolidated from multiple transactional systems. Securing data warehouses is of ever-increasing interest, especially in settings where data are sold in pieces to third parties for data mining. In such settings, existing data warehouse security techniques, such as data access control, can be difficult to enforce and may be ineffective. Instead, this paper proposes a data-perturbation-based approach, called the cubic-wise balance method, to support privacy-preserving range queries on data cubes in a data warehouse. The approach is motivated by the observation that analysts are usually interested in summary data rather than individual data values. Accordingly, our approach provides closely estimated summary data for range queries without giving access to actual individual data values. As demonstrated by our experimental results on the APB benchmark data set from the OLAP Council, the cubic-wise balance method achieves both better privacy preservation and better range query accuracy than random data perturbation alternatives.
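The general idea of answering range-sum queries from perturbed cells can be illustrated with a small sketch. It is an assumption made for illustration and is not the cubic-wise balance algorithm itself, whose details are not given here. The hypothetical `balanced_perturb` adds random noise to every cell of a 2-D cube but re-centres the noise so that it sums to zero within each non-overlapping block. Range sums over whole blocks are therefore preserved exactly, while individual cell values are hidden. `random_perturb` is a plain random-perturbation baseline for comparison. The block size and noise scale are arbitrary illustrative parameters.

```python
import numpy as np


def random_perturb(cube, scale, rng):
    """Baseline: add independent Gaussian noise to every cell."""
    return cube.astype(float) + rng.normal(0.0, scale, size=cube.shape)


def balanced_perturb(cube, block, scale, rng):
    """Hypothetical balanced scheme (illustrative only, not the paper's method):
    add Gaussian noise to each cell, then shift the noise inside every
    block x block sub-cube so that it sums to zero there."""
    noisy = cube.astype(float)  # astype returns a new float array
    rows, cols = cube.shape
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            sub = noisy[r:r + block, c:c + block]  # basic slice -> view into noisy
            noise = rng.normal(0.0, scale, size=sub.shape)
            noise -= noise.mean()  # zero-sum noise within this block
            sub += noise           # in-place update writes through to noisy
    return noisy


def range_sum(cube, r0, r1, c0, c1):
    """Sum of cells in rows [r0, r1) and columns [c0, c1)."""
    return float(cube[r0:r1, c0:c1].sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cube = rng.integers(0, 100, size=(8, 8))  # toy 8x8 "data cube"

    balanced = balanced_perturb(cube, block=4, scale=10.0, rng=rng)
    baseline = random_perturb(cube, scale=10.0, rng=rng)

    # A query aligned with a 4x4 block: exact under the balanced scheme.
    true_val = range_sum(cube, 0, 4, 0, 4)
    print("true block sum:    ", true_val)
    print("balanced estimate: ", range_sum(balanced, 0, 4, 0, 4))
    print("random estimate:   ", range_sum(baseline, 0, 4, 0, 4))

    # A query that cuts across blocks: only approximate in both schemes.
    print("true 3x5 sum:      ", range_sum(cube, 1, 4, 2, 7))
    print("balanced estimate: ", range_sum(balanced, 1, 4, 2, 7))
```

In this toy setting, queries aligned with the balancing blocks are answered exactly while single cells stay perturbed. Queries that only partly overlap a block still carry some error, which is the trade-off any block-balanced scheme of this kind has to manage.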
