Privacy preserving database application testing

Traditionally, application software developers test against their own local development databases. Such databases usually hold only a small amount of sample data and therefore cannot satisfactorily simulate a live environment, especially for performance and scalability testing. Testing applications over live production databases, on the other hand, is increasingly problematic: it can expose sensitive data to an unauthorized tester, and it can lead to incorrect updates of the underlying database. In this paper, we investigate techniques to generate mock databases for application software testing without revealing any confidential information from the live production databases. Specifically, we will design mechanisms to create the deterministic rule set R, the non-deterministic rule set NR, and the statistical data set S for a live production database. We will then build a security Analyzer that processes the triplet <R, NR, S> together with the security requirements (security policy) and outputs a new triplet <R', NR', S'>. The security Analyzer will guarantee that no confidential information can be inferred from the new triplet <R', NR', S'>. The mock database generated from this new triplet can simulate the live environment for testing purposes while maintaining the privacy of the data in the original database.
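The pipeline described above can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the mechanism developed in the paper: the table, the column names, the contents chosen for R, NR, and S, the key-based policy format, and the functions `extract`, `analyze`, and `generate` are all hypothetical. Dropping policy-listed entries also does not by itself give the inference guarantee that the Analyzer is meant to provide.

```python
import random
import statistics

# Hypothetical production rows; the schema is illustrative only.
production = [
    {"name": "Alice", "age": 34, "dept": "sales"},
    {"name": "Bob", "age": 45, "dept": "sales"},
    {"name": "Carol", "age": 29, "dept": "eng"},
    {"name": "Dave", "age": 52, "dept": "eng"},
]

def extract(rows):
    """Build a toy (R, NR, S) triplet from the production rows."""
    ages = [r["age"] for r in rows]
    depts = sorted({r["dept"] for r in rows})
    # R: deterministic rules, here simple domain constraints (assumed form).
    R = {"age": (min(ages), max(ages)), "dept": depts}
    # NR: non-deterministic rules, here observed category frequencies (assumed form).
    NR = {"dept_weights": [sum(r["dept"] == d for r in rows) for d in depts]}
    # S: summary statistics of the data.
    S = {"age_mean": statistics.mean(ages), "age_stdev": statistics.stdev(ages)}
    return R, NR, S

def analyze(triplet, confidential):
    """Toy analyzer: drop every entry whose key the policy marks confidential."""
    return tuple(
        {k: v for k, v in part.items() if k not in confidential}
        for part in triplet
    )

def generate(triplet, n, seed=0):
    """Generate n mock rows from a (possibly sanitized) triplet."""
    R, NR, S = triplet
    rng = random.Random(seed)
    lo, hi = R["age"]
    rows = []
    for i in range(n):
        age = round(rng.gauss(S["age_mean"], S["age_stdev"]))
        age = min(hi, max(lo, age))  # enforce the deterministic range rule
        # If the weights were withheld, fall back to a uniform choice.
        dept = rng.choices(R["dept"], weights=NR.get("dept_weights"))[0]
        rows.append({"name": f"user{i}", "age": age, "dept": dept})
    return rows

# Assumed policy: the department headcount ratio is confidential.
sanitized = analyze(extract(production), confidential={"dept_weights"})
mock = generate(sanitized, n=100)
```

In this sketch the mock rows respect the age range and department domain of the original table, while real names never enter the triplet and the withheld headcount ratio is replaced by a uniform choice.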
