Data Dependencies Preserving Shuffle in Relational Database

This paper addresses the problem that database shuffling algorithms do not preserve data dependencies. We introduce an approach that preserves functional dependencies and data-driven associations during a database shuffle. To preserve functional dependencies, we use Boyce-Codd Normal Form (BCNF) decomposition: given a relation R that is not in BCNF, we decompose R into BCNF relations R1, ..., Rn. Each Ri (i = 1, ..., n) is shuffled, and the shuffled relations are then rejoined to create the shuffled relation. Our approach guarantees losslessness and preserves functional dependencies. Data-driven associations may also be lost during database shuffling. To preserve them, we generate the transitive closure of the attributes that are associated and require that the associated attributes be shuffled together. We also present our theoretical and empirical results.
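
The two ideas above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy relation R(emp, dept, manager) with the functional dependency dept -> manager, the attribute names, the helper functions, and in particular the per-relation shuffle, which here simply permutes non-key values across rows. The first part decomposes R into R1(emp, dept) and R2(dept, manager), shuffles each, and rejoins them on dept; the second part groups associated attributes by transitive closure and shuffles each group as one unit.

```python
import random

def holds_fd(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Duplicate-free projection of rows onto attrs (first-seen order)."""
    seen = dict.fromkeys(tuple(row[a] for a in attrs) for row in rows)
    return [dict(zip(attrs, t)) for t in seen]

def shuffle_block(rows, attrs, rng):
    """Permute the values of attrs across rows as one unit; other columns stay put."""
    payload = [tuple(row[a] for a in attrs) for row in rows]
    rng.shuffle(payload)
    return [{**row, **dict(zip(attrs, p))} for row, p in zip(rows, payload)]

def natural_join(left, right, on):
    """Natural join of two lists of dict rows on the attributes in `on`."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[a] for a in on), []).append(r)
    return [{**l, **r} for l in left for r in index.get(tuple(l[a] for a in on), [])]

def shuffle_preserving_fd(rows, rng):
    """Decompose the toy relation, shuffle each part, and rejoin on dept."""
    r1 = project(rows, ["emp", "dept"])      # key: emp
    r2 = project(rows, ["dept", "manager"])  # key: dept
    s1 = shuffle_block(r1, ["dept"], rng)    # reassign departments to employees
    s2 = shuffle_block(r2, ["manager"], rng) # reassign managers to departments
    return natural_join(s1, s2, ["dept"])

def association_groups(attrs, pairs):
    """Transitive closure of pairwise associations, returned as attribute groups."""
    parent = {a: a for a in attrs}
    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a
    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for a in attrs:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Part 1: functional dependency dept -> manager survives the shuffle.
R = [
    {"emp": "ann", "dept": "sales", "manager": "kim"},
    {"emp": "bob", "dept": "sales", "manager": "kim"},
    {"emp": "cat", "dept": "ops", "manager": "lee"},
    {"emp": "dan", "dept": "ops", "manager": "lee"},
    {"emp": "eve", "dept": "hr", "manager": "max"},
]
shuffled = shuffle_preserving_fd(R, rng)

# Part 2: associated attributes are shuffled together, group by group.
people = [
    {"age": 30, "income": 50, "rent": 10, "zip": "A"},
    {"age": 40, "income": 70, "rent": 15, "zip": "B"},
    {"age": 50, "income": 90, "rent": 20, "zip": "C"},
]
groups = association_groups(
    ["age", "income", "rent", "zip"],
    [("age", "income"), ("income", "rent")],  # hypothetical associations
)
masked = people
for group in groups:
    masked = shuffle_block(masked, group, rng)
```

Because dept remains a key of the shuffled R2, every department in the rejoined relation still maps to exactly one manager, which `holds_fd(shuffled, ["dept"], ["manager"])` can confirm. Shuffling the manager column of R directly, without decomposing, offers no such guarantee.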
