Correlated Differential Privacy: Hiding Information in Non-IID Data Set

Privacy preservation in data mining and data release has attracted increasing research interest over the past decades. Differential privacy is an influential privacy notion that offers a rigorous, provable privacy guarantee for data mining and data release. Existing studies on differential privacy assume that the records in a data set are sampled independently. In real-world applications, however, records are rarely independent; the relationships among records are referred to as correlated information, and such a data set is called a correlated data set. A differential privacy mechanism applied to a correlated data set discloses more information than expected, which is a serious privacy violation. Although recent research has recognized this violation, a solid solution for correlated data sets is still lacking. Moreover, how to reduce the large amount of noise that differential privacy incurs on a correlated data set has yet to be explored. To fill this gap, this paper proposes an effective correlated differential privacy solution by defining a correlated sensitivity and designing a correlated data releasing mechanism. By taking the correlation levels between records into account, the proposed correlated sensitivity significantly reduces the noise compared with the traditional global sensitivity. The correlated data releasing mechanism, the correlated iteration mechanism, is designed on an iterative basis to answer a large number of queries. Compared with the traditional method, the proposed correlated differential privacy solution strengthens the privacy guarantee for a correlated data set at a lower accuracy cost. Experimental results show that the proposed solution outperforms traditional differential privacy in terms of mean squared error on large groups of queries, suggesting that correlated differential privacy successfully retains utility while preserving privacy.
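The core idea, a per-record sensitivity weighted by the correlation degree between records, which then calibrates the Laplace noise scale, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact definition: the function names, the toy correlation matrix, and the use of a plain weighted row sum are all assumptions made for the example.

```python
import math
import random

def correlated_sensitivity(record_sensitivities, correlation):
    """Illustrative sketch: worst-case correlation-weighted sensitivity.

    record_sensitivities[j]: max change in the query answer if record j changes.
    correlation[i][j]: assumed correlation degree between records i and j,
    in [0, 1] (1.0 on the diagonal, 0.0 for independent records).
    """
    n = len(record_sensitivities)
    # For each record i, sum the sensitivities of all records correlated
    # with it, weighted by the correlation degree; take the worst case.
    per_record = [sum(abs(correlation[i][j]) * record_sensitivities[j]
                      for j in range(n))
                  for i in range(n)]
    return max(per_record)

def laplace_release(true_answer, sensitivity, epsilon, rng=random):
    """Laplace mechanism: noise scale calibrated to sensitivity / epsilon."""
    # Inverse-CDF sampling of Laplace(0, scale) from a uniform draw.
    u = rng.random() - 0.5
    scale = sensitivity / epsilon
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise

# Toy data set: records 0 and 1 are partially correlated, record 2 is independent.
corr = [[1.0, 0.8, 0.0],
        [0.8, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sens = [1.0, 1.0, 1.0]  # e.g. a count query: each record shifts the answer by at most 1
cs = correlated_sensitivity(sens, corr)  # 1.8
noisy = laplace_release(42.0, cs, epsilon=1.0)
```

Treating records 0 and 1 as fully correlated would give a sensitivity of 2.0; accounting for the partial correlation (0.8) lowers it to 1.8, which in turn lowers the Laplace noise scale, illustrating how a correlation-aware sensitivity can reduce noise relative to the pessimistic alternative.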
