On learning cluster coefficient of private networks

Enabling accurate analysis of social network data while preserving differential privacy has been challenging, since graph features such as the clustering coefficient or modularity often have high sensitivity, unlike traditional aggregate functions (e.g., count and sum) on tabular data. In this paper, we treat a graph statistic as a function $$f$$ and develop a divide and conquer approach to enforce differential privacy. The basic procedure is to first decompose the target computation $$f$$ into several less complex unit computations $$f_1,\ldots,f_m$$ connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), then perturb the output of each $$f_i$$ with Laplace noise derived from its own sensitivity value and its allocated share $$\epsilon_i$$ of the privacy budget, and finally combine the perturbed $$f_i$$ into the perturbed output of $$f$$. We examine how the various operations affect the accuracy of the composed computation. When unit computations have large global sensitivity values, we enforce differential privacy by calibrating noise to the smooth sensitivity rather than the global sensitivity; this preserves the strict differential privacy guarantee while adding noise of smaller magnitude. We illustrate our approach using the clustering coefficient, a popular statistic in social network analysis. Empirical evaluations on five real social networks and on synthetic graphs generated from three random graph models show that the divide and conquer approach outperforms the direct approach.
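As a rough illustration of the divide and conquer idea (a sketch, not the paper's exact algorithm), the snippet below splits the global clustering coefficient into its two unit computations, the triangle count and the connected-triple count, perturbs each with Laplace noise scaled by a supplied sensitivity and a share of the privacy budget, and combines the noisy parts by division. The sensitivity arguments, the budget split, and the function name are placeholder assumptions; in particular, the paper calibrates noise to smooth sensitivity when global sensitivity is large, and that derivation is not shown here.

```python
import numpy as np
import networkx as nx


def private_global_clustering(G: nx.Graph, epsilon: float,
                              sens_triangles: float, sens_triples: float,
                              split: float = 0.5) -> float:
    """Divide-and-conquer sketch: perturb the two unit computations
    behind the global clustering coefficient separately, then divide.

    `sens_triangles` and `sens_triples` stand in for the sensitivity
    values the mechanism would actually use (e.g., smooth sensitivity
    for the high-sensitivity units); deriving them is outside this sketch.
    """
    # Distribute the total privacy budget across the two unit computations.
    eps1, eps2 = split * epsilon, (1.0 - split) * epsilon

    # Unit computation f1: number of triangles in G.
    triangles = sum(nx.triangles(G).values()) / 3.0
    # Unit computation f2: number of connected triples (paths of length two).
    triples = sum(d * (d - 1) / 2.0 for _, d in G.degree())

    # Perturb each unit output with Laplace noise of scale
    # (its sensitivity) / (its share of the budget).
    noisy_triangles = triangles + np.random.laplace(0.0, sens_triangles / eps1)
    noisy_triples = triples + np.random.laplace(0.0, sens_triples / eps2)

    # Combine the perturbed units with the basic operation (division);
    # 3 * triangles / triples is the global clustering coefficient.
    if noisy_triples <= 0:
        return 0.0
    return 3.0 * noisy_triangles / noisy_triples
```

With the sensitivities supplied, this mirrors the noisy-numerator over noisy-denominator composition described above; how the division step propagates the two noise terms is exactly the kind of operation-level accuracy question the paper analyzes.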
