A Novel Privacy Preserving Framework for Large Scale Graph Data Publishing

The growing number of data-intensive applications makes it increasingly important to store and query large-scale graph datasets efficiently, particularly to maximize the intelligence that can be mined from these data (e.g., to inform decision making). However, directly releasing a graph dataset for analysis may leak sensitive information about individuals even if the graph is anonymized, as demonstrated by the re-identification attacks on the DBpedia datasets. A key challenge in the design of graph sanitization methods is scalability, as existing execution models generally have significant memory requirements. In this paper, we propose a novel $k$-decomposition algorithm and define a new information loss matrix designed for utility measurement in massively large graph datasets. We also propose a novel privacy-preserving framework that can be seamlessly integrated with graph storage, anonymization, query processing, and analysis. Our experimental studies show that the proposed solution achieves privacy preservation, utility, and efficiency.
