The risk of node re-identification in labeled social graphs

Real network datasets offer substantial benefits for understanding phenomena such as information diffusion and network evolution. Yet sharing real graph datasets raises significant privacy risks, even when the data are stripped of user identity information. These risks increase when nodes have associated attributes. In this paper we quantitatively study the impact of binary node attributes on node privacy by employing machine-learning-based re-identification attacks and exploring the interplay between graph topology and attribute placement. We also analyze anonymity risks in epidemic networks subject to different node re-identification attacks. Our experiments show that the population's diversity with respect to the binary attribute consistently degrades anonymity. More interestingly, we show that, in the SI epidemic model, populations with similar diversity retain different levels of anonymity at different infection rates.
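The setting can be illustrated with a toy simulation. The sketch below runs a discrete-time SI (susceptible-infected) process on a random graph, treats the infection state as the binary node attribute, and measures how many nodes become uniquely identifiable. It is a minimal sketch under stated assumptions, not the paper's method: the Erdős–Rényi graph stands in for a real social network, the "diversity" measure (share of nodes in the minority class) and the attacker's signature (own attribute, degree, number of infected neighbors) are hypothetical choices, and the paper's machine-learning-based attacks are replaced here by simple signature-uniqueness counting.

```python
import random
from collections import Counter
from typing import Dict, List, Set, Tuple

# Undirected graph as an adjacency map: node id -> set of neighbor ids.
Graph = Dict[int, Set[int]]


def erdos_renyi(n: int, p: float, rng: random.Random) -> Graph:
    """Random G(n, p) graph; a stand-in for a real social network."""
    graph: Graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def si_simulation(graph: Graph, seeds: List[int], beta: float,
                  steps: int, rng: random.Random) -> Dict[int, int]:
    """Discrete-time SI model: at each step every infected node infects each
    susceptible neighbor independently with probability beta. Infected nodes
    never recover. Returns the binary attribute (1 = infected, 0 = not)."""
    infected = set(seeds)
    for _ in range(steps):
        newly_infected = set()
        for u in sorted(infected):  # sorted for reproducible random draws
            for v in sorted(graph[u]):
                if v not in infected and rng.random() < beta:
                    newly_infected.add(v)
        infected |= newly_infected
    return {node: int(node in infected) for node in graph}


def minority_fraction(attr: Dict[int, int]) -> float:
    """Hypothetical diversity measure: share of nodes in the smaller class
    (0.0 = homogeneous population, 0.5 = evenly split)."""
    ones = sum(attr.values())
    return min(ones, len(attr) - ones) / len(attr)


def signature(graph: Graph, attr: Dict[int, int], node: int) -> Tuple[int, int, int]:
    """Hypothetical attacker knowledge: (own attribute, degree,
    number of neighbors whose attribute is 1)."""
    return (attr[node], len(graph[node]), sum(attr[v] for v in graph[node]))


def unique_fraction(graph: Graph, attr: Dict[int, int]) -> float:
    """Fraction of nodes whose signature no other node shares, i.e. nodes an
    attacker matching on the signature alone could single out."""
    sigs = {node: signature(graph, attr, node) for node in graph}
    counts = Counter(sigs.values())
    return sum(1 for s in sigs.values() if counts[s] == 1) / len(graph)


if __name__ == "__main__":
    g = erdos_renyi(200, 0.03, random.Random(7))
    # With an all-zero attribute the signature reduces to the node degree.
    blind = {node: 0 for node in g}
    print("degree only:", round(unique_fraction(g, blind), 3))
    for beta in (0.05, 0.2):
        attr = si_simulation(g, seeds=[0], beta=beta, steps=5, rng=random.Random(1))
        print(f"beta={beta}: diversity={minority_fraction(attr):.3f}, "
              f"unique={unique_fraction(g, attr):.3f}")
```

Because the attribute can only split degree classes further, the unique fraction with attributes is never lower than the degree-only value. How much higher it gets depends on where the infected nodes sit in the graph, a toy version of the interplay between topology and attribute placement described above.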

Adriana Iamnitchi et al. The risk of node re-identification in labeled social graphs. Applied Network Science, 2018.