Structural Predictability Optimization Against Inference Attacks in Data Publishing

Graphs are a useful mathematical representation for a broad variety of real-world complex systems. Structure prediction on graphs, that is, estimating potential relationships between objects from the observed structure, is fundamental to many data analysis applications, such as network alignment, network reconstruction, and link prediction. Accordingly, in data publishing, it is necessary to regulate the structural predictability of graphs against inference attacks in order to protect the sensitive information of the data generators. In contrast to existing work on graph structure perturbation for node ranking, information diffusion, and so on, the structural predictability optimization problem, i.e., reducing the accuracy of sensitive-relationship inference in graphs, has not been extensively studied. This paper presents an active learning algorithm that selects the most representative links to be perturbed, thereby regulating the structural predictability of graphs: it removes as few links as possible to undermine the regularity level of the graph, which forms the foundation of inference attack methods. Specifically, under the assumption that a substructure with a higher regularity level contains more regular equivalence components and supplies more equivalent paths for random walk processes, a random walk-based link importance measuring algorithm is proposed to identify the representative links. A structural regularity metric, which measures the structural predictability of graphs, is also introduced to guide the link perturbation for structural predictability optimization. Extensive experiments on artificial and real-world data sets demonstrate the effectiveness of the proposed structural predictability optimization method. In particular, the method accurately learns the role of links in terms of graph organization, and representative link-based perturbation effectively degrades the performance of structure inference on graphs.
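
The general idea of scoring links with random walks and then removing the highest-scoring ones can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: the abstract does not specify the importance measure or the structural regularity metric, so the sketch substitutes a plain random walk with restart and scores each link by the mutual proximity of its endpoints. The function names (`build_adjacency`, `rwr`, `link_scores`, `perturb`) and the parameters `restart` and `iters` are hypothetical, and the graph is assumed to be undirected with no duplicate links.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) links."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def rwr(adj, seed, restart=0.15, iters=100):
    """Random walk with restart from `seed`, computed by power iteration.

    Returns a dict mapping each node to its visiting probability.
    """
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Spread the walking mass of n evenly over its neighbors.
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        # The restart mass always returns to the seed node.
        nxt[seed] += restart
        p = nxt
    return p


def link_scores(edges, restart=0.15, iters=100):
    """Score each link by the mutual random-walk proximity of its endpoints.

    This is a stand-in heuristic, assumed here for illustration only.
    """
    adj = build_adjacency(edges)
    prox = {n: rwr(adj, n, restart, iters) for n in adj}
    return {(u, v): prox[u][v] + prox[v][u] for u, v in edges}


def perturb(edges, k, restart=0.15, iters=100):
    """Remove the k highest-scoring links and return the remaining ones."""
    scores = link_scores(edges, restart, iters)
    ranked = sorted(edges, key=lambda e: scores[e], reverse=True)
    removed = set(ranked[:k])
    return [e for e in edges if e not in removed]
```

Calling `perturb(edges, k)` on a link list returns the published graph with `k` links removed. In the setting described above, the choice of `k` would be guided by a structural regularity metric rather than fixed in advance.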
