Attack Tolerance of Link Prediction Algorithms: How to Hide Your Relations in a Social Network

Link prediction is one of the fundamental research problems in network analysis. Intuitively, it involves identifying the edges that are most likely to be added to a given network, or the edges that appear to be missing from the network when in fact they are present. Many algorithms have been proposed for this problem over the past decades. For all their benefits, such algorithms raise serious privacy concerns, as they could be used to expose a connection between two individuals who wish to keep their relationship private. With this in mind, we investigate the ability of such individuals to evade link prediction algorithms. More precisely, we study their ability to strategically alter their connections so as to increase the likelihood that some of their relationships remain undetected by link prediction algorithms. We formalize this question as an optimization problem and prove that finding an optimal solution is NP-complete. Despite this hardness, we show that the situation is not bleak in practice. In particular, we propose two heuristics that members of the general public can easily apply on existing social media platforms. We demonstrate the effectiveness of these heuristics on a wide variety of networks and against a broad range of link prediction algorithms.
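To make the evasion problem concrete, here is a minimal sketch under stated assumptions. The predictor is assumed to be the common-neighbours similarity index. The hidden relationship (u, v) is assumed to be already absent from the network the predictor observes. Only u and v may act, and each may only delete its own ties. The code scores every non-edge, reports where (u, v) falls in that ranking, and greedily removes ties to push it down. The helper names (`cn_score`, `target_rank`, `greedy_hide`), the tie-breaking rule and the greedy strategy are illustrative assumptions. They are not the two heuristics proposed in this work.

```python
from itertools import combinations

# Assumption: the predictor ranks non-edges by the common-neighbours index.
# The greedy evasion rule below is illustrative, not the paper's heuristics.
Graph = dict[str, set[str]]  # undirected graph as an adjacency map


def remove_edge(graph: Graph, p: str, q: str) -> Graph:
    """Return a copy of `graph` with the undirected edge p-q deleted."""
    copy = {node: set(nbrs) for node, nbrs in graph.items()}
    copy[p].discard(q)
    copy[q].discard(p)
    return copy


def cn_score(graph: Graph, p: str, q: str) -> int:
    """Common-neighbours index: number of nodes adjacent to both p and q."""
    return len(graph[p] & graph[q])


def target_rank(graph: Graph, u: str, v: str) -> int:
    """Position of the non-edge (u, v) when all non-edges are sorted by score.

    Rank 1 means the predictor lists (u, v) first. Ties favour the predictor:
    only non-edges with a strictly higher score push (u, v) further down.
    """
    target = cn_score(graph, u, v)
    higher = sum(
        1
        for p, q in combinations(sorted(graph), 2)
        if q not in graph[p] and cn_score(graph, p, q) > target
    )
    return 1 + higher


def greedy_hide(
    graph: Graph, u: str, v: str, budget: int
) -> tuple[Graph, list[tuple[str, str]]]:
    """Hypothetical greedy evasion under a budget of edge removals.

    `graph` is the network the predictor observes, so the edge u-v is already
    absent. Only u and v act, and each may only delete one of its own ties.
    Each step keeps the removal that pushes (u, v) furthest down the ranking,
    and the loop stops early once no removal improves the rank.
    """
    removed = []
    for _ in range(budget):
        best_rank, best_edge = target_rank(graph, u, v), None
        for x in (u, v):
            for w in sorted(graph[x]):
                rank = target_rank(remove_edge(graph, x, w), u, v)
                if rank > best_rank:
                    best_rank, best_edge = rank, (x, w)
        if best_edge is None:
            break
        graph = remove_edge(graph, *best_edge)
        removed.append(best_edge)
    return graph, removed


if __name__ == "__main__":
    # Toy network: u and v (whose own tie is hidden) share friends a and b,
    # while x and y are both linked to a and c.
    observed: Graph = {
        "u": {"a", "b"},
        "v": {"a", "b"},
        "a": {"u", "v", "x", "y"},
        "b": {"u", "v"},
        "c": {"x", "y"},
        "x": {"a", "c"},
        "y": {"a", "c"},
    }
    print(target_rank(observed, "u", "v"))  # 1: (u, v) is the top prediction
    hidden, removed = greedy_hide(observed, "u", "v", budget=1)
    print(removed, target_rank(hidden, "u", "v"))  # [('u', 'a')] 3
```

On this toy network, deleting the single tie u-a drops (u, v) from first to third place, because x-y and a-c now outscore it. With a budget of two, the sketch also deletes u-b and pushes (u, v) to sixth place. However, this leaves u isolated, which shows the trade-off between concealment and the connections the evader gives up. Greedy choices like these carry no optimality guarantee.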
