Privacy Attacks on Network Embeddings

Data ownership and data protection are increasingly important topics with ethical and legal implications, e.g., with the right to erasure established in the European General Data Protection Regulation (GDPR). In this light, we investigate network embeddings, i.e., the representation of network nodes as low-dimensional vectors. We consider a typical social network scenario with nodes representing users and edges relationships between them. We assume that a network embedding of the nodes has been trained. After that, a user demands the removal of his data, requiring the full deletion of the corresponding network information, in particular the corresponding node and incident edges. In that setting, we analyze whether after the removal of the node from the network and the deletion of the vector representation of the respective node in the embedding significant information about the link structure of the removed node is still encoded in the embedding vectors of the remaining nodes. This would require a (potentially computationally expensive) retraining of the embedding. For that purpose, we deploy an attack that leverages information from the remaining network and embedding to recover information about the neighbors of the removed node. The attack is based on (i) measuring distance changes in network embeddings and (ii) a machine learning classifier that is trained on networks that are constructed by removing additional nodes. Our experiments demonstrate that substantial information about the edges of a removed node/user can be retrieved across many different datasets. This implies that to fully protect the privacy of users, node deletion requires complete retraining - or at least a significant modification - of original network embeddings. Our results suggest that deleting the corresponding vector representation from network embeddings alone is not sufficient from a privacy perspective.

[1]  S. Boorman,et al.  Social Structure from Multiple Networks. I. Blockmodels of Roles and Positions , 1976, American Journal of Sociology.

[2]  Mukund Balasubramanian,et al.  The Isomap Algorithm and Topological Stability , 2002, Science.

[3]  Graham Cormode,et al.  Node Classification in Social Networks , 2011, Social Network Data Analytics.

[4]  Giovanni Felici,et al.  Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers , 2013, Int. J. Secur. Networks.

[5]  Michael P. Wellman,et al.  SoK: Security and Privacy in Machine Learning , 2018, 2018 IEEE European Symposium on Security and Privacy (EuroS&P).

[6]  Yingshu Li,et al.  Collective Data-Sanitization for Preventing Sensitive Information Inference Attacks in Social Networks , 2018, IEEE Transactions on Dependable and Secure Computing.

[7]  Jure Leskovec,et al.  Defining and evaluating network communities based on ground-truth , 2012, Knowledge and Information Systems.

[8]  Albert,et al.  Emergence of scaling in random networks , 1999, Science.

[9]  Michael P. Wellman,et al.  Towards the Science of Security and Privacy in Machine Learning , 2016, ArXiv.

[10]  Linyuan Lu,et al.  Link Prediction in Complex Networks: A Survey , 2010, ArXiv.

[11]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[12]  Mingzhe Wang,et al.  LINE: Large-scale Information Network Embedding , 2015, WWW.

[13]  Mathias Niepert,et al.  Learning Convolutional Neural Networks for Graphs , 2016, ICML.

[14]  Michael Veale,et al.  Algorithms that remember: model inversion attacks and data protection law , 2018, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[15]  S. Boorman,et al.  Social structure from multiple networks: I , 1976 .

[16]  Bhavani M. Thuraisingham,et al.  Inferring private information using social network data , 2009, WWW '09.

[17]  Peng Wang,et al.  Link prediction in social networks: the state-of-the-art , 2014, Science China Information Sciences.

[18]  Jure Leskovec,et al.  node2vec: Scalable Feature Learning for Networks , 2016, KDD.

[19]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[20]  Michalis Vazirgiannis,et al.  Matching Node Embeddings for Graph Similarity , 2017, AAAI.

[21]  Santo Fortunato,et al.  Community detection in graphs , 2009, ArXiv.

[22]  David Heckerman,et al.  Probabilistic Entity-Relationship Models, PRMs, and Plate Models , 2004 .

[23]  Kevin Chen-Chuan Chang,et al.  A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications , 2017, IEEE Transactions on Knowledge and Data Engineering.

[24]  Steven Skiena,et al.  DeepWalk: online learning of social representations , 2014, KDD.

[25]  S. Boorman,et al.  Social Structure from Multiple Networks. II. Role Structures , 1976, American Journal of Sociology.

[26]  Pinar Yanardag,et al.  Deep Graph Kernels , 2015, KDD.

[27]  Jon Kleinberg,et al.  The link prediction problem for social networks , 2003, CIKM '03.

[28]  Jon M. Kleinberg,et al.  The link-prediction problem for social networks , 2007, J. Assoc. Inf. Sci. Technol..

[29]  Mario Fritz,et al.  ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models , 2018, NDSS.

[30]  Ling Liu,et al.  Towards Demystifying Membership Inference Attacks , 2018, ArXiv.

[31]  Somesh Jha,et al.  Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures , 2015, CCS.

[32]  Wenwu Zhu,et al.  Structural Deep Network Embedding , 2016, KDD.

[33]  Palash Goyal,et al.  Graph Embedding Techniques, Applications, and Performance: A Survey , 2017, Knowl. Based Syst..

[34]  Kevin Chen-Chuan Chang,et al.  Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding , 2017, AAAI.

[35]  M. Newman,et al.  Vertex similarity in networks. , 2005, Physical review. E, Statistical, nonlinear, and soft matter physics.

[36]  Jian Pei,et al.  Asymmetric Transitivity Preserving Graph Embedding , 2016, KDD.

[37]  Lise Getoor,et al.  Preserving the Privacy of Sensitive Relationships in Graph Data , 2007, PinKDD.

[38]  Zhenyu Liu,et al.  Inferring Privacy Information from Social Networks , 2006, ISI.

[39]  Vitaly Shmatikov,et al.  Membership Inference Attacks Against Machine Learning Models , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[40]  Ben Taskar,et al.  Probabilistic Entity-Relationship Models, PRMs, and Plate Models , 2007 .