Graph Unlearning

The right to be forgotten states that a data subject has the right to have their data erased from an entity storing it. In the context of machine learning (ML), it requires the ML model provider to remove the data subject's data from the training set used to build the ML model, a process known as machine unlearning. While straightforward and legitimate, retraining the ML model from scratch upon receiving unlearning requests incurs high computational overhead when the training set is large. To address this issue, a number of approximate algorithms have been proposed for image and text data, among which SISA is the state-of-the-art solution. It randomly partitions the training set into multiple shards and trains a constituent model for each shard; an unlearning request then only requires retraining the shard containing the affected sample. However, directly applying SISA to graph data can severely damage the graph structural information, and thereby degrade the resulting model's utility. In this paper, we propose GraphEraser, a novel machine unlearning method tailored to graph data. Its contributions include two novel graph partition algorithms and a learning-based aggregation method. We conduct extensive experiments on five real-world datasets to illustrate the unlearning efficiency and model utility of GraphEraser. We observe that GraphEraser achieves 2.06× (small dataset) to 35.94× (large dataset) unlearning time improvement compared to retraining from scratch. In terms of model utility, GraphEraser achieves up to 62.5% higher F1 score than random partitioning, and our proposed learning-based aggregation method achieves up to 112% higher F1 score than majority vote aggregation.
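GraphEraser's graph-aware partitioning and learned aggregation are beyond the scope of a short sketch, but the underlying SISA mechanism the abstract builds on — random sharding, one constituent model per shard, retraining only the affected shard on an unlearning request, and aggregating shard predictions by majority vote — can be illustrated as follows. This is a minimal sketch, not the authors' implementation; `ShardedUnlearner`, the `train_fn` hook, and the toy 1-NN per-shard model are all hypothetical names introduced here for illustration.

```python
import random

# Toy per-shard training routine: memorise the shard's (feature, label)
# pairs and predict the label of the nearest stored feature (1-NN).
def train_fn(shard):
    def model(x):
        if not shard:           # degenerate empty shard
            return 0
        return min(shard, key=lambda p: abs(p[0] - x))[1]
    return model

class ShardedUnlearner:
    """SISA-style sketch: partition the training set into random shards,
    train one constituent model per shard, and serve an unlearning
    request by retraining only the shard containing the sample."""

    def __init__(self, data, num_shards, train_fn, seed=0):
        rng = random.Random(seed)
        self.train_fn = train_fn
        self.shards = [[] for _ in range(num_shards)]
        for point in data:
            self.shards[rng.randrange(num_shards)].append(point)
        self.models = [train_fn(shard) for shard in self.shards]

    def unlearn(self, point):
        # Retraining cost scales with one shard, not the full dataset.
        for i, shard in enumerate(self.shards):
            if point in shard:
                shard.remove(point)
                self.models[i] = self.train_fn(shard)
                return i
        raise ValueError("point not found in any shard")

    def predict(self, x):
        # Majority vote over the constituent models' predictions.
        votes = [m(x) for m in self.models]
        return max(set(votes), key=votes.count)
```

Because an unlearning request triggers retraining of a single shard rather than the whole model, the cost drops roughly by the number of shards, which is the source of the speedups reported above. The paper's point is that *random* sharding destroys graph structure (edges are severed across shards), motivating its structure-preserving partition algorithms and its learned replacement for the majority vote.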
