Machine Unlearning for Image Retrieval: A Generative Scrubbing Approach

Data owners have the right to request the deletion of their data from a machine learning (ML) model. A naive response is to retrain the model on the original dataset with the data to forget excluded, but this is unrealistic: the required dataset may no longer be available, and retraining is usually computationally expensive. Machine unlearning has therefore recently attracted much attention; it aims to remove data from a trained ML model in response to deletion requests, without retraining the model from scratch and without full access to the original training dataset. Existing unlearning methods mainly focus on conventional ML models, while unlearning for models based on deep neural networks (DNNs) remains underexplored, especially for those trained on large-scale datasets. In this paper, we make the first attempt to realize data forgetting on deep models for image retrieval. Image retrieval aims to find data relevant to a query according to similarity measures. Intuitively, a deep image retrieval model can be made to unlearn by breaking down its ability to model similarity on the data to forget. To this end, we propose a generative scrubbing (GS) method that learns a generator to craft noisy data that manipulates the model weights. We design a novel framework consisting of the generator and the target retrieval model, in which a pair of coupled static and dynamic learning procedures are performed simultaneously. This learning strategy enables the generated noisy data to erase the model's memory of the data to forget while retaining the information of the remaining data. Extensive experiments on three widely used datasets verify the effectiveness of the proposed method.
