Closed-form Machine Unlearning for Matrix Factorization

Matrix factorization (MF) is a fundamental model in data mining and machine learning, with applications in diverse areas, including recommendation systems with user-item rating matrices, phenotype extraction from electronic health records, and spatial-temporal data analysis for check-in records. The "right to be forgotten" has become an indispensable privacy consideration under widely enforced data protection regulations, which allow individuals who contributed data for model training to revoke that data through a deletion request. This gives rise to the emerging task of machine unlearning for MF: upon receiving deletion requests from the owners of certain matrix rows/columns, remove the influence of those rows/columns from the trained MF factors. The central goal is to remove this influence effectively while avoiding the computationally prohibitive baseline of retraining from scratch. Existing machine unlearning methods are either designed for single-variable models, and thus incompatible with MF, whose two factors are coupled model variables, or require alternating updates that are not efficient enough. In this paper, we propose a closed-form machine unlearning method for MF. In particular, we explicitly capture the implicit dependency between the two factors, which yields a total Hessian-based Newton step as the closed-form unlearning update. We further introduce a series of efficiency-enhancement strategies that exploit the structural properties of the total Hessian. Extensive experiments on five real-world datasets from three application areas, as well as on synthetic datasets, validate the efficiency, effectiveness, and utility of the proposed method.
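
To make the phrase "total Hessian-based Newton step" concrete, the following is a minimal sketch of the standard implicit-function argument, not the paper's exact formulation. The notation is assumed for illustration: $f(u, v)$ is a twice-differentiable loss on the data that remains after deletion, $u$ and $v$ are the vectorized factors, and $u^*(v)$ is assumed to be a local minimizer of $f(\cdot, v)$ with an invertible block $\nabla^2_{uu} f$.

```latex
% Assumed notation (not from the abstract): f, u, v, u^*(v), F, v^+.
% Stationarity of the inner variable:
\nabla_u f\bigl(u^*(v), v\bigr) = 0
% Differentiating with respect to v gives the implicit dependency:
\nabla^2_{uu} f \,\frac{\mathrm{d}u^*}{\mathrm{d}v} + \nabla^2_{uv} f = 0
\quad\Longrightarrow\quad
\frac{\mathrm{d}u^*}{\mathrm{d}v} = -\bigl(\nabla^2_{uu} f\bigr)^{-1} \nabla^2_{uv} f
% Reduced objective F(v) = f(u^*(v), v): its gradient and (total) Hessian
\nabla F(v) = \nabla_v f,
\qquad
\mathrm{D}^2_{vv} f := \nabla^2 F(v)
  = \nabla^2_{vv} f - \nabla^2_{vu} f \,\bigl(\nabla^2_{uu} f\bigr)^{-1} \nabla^2_{uv} f
% One Newton step on the reduced objective, evaluated at the trained factors:
v^{+} = v - \bigl(\mathrm{D}^2_{vv} f\bigr)^{-1} \nabla_v f
```

Under these assumptions the total Hessian is the Schur complement of the $\nabla^2_{uu} f$ block in the full Hessian, which is one way to account for the coupling between the two factors in a single update rather than through repeated alternating steps.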
