Model Explanations with Differential Privacy

Black-box machine learning models are increasingly used in critical decision-making domains, prompting growing calls for algorithmic transparency. The drawback is that model explanations can leak information about the training data and the explanation data used to generate them, thereby undermining data privacy. To address this issue, we propose differentially private algorithms for constructing feature-based model explanations. We design an adaptive differentially private gradient descent algorithm that finds the minimal privacy budget required to produce accurate explanations. It reduces the overall privacy loss on the explanation data by adaptively reusing past differentially private explanations, and it amplifies the privacy guarantees with respect to the training data. We evaluate the implications of differentially private models and of our privacy mechanisms on the quality of model explanations.
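To make the idea concrete, the sketch below shows one way differentially private gradient descent could fit a local, feature-based linear explanation around a query point: per-example gradients are clipped and Gaussian noise is added at every step. All function names, parameters, and the noise-calibration heuristic (a simple per-step split of the budget) are illustrative assumptions; this is not the paper's adaptive algorithm, which searches for the minimal budget and reuses past explanations.

```python
import numpy as np

def dp_gradient_descent_explanation(model_predict, X_local, epsilon, delta,
                                    clip_norm=1.0, lr=0.1, n_steps=50, rng=None):
    """Sketch: fit local linear feature-attribution weights w to a black-box
    model on a neighborhood X_local of a query point, using noisy clipped
    gradient descent. Names and noise calibration are illustrative only."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X_local.shape
    y = model_predict(X_local)            # black-box predictions to be mimicked locally
    w = np.zeros(d)

    # Heuristic per-step noise multiplier from the Gaussian mechanism, splitting
    # the (epsilon, delta) budget evenly across steps. A real implementation
    # would use a proper composition accountant (or the paper's adaptive search).
    noise_multiplier = np.sqrt(2.0 * np.log(1.25 / delta)) * np.sqrt(n_steps) / epsilon

    for _ in range(n_steps):
        residual = X_local @ w - y                      # squared-loss residuals
        per_example_grads = residual[:, None] * X_local # one gradient per point
        # Clip each per-example gradient to bound its sensitivity.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        per_example_grads /= np.maximum(1.0, norms / clip_norm)
        grad = per_example_grads.mean(axis=0)
        # Gaussian noise scaled to the sensitivity of the mean gradient.
        grad += rng.normal(0.0, noise_multiplier * clip_norm / n, size=d)
        w -= lr * grad
    return w   # noisy feature-attribution weights
```

A caller would pass the model's prediction function and a set of points sampled around the instance to be explained; the returned weights then play the role of the feature-based explanation whose release is covered by the differential privacy guarantee.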
