Model Explanations with Differential Privacy

Black-box machine learning models are increasingly used in critical decision-making domains, prompting growing calls for algorithmic transparency. The drawback is that model explanations can leak information about the training data and the explanation data used to generate them, thereby undermining data privacy. To address this issue, we propose differentially private algorithms for constructing feature-based model explanations. We design an adaptive differentially private gradient descent algorithm that finds the minimal privacy budget required to produce accurate explanations. It reduces the overall privacy loss on the explanation data by adaptively reusing past differentially private explanations, and it amplifies the privacy guarantees with respect to the training data. We evaluate the implications of differentially private models and of our privacy mechanisms on the quality of model explanations.
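To make the idea concrete, the sketch below shows one way differentially private gradient descent could fit a local, feature-based linear explanation around a query point: per-example gradients are clipped and Gaussian noise is added at every step. All function names, parameters, and the noise-calibration heuristic (a simple per-step split of the budget) are illustrative assumptions; this is not the paper's adaptive algorithm, which searches for the minimal budget and reuses past explanations.

```python
import numpy as np

def dp_gradient_descent_explanation(model_predict, X_local, epsilon, delta,
                                    clip_norm=1.0, lr=0.1, n_steps=50, rng=None):
    """Sketch: fit local linear feature-attribution weights w to a black-box
    model on a neighborhood X_local of a query point, using noisy clipped
    gradient descent. Names and noise calibration are illustrative only."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X_local.shape
    y = model_predict(X_local)            # black-box predictions to be mimicked locally
    w = np.zeros(d)

    # Heuristic per-step noise multiplier from the Gaussian mechanism, splitting
    # the (epsilon, delta) budget evenly across steps. A real implementation
    # would use a proper composition accountant (or the paper's adaptive search).
    noise_multiplier = np.sqrt(2.0 * np.log(1.25 / delta)) * np.sqrt(n_steps) / epsilon

    for _ in range(n_steps):
        residual = X_local @ w - y                      # squared-loss residuals
        per_example_grads = residual[:, None] * X_local # one gradient per point
        # Clip each per-example gradient to bound its sensitivity.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        per_example_grads /= np.maximum(1.0, norms / clip_norm)
        grad = per_example_grads.mean(axis=0)
        # Gaussian noise scaled to the sensitivity of the mean gradient.
        grad += rng.normal(0.0, noise_multiplier * clip_norm / n, size=d)
        w -= lr * grad
    return w   # noisy feature-attribution weights
```

A caller would pass the model's prediction function and a set of points sampled around the instance to be explained; the returned weights then play the role of the feature-based explanation whose release is covered by the differential privacy guarantee.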
