The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection

Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially on the effective selection of hyperparameters. While a rich set of tools exists for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private machine learning, in which the process of hyperparameter tuning is accounted for in the overall privacy budget. To this end, we i) show that standard composition tools outperform more advanced techniques in many settings, ii) empirically and theoretically demonstrate an intrinsic connection between the learning-rate and clipping-norm hyperparameters, iii) show that adaptive optimizers like DPAdam enjoy a significant advantage in the process of honest hyperparameter tuning, and iv) draw upon the novel limiting behaviour of Adam in the DP setting to design a new and more efficient optimizer.
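Claims (ii) and (iii) of the abstract turn on how the clipped, noised gradient interacts with the optimizer's update rule, so a short sketch may help fix ideas. The following is a minimal NumPy illustration, not the paper's implementation: the names clip_and_noise and dp_adam_step, the hyperparameter defaults, and the toy usage are assumptions made for exposition. The comments note the coupling the paper studies: in DP-SGD the effective step size scales with the product of the learning rate and the clipping norm, whereas Adam's normalization by its second-moment estimate largely cancels the clipping norm.

```python
import numpy as np

def clip_and_noise(per_example_grads, clip_norm, noise_mult, rng):
    # Clip each per-example gradient to L2 norm <= clip_norm
    # (the standard DP-SGD recipe of Abadi et al., 2016).
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping norm: the sensitivity
    # of the summed gradient is clip_norm, divided by the batch size
    # after averaging.
    sigma = noise_mult * clip_norm / len(per_example_grads)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

def dp_adam_step(w, g_priv, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update applied to the privatized gradient. Because
    # the step divides by sqrt(v_hat), rescaling clip_norm affects both
    # numerator and denominator and is largely absorbed; with plain
    # DP-SGD the step instead scales with lr * clip_norm, coupling the
    # two hyperparameters.
    m = b1 * m + (1 - b1) * g_priv
    v = b2 * v + (1 - b2) * g_priv ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage with stand-in per-example gradients.
rng = np.random.default_rng(0)
w, m, v = np.zeros(4), np.zeros(4), np.zeros(4)
grads = [rng.normal(size=4) for _ in range(32)]
g_priv = clip_and_noise(grads, clip_norm=1.0, noise_mult=1.1, rng=rng)
w, m, v = dp_adam_step(w, g_priv, m, v, t=1)
```

Under this reading, a grid search over (learning rate, clipping norm) for DP-SGD wastes privacy budget on a largely redundant axis, while an adaptive optimizer collapses that axis, which is one intuition for why honest tuning favours DPAdam.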
