Optimal query complexity for private sequential learning against eavesdropping

We study the query complexity of a learner-private sequential learning problem, motivated by the privacy and security concerns due to eavesdropping that arise in practical applications such as pricing and federated learning. A learner tries to estimate an unknown scalar value by sequentially querying an external database and receiving binary responses; meanwhile, a third-party adversary observes the learner's queries but not the responses. The learner's goal is to design a querying strategy with the minimum number of queries (optimal query complexity) so that she can accurately estimate the true value, while the eavesdropping adversary, even with complete knowledge of her querying strategy, cannot. We develop new querying strategies and analytical techniques and use them to prove tight upper and lower bounds on the optimal query complexity. The bounds almost match across the entire parameter range, substantially improving upon existing results. We thus obtain a complete picture of the optimal query complexity as a function of the estimation accuracy and the desired levels of privacy. We also extend the results to sequential learning models in higher dimensions and to settings where the binary responses are noisy. Our analysis leverages a crucial insight into the nature of the private learning problem: the query trajectory of an optimal learner can be divided into distinct phases that focus on pure learning versus learning and obfuscation, respectively.
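The two-phase structure described above can be illustrated with a minimal sketch. This is not the paper's actual strategy; it is a hypothetical toy in which the learner first runs a standard bisection search to the desired accuracy (pure learning), then issues bisection-like query traces inside randomly placed decoy intervals (obfuscation), so that an eavesdropper who sees only the query locations cannot single out the refined interval that contains the truth. The function name, the `num_decoys` and `depth` parameters, and the decoy-placement rule are all illustrative assumptions.

```python
import random

def private_bisection(x_true, eps, num_decoys, depth=3, seed=0):
    """Toy two-phase learner (illustrative only, not the paper's strategy).

    Phase 1: ordinary bisection on [0, 1] until the interval containing
    x_true has width <= eps. Phase 2: replay short bisection-like query
    traces inside decoy intervals; the learner ignores any responses there,
    since those queries exist only to confuse an observer of the queries.
    Returns (estimate, list_of_all_queries).
    """
    rng = random.Random(seed)
    queries = []
    lo, hi = 0.0, 1.0

    # Phase 1: pure learning via bisection.
    while hi - lo > eps:
        q = (lo + hi) / 2.0
        queries.append(q)
        if x_true >= q:          # binary response from the database
            lo = q
        else:
            hi = q
    estimate = (lo + hi) / 2.0
    width = hi - lo

    # Phase 2: obfuscation with decoy refinement traces.
    for _ in range(num_decoys):
        d_lo = rng.uniform(0.0, 1.0 - width)
        d_hi = d_lo + width
        for _ in range(depth):
            queries.append((d_lo + d_hi) / 2.0)
            d_hi = (d_lo + d_hi) / 2.0   # fake "go left" refinement
    return estimate, queries
```

Under this sketch, the learning phase costs about log2(1/eps) queries and the obfuscation phase adds num_decoys * depth more, mirroring the trade-off between accuracy and privacy that the bounds in the paper quantify.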
