Multi-party Poisoning through Generalized p-Tampering

In a poisoning attack against a learning algorithm, an adversary tampers with a fraction of the training data $T$ with the goal of increasing the classification error of the constructed hypothesis/model over the final test distribution. In the distributed setting, $T$ might be gathered gradually from $m$ data providers $P_1,\dots,P_m$ who generate and submit their shares of $T$ in an online way.

In this work, we initiate a formal study of $(k,p)$-poisoning attacks, in which an adversary controls $k\in[m]$ of the parties, and even for each corrupted party $P_i$, the adversary is restricted to submitting poisoned data $T'_i$ on behalf of $P_i$ that is still "$(1-p)$-close" to the correct data $T_i$ (e.g., a $1-p$ fraction of $T'_i$ is still honestly generated). For $k=m$, this model becomes the traditional notion of poisoning, and for $p=1$ it coincides with the standard notion of corruption in multi-party computation. We prove that if the generated hypothesis $h$ has some initial constant error, there always exists a $(k,p)$-poisoning attacker who can either decrease the confidence of $h$ (i.e., the probability that $h$ achieves small error) or increase the error of $h$ by $\Omega(p \cdot k/m)$. Our attacks can be implemented in polynomial time given samples from the correct data distribution, and they use no wrong labels if the original distributions are not noisy.

At a technical level, we prove a general lemma about biasing bounded functions $f(x_1,\dots,x_n)\in[0,1]$ through an attack model in which each block $x_i$ might be controlled by an adversary with marginal probability $p$ in an online way. When these probabilities are independent, the model coincides with that of $p$-tampering attacks, so we call our model generalized $p$-tampering. We prove the power of such attacks by incorporating ideas from the context of coin-flipping attacks into the $p$-tampering model, generalizing the results in both of these areas.
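To make the tampering model concrete, below is a minimal toy sketch (ours, not the attack constructed in the paper) of how an online adversary who controls each block independently with probability $p$, i.e., the standard $p$-tampering special case of the model above, can bias a bounded function upward. The majority-vote function `f`, the Monte-Carlo estimator `estimate_conditional`, and all parameter values are illustrative assumptions of this sketch.

```python
# Toy illustration (our own, not the paper's construction): an online p-tampering
# adversary biases a bounded function f(x_1, ..., x_n) in [0, 1] upward by greedily
# choosing each tampered block to maximize an estimated conditional expectation of f.
# Assumptions: blocks are i.i.d. uniform bits and each block is tampered
# independently with probability p (the standard p-tampering special case).

import random

def f(bits):
    # Example bounded function: 1.0 if the majority of the bits is 1, else 0.0.
    return 1.0 if 2 * sum(bits) > len(bits) else 0.0

def estimate_conditional(prefix, n, samples=200):
    # Monte-Carlo estimate of E[f | prefix], completing the remaining blocks honestly.
    total = 0.0
    for _ in range(samples):
        completion = [random.randint(0, 1) for _ in range(n - len(prefix))]
        total += f(prefix + completion)
    return total / samples

def run_once(n, p, tamper):
    # Generate x_1, ..., x_n online; each block is tampered with marginal probability p.
    prefix = []
    for _ in range(n):
        if tamper and random.random() < p:
            # Greedy tampering: pick the candidate bit whose estimated conditional
            # expectation of f (given the prefix generated so far) is larger.
            bit = max((0, 1), key=lambda b: estimate_conditional(prefix + [b], n))
        else:
            bit = random.randint(0, 1)  # honest block
        prefix.append(bit)
    return f(prefix)

def mean_f(n, p, tamper, trials=200):
    return sum(run_once(n, p, tamper) for _ in range(trials)) / trials

if __name__ == "__main__":
    n, p = 15, 0.3
    print("E[f] without tampering ~", mean_f(n, p, tamper=False))
    print("E[f] with  p-tampering ~", mean_f(n, p, tamper=True))
```

Running the script shows the tampered mean of $f$ exceeding the honest mean of roughly $0.5$, with the gap growing in $p$; this is the qualitative behavior that the generalized $p$-tampering lemma quantifies. In the generalized model, the per-block tampering events need not be independent; only each block's marginal probability of being tampered is required to be $p$.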
