A General Retraining Framework for Scalable Adversarial Classification

Traditional classification algorithms assume that training and test data are drawn from similar distributions. This assumption is violated in adversarial settings, where malicious actors modify instances to evade detection. A number of custom methods have been developed both for adversarial evasion attacks and for robust learning. We propose the first systematic and general-purpose retraining framework which can a) boost the robustness of an arbitrary learning algorithm, in the face of b) a broader class of adversarial models than any prior method. We show that, under natural conditions, the retraining framework minimizes an upper bound on optimal adversarial risk, and we show how to extend this result to account for approximations of evasion attacks. Extensive experimental evaluation demonstrates that our retraining methods are nearly indistinguishable in effectiveness from state-of-the-art algorithms for optimizing adversarial risk, while being more general and far more scalable. The experiments also confirm that without retraining, the evasion attacks in our framework dramatically reduce the effectiveness of learning. In contrast, retraining significantly boosts robustness to evasion attacks without significant loss in overall accuracy.
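To make the idea concrete, below is a minimal sketch of the kind of iterative retraining loop the abstract describes: an evasion-attack oracle perturbs malicious instances to slip past the current classifier, the evaded instances are added back to the training data with their malicious labels, and the (arbitrary) learner is retrained. The `evade` heuristic, the logistic-regression learner, and all function and parameter names here are illustrative assumptions for exposition, not the paper's actual algorithm or attack model.

```python
# Illustrative adversarial-retraining sketch (assumed names, not the paper's API).
import numpy as np
from sklearn.linear_model import LogisticRegression


def evade(model, x, step=0.5, n_steps=20):
    """Toy evasion oracle: greedily perturb coordinates of a malicious
    instance x to lower the model's malicious-class probability."""
    x_adv = x.copy()
    for _ in range(n_steps):
        if model.predict(x_adv.reshape(1, -1))[0] == 0:  # already evades
            break
        i = np.random.randint(len(x_adv))
        candidate = x_adv.copy()
        candidate[i] -= step
        # Keep the perturbation only if it reduces the malicious score.
        if (model.predict_proba(candidate.reshape(1, -1))[0, 1]
                < model.predict_proba(x_adv.reshape(1, -1))[0, 1]):
            x_adv = candidate
    return x_adv


def adversarial_retraining(X, y, n_iters=5):
    """Alternate between attacking the current model and retraining on the
    augmented data. X is a numpy feature matrix; y == 1 marks malicious."""
    model = LogisticRegression().fit(X, y)
    X_mal = X[y == 1]                       # original malicious instances
    X_aug, y_aug = X.copy(), y.copy()
    for _ in range(n_iters):
        adv = np.array([evade(model, x) for x in X_mal])
        X_aug = np.vstack([X_aug, adv])     # add evaded instances, labeled malicious
        y_aug = np.concatenate([y_aug, np.ones(len(adv), dtype=int)])
        model = LogisticRegression().fit(X_aug, y_aug)
    return model
```

In this sketch the learner can be swapped for any classifier with `fit`/`predict`, which is the sense in which retraining applies to an arbitrary learning algorithm; the attack oracle would likewise be replaced by whatever adversarial model is of interest.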
