The security of machine learning

Machine learning’s ability to adapt rapidly to changing and complex situations has helped it become a fundamental tool for computer security. That adaptability is also a vulnerability: attackers can exploit machine learning systems. We present a taxonomy that identifies and analyzes attacks against machine learning systems. We show how the attack classes in this taxonomy influence the costs for the attacker and the defender, and we give a formal structure defining their interaction. We use our framework to survey and analyze the literature on attacks against machine learning systems. We also illustrate our taxonomy by showing how it can guide attacks against SpamBayes, a popular statistical spam filter. Finally, we discuss how our taxonomy suggests new lines of defense.
