Misleading Learners: Co-opting Your Spam Filter

Using statistical machine learning to make security decisions introduces new vulnerabilities in large-scale systems. We show how an adversary can exploit statistical machine learning, as used in the SpamBayes spam filter, to render it useless, even if the adversary's access is limited to only 1% of the spam training messages. We demonstrate three new attacks that successfully make the filter unusable, prevent victims from receiving specific email messages, and cause spam emails to arrive in the victim's inbox.
