Paper Information - Learning from the Ones that Got Away: Detecting New Forms of Phishing Attacks

Phishing attacks continue to pose a major threat to computer system defenders and often form the first step in a multi-stage attack. Phishing detection has made great strides; however, some phishing emails still pass through filters by making simple structural and semantic changes to their messages. We tackle this problem with a machine learning classifier operating on a large corpus of phishing and legitimate emails. We design SAFe-PC (Semi-Automated Feature generation for Phish Classification), a system that extracts features, elevating some to higher-level features, that are meant to defeat common phishing email detection strategies. To evaluate SAFe-PC, we collect a large corpus of phishing emails from the central IT organization at a tier-1 university. Running SAFe-PC on this dataset exposes previously unknown insights into phishing campaigns directed at university users. SAFe-PC detects more than 70 percent of the emails that had eluded our production deployment of Sophos, a state-of-the-art email filtering tool. It also outperforms SpamAssassin, a commonly used email filtering tool. We also developed an online version of SAFe-PC that can be incrementally retrained with new samples. Its detection performance improves over time as new samples are collected, while the time to retrain the classifier stays constant.
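
The incremental retraining idea in the abstract can be illustrated with a toy example. The sketch below is not SAFe-PC's classifier or feature pipeline, which the abstract does not specify; it is a hypothetical stand-in (a word-count naive Bayes model with invented names `OnlineNaiveBayes`, `update`, and `predict`) that only shows why an incremental update can cost the same no matter how many samples were seen before: each new email touches only its own token counts.

```python
import math
from collections import Counter


class OnlineNaiveBayes:
    """Toy incremental text classifier (hypothetical; not SAFe-PC itself)."""

    LABELS = ("phish", "legit")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.total_tokens = {label: 0 for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.vocab = set()

    def update(self, text, label):
        """Fold one labeled email into the model.

        The work done here depends only on the length of this email,
        not on how many emails were seen earlier.
        """
        tokens = text.lower().split()
        self.token_counts[label].update(tokens)
        self.total_tokens[label] += len(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def predict(self, text):
        """Return the label with the highest smoothed log-probability."""
        tokens = text.lower().split()
        n_docs = sum(self.doc_counts.values())
        # +1 reserves probability mass for tokens never seen in training.
        vocab_size = len(self.vocab) + 1
        best_label, best_score = None, -math.inf
        for label in self.LABELS:
            # Laplace-smoothed class prior.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for tok in tokens:
                count = self.token_counts[label][tok]
                score += math.log(
                    (count + 1) / (self.total_tokens[label] + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In use, a caller would invoke `update` once for each newly labeled email as it arrives and call `predict` on incoming mail; no pass over the earlier training samples is needed.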
