A heuristic technique to detect phishing websites using TWSVM classifier

Phishing websites are on the rise, and many are hosted on compromised domains so that the legitimate behavior of the host domain is embedded into the phishing site, helping it evade detection. Traditional heuristic techniques that rely on HTTPS, search engine results, PageRank, and WHOIS information may fail to detect phishing sites hosted on a compromised domain. Moreover, list-based techniques fail to detect phishing sites when the target website is not in the whitelist. In this paper, we propose a novel heuristic technique based on the twin support vector machine (TWSVM) that detects both maliciously registered phishing sites and phishing sites hosted on compromised servers, overcoming the aforementioned limitations. Our technique detects phishing websites hosted on compromised domains by comparing the log-in page and the home page of the visited website. Hyperlink- and URL-based features are used to detect phishing sites that are maliciously registered. We evaluated several variants of support vector machines (SVMs) for classifying phishing websites and found that the TWSVM classifier outperformed the others, achieving an accuracy of 98.05% and a recall of 98.33%.