Phishing Websites Detection using Machine Learning

Organizations spend tremendous resources guarding against and recovering from cybersecurity attacks in which hackers gain access to sensitive and valuable user data. Many of these intrusions are accomplished through phishing attacks, in which users are tricked into interacting with web pages designed to look like legitimate ones. Because humans are so susceptible to this kind of deception, automated methods of differentiating phishing websites from their authentic counterparts are needed as an extra line of defense. The aim of this research is to develop such methods using various approaches to categorizing websites. Specifically, we have developed a system that uses machine learning techniques to classify websites based on their URLs. We used four classifiers: a decision tree, a naive Bayes classifier, a support vector machine (SVM), and a neural network. The classifiers were tested on a data set of 1,353 real-world URLs, each labeled as a legitimate, suspicious, or phishing site. In our experiments, the classifiers distinguished real websites from fake ones over 90% of the time.
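To make the idea of URL-based classification concrete, the following is a minimal sketch of one of the four classifier types named above, a naive Bayes classifier, applied to the same three labels. It is not the system described in this work. The five lexical features, the length threshold, and the six training URLs are all hypothetical and chosen only for illustration; the abstract does not state which features were actually used, and the 1,353-URL data set is not reproduced here.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlparse


def url_features(url):
    """Map a URL to a tuple of boolean lexical features (all hypothetical)."""
    host = urlparse(url).hostname or ""
    return (
        url.startswith("https://"),        # served over HTTPS
        "@" in url,                        # '@' can disguise the real host
        host.replace(".", "").isdigit(),   # host is a raw IPv4 address
        host.count(".") >= 3,              # deeply nested host name
        len(url) > 54,                     # long URL (arbitrary threshold)
    )


class NaiveBayes:
    """Naive Bayes over boolean features with Laplace smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.total = len(y)
        # feature_counts[label][feature_index][value] -> number of samples
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        for features, label in zip(X, y):
            for i, value in enumerate(features):
                self.feature_counts[label][i][value] += 1
        return self

    def predict(self, features):
        def log_posterior(label):
            n = self.class_counts[label]
            score = math.log(n / self.total)  # log prior
            for i, value in enumerate(features):
                count = self.feature_counts[label][i][value]
                # +1 / +2: Laplace smoothing for a two-valued feature
                score += math.log((count + 1) / (n + 2))
            return score

        return max(self.classes, key=log_posterior)


# Toy training set: made-up URLs on reserved example domains and addresses.
TRAINING = [
    ("https://www.example.com/account", "legitimate"),
    ("https://docs.example.org/guide", "legitimate"),
    ("http://example.net/login", "suspicious"),
    ("http://shop.example.net/deals", "suspicious"),
    ("http://192.0.2.10/secure/login", "phishing"),
    ("http://example.com@login.secure.update.example.net/verify/account",
     "phishing"),
]

model = NaiveBayes().fit(
    [url_features(url) for url, _ in TRAINING],
    [label for _, label in TRAINING],
)

for url in ("https://www.example.org/home", "http://198.51.100.7/update"):
    print(url, "->", model.predict(url_features(url)))
```

With a training set this small the predictions only reflect the hand-picked features. A realistic evaluation would use a labeled data set and held-out test URLs, and would compare this classifier against the decision tree, SVM, and neural network.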
