CBR-PDS: a case-based reasoning phishing detection system

Phishing attacks have become the preferred vehicle for gathering sensitive information and for delivering dangerous malware. No existing phishing detection system can perfectly detect phishing websites and progressively self-adapt to differentiate them from legitimate ones. This paper proposes the Case-Based Reasoning Phishing Detection System (CBR-PDS), which relies on previous cases to detect phishing attacks. CBR-PDS is highly adaptive and dynamic: it can adapt to detect new phishing attacks using a rather small dataset, in contrast to other machine learning techniques. CBR-PDS aims to improve detection accuracy and the reliability of the results by identifying a set of discriminative features and discarding irrelevant ones. To this end, it relies on a two-stage hybrid feature selection procedure using information gain and genetic algorithms. Reducing the data dimensionality yields an improved accuracy rate while also requiring less processing time. CBR-PDS is tested under different scenarios on various balanced datasets. The results clearly show the suitability of the proposed hybrid feature selection procedure as well as the efficiency of the proposed CBR-PDS system. The obtained accuracy rates exceed 95%. We also show that integrating an Online Phishing Threats component into CBR-PDS further improves the accuracy rate. Finally, the performance of CBR-PDS is compared to that of several well-known competitive classifiers.
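
To make the general idea concrete, the following is a minimal sketch of two of the ingredients named above: ranking features by information gain, and classifying a website by retrieving the most similar previous cases. It is not the CBR-PDS implementation. The feature names, the toy case base, the matching-based similarity measure, and the majority vote over `k` neighbours are all assumptions made for illustration, and the genetic-algorithm stage of the hybrid procedure is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(cases, labels, feature):
    """Entropy reduction obtained by splitting on one discrete feature."""
    groups = {}
    for case, label in zip(cases, labels):
        groups.setdefault(case[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def select_features(cases, labels, keep):
    """Filter stage: keep the `keep` features with the highest information gain."""
    ranked = sorted(range(len(cases[0])),
                    key=lambda f: information_gain(cases, labels, f),
                    reverse=True)
    return sorted(ranked[:keep])


class CaseBase:
    """Toy case base: retrieve similar cases, reuse their labels, retain new ones."""

    def __init__(self, features):
        self.features = features  # indices of the selected features
        self.cases = []           # list of (feature_vector, label) pairs

    def retain(self, case, label):
        self.cases.append((tuple(case), label))

    def similarity(self, a, b):
        # Assumed measure: fraction of selected features with equal values.
        return sum(a[f] == b[f] for f in self.features) / len(self.features)

    def classify(self, query, k=3):
        nearest = sorted(self.cases,
                         key=lambda c: self.similarity(query, c[0]),
                         reverse=True)[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical binary features: (ip_in_url, long_url, uses_https, at_symbol)
cases = [(1, 1, 0, 1), (1, 0, 1, 1), (1, 1, 1, 0), (1, 0, 0, 1),
         (0, 0, 0, 0), (0, 1, 1, 0), (0, 0, 1, 0), (0, 1, 0, 0)]
labels = ["phishing"] * 4 + ["legitimate"] * 4

selected = select_features(cases, labels, keep=2)
case_base = CaseBase(selected)
for case, label in zip(cases, labels):
    case_base.retain(case, label)

print(selected)
print(case_base.classify((1, 0, 0, 1)))
print(case_base.classify((0, 1, 1, 0)))
```

In this toy data only the first and last features carry class information, so the filter keeps those two and the retrieval step ignores the rest. New confirmed cases can be added with `retain` without retraining anything, which is the property that makes a case-based approach attractive when only a small dataset is available.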
