Performance comparison of classifiers on reduced phishing website dataset

The Internet has become a necessary and important tool in everyday life. However, Internet users are often poorly protected against various web threats, which can lead to financial loss or to clients losing trust in online trading and banking. Phishing is the practice of impersonating a trusted website in order to obtain private and confidential information such as usernames and passwords or social security and credit card numbers. In this paper, the phishing website dataset from the UCI Machine Learning Repository is investigated. The dataset consists of 11055 records and 31 features. Feature selection algorithms are applied to reduce the dimension of the dataset and to obtain higher classification performance. The performance of common data mining classification algorithms is then compared on the reduced dataset.
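
The pipeline described in the abstract (reduce the feature set, then compare classifiers on the reduced data) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the data is synthetic rather than the UCI dataset, information gain stands in for the unspecified feature selection algorithms, and a categorical naive Bayes and a 1-nearest-neighbour classifier stand in for the classifiers that were compared.

```python
import math
import random
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on one categorical feature."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_top_k(X, y, k):
    """Indices of the k features with the highest information gain."""
    n_features = len(X[0])
    gains = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:k]

def project(X, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in X]

def naive_bayes_predict(X_train, y_train, row):
    """Categorical naive Bayes with Laplace smoothing (assumes 3 values per feature)."""
    best_label, best_score = None, -math.inf
    for label, count in Counter(y_train).items():
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        score = math.log(count / len(y_train))
        for j, value in enumerate(row):
            matches = sum(1 for x in rows if x[j] == value)
            score += math.log((matches + 1) / (len(rows) + 3))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def nearest_neighbor_predict(X_train, y_train, row):
    """1-nearest-neighbour using Hamming distance; ties go to the earliest training row."""
    best = min(range(len(X_train)),
               key=lambda i: sum(a != b for a, b in zip(X_train[i], row)))
    return y_train[best]

def accuracy(predict, X_train, y_train, X_test, y_test):
    hits = sum(predict(X_train, y_train, r) == t for r, t in zip(X_test, y_test))
    return hits / len(y_test)

def make_synthetic(n, seed=0):
    """Hypothetical stand-in data: features 0-1 are informative, features 2-5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        informative = [label if rng.random() < 0.9 else -label for _ in range(2)]
        noise = [rng.choice([-1, 0, 1]) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y

X, y = make_synthetic(300)
keep = select_top_k(X, y, k=2)          # feature selection step
X_reduced = project(X, keep)            # reduced dataset
split = 200                             # simple holdout split
classifiers = {"naive_bayes": naive_bayes_predict, "1nn": nearest_neighbor_predict}
results = {
    name: accuracy(fn, X_reduced[:split], y[:split], X_reduced[split:], y[split:])
    for name, fn in classifiers.items()
}
print("selected features:", sorted(keep))
print("holdout accuracy:", results)
```

A single holdout split keeps the sketch short; a comparison on the real dataset would more plausibly use cross-validation and report more than accuracy alone.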
