Phishing sites are a major attack through which phishers deceive internet users. Attackers create replicas of legitimate sites and lure users to them with enticing offers. Drawing on standards published by the W3C (World Wide Web Consortium), we select features that clearly distinguish a legitimate site from a phishing site. We propose a model that detects phishing sites in order to safeguard web users. The model considers features of the URL together with features of the web page's HTML tags. The database is clustered using K-Means clustering, and a Naive Bayes classifier predicts the probability that a web site is a Valid Phish or an Invalid Phish. K-Means clustering is first applied to the initial URL features and the site's validity is checked; if the validity still cannot be determined, the Naive Bayes classifier is applied to both the URL features and the HTML tag features, and the probability is evaluated using the trained model.
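The two-stage decision described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the model's actual design: the four binary URL features, the two HTML features (form posts to another domain, password field present), the toy training rows, the caller-supplied initial centroids, and the `margin` rule used to decide when the K-Means result counts as undetermined. The Naive Bayes variant shown is a Bernoulli model with Laplace smoothing.

```python
import math
import re

def url_features(url):
    """Hypothetical binary URL features (illustrative, not the paper's feature set)."""
    host = re.sub(r"^[a-z]+://", "", url.lower()).split("/")[0]
    return [
        1 if re.fullmatch(r"[\d.:]+", host) else 0,  # host is a raw IP address
        1 if "@" in url else 0,                      # '@' appears in the URL
        1 if host.count(".") > 3 else 0,             # many subdomain levels
        1 if len(url) > 75 else 0,                   # unusually long URL
    ]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))

def kmeans(points, centroids, iters=20):
    """Plain Lloyd's k-means with caller-supplied initial centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids

def train_nb(rows, labels):
    """Bernoulli Naive Bayes with Laplace smoothing over binary features."""
    model = {}
    for y in set(labels):
        sub = [r for r, lab in zip(rows, labels) if lab == y]
        prior = len(sub) / len(rows)
        probs = [(sum(col) + 1) / (len(sub) + 2) for col in zip(*sub)]
        model[y] = (prior, probs)
    return model

def nb_posterior(model, row, positive="phish"):
    """Posterior probability of the positive class, computed in log space."""
    logs = {
        y: math.log(prior)
        + sum(math.log(p if x else 1 - p) for x, p in zip(row, probs))
        for y, (prior, probs) in model.items()
    }
    top = max(logs.values())
    total = sum(math.exp(v - top) for v in logs.values())
    return math.exp(logs[positive] - top) / total

def classify(url, html_feats, centroids, model, margin=1.0):
    """Stage 1: K-Means on URL features. Stage 2: Naive Bayes if stage 1 is unclear."""
    f = url_features(url)
    d_legit, d_phish = (dist2(f, c) for c in centroids)
    if abs(d_legit - d_phish) > margin:  # assumed rule for a clear cluster match
        return ("phish" if d_phish < d_legit else "legit"), "kmeans"
    p = nb_posterior(model, f + html_feats)
    return ("phish" if p >= 0.5 else "legit"), "naive_bayes"

# Toy training data: 4 URL features + 2 hypothetical HTML features per row.
ROWS = [
    [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1], [1, 0, 1, 1, 1, 1], [0, 1, 1, 1, 1, 1],
]
LABELS = ["legit", "legit", "legit", "phish", "phish", "phish"]

# Centroid 0 is seeded as the legitimate cluster, centroid 1 as the phishing one.
CENTROIDS = kmeans([r[:4] for r in ROWS], [[0, 0, 0, 0], [1, 1, 1, 1]])
MODEL = train_nb(ROWS, LABELS)

if __name__ == "__main__":
    print(classify("https://example.com/", [0, 1], CENTROIDS, MODEL))
    print(classify("http://192.168.0.1/login", [1, 1], CENTROIDS, MODEL))
```

In this sketch a URL whose features sit close to one centroid is decided by K-Means alone, while a URL roughly equidistant from both centroids falls through to the Naive Bayes stage, where the HTML tag features contribute to the posterior. A real system would need a principled ambiguity criterion and a far larger labelled training set.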