Automatic Detection of Phishing Target from Phishing Webpage

An approach to identification of the phishing target of a given (suspicious) webpage is proposed by clustering the webpage set consisting of its all associated webpages and the given webpage itself. We first find its associated webpages, and then explore their relationships to the given webpage as their features for clustering. Such relationships include link relationship, ranking relationship, text similarity, and webpage layout similarity relationship. A DBSCAN clustering method is employed to find if there is a cluster around the given webpage. If such cluster exists, we claim the given webpage is a phishing webpage and then find its phishing target (i.e., the legitimate webpage it is attacking) from this cluster. Otherwise, we identify it as a legitimate webpage. Our test dataset consists of 8745 phishing pages (targeting at 76 well-known websites) selected from Phish Tank and preliminary experiments show that the approach can successfully identify 91.44% of their phishing targets. Another dataset of 1000 legitimate webpages is collected to test our method’s false alarm rate, which is 3.40%.