Using AP-TED to Detect Phishing Attack Variations

Many phishing attacks are known to be variations of earlier attacks. We evaluate the feasibility of using Pawlik and Augsten's recent Tree Edit Distance implementation (AP-TED) to compare DOMs and identify similar phishing attack instances. We also compare this tree-based method with an existing method that uses the distance between tag vectors to quantify the similarity between phishing sites. We observe that no single distance method detects all types of phishing attack variations perfectly. The tree-based method is computationally more demanding, but it is better at discriminating similarity to known attacks. We also introduce a method that reduces the volume of calculations by 99.4% when computing pairwise tree edit distances, relative to running AP-TED on all data.
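To make the tag-vector idea concrete, the following is a minimal sketch of one way to compare two pages by their tag counts. It is an illustration only: the names `TagCounter`, `tag_vector`, and `tag_vector_distance` are hypothetical, and the normalized L1 distance used here is an assumption, not necessarily the formula used by the existing method referred to above. It relies only on Python's standard `html.parser` module.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times each HTML start tag occurs in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        self.counts[tag] += 1


def tag_vector(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def tag_vector_distance(a, b):
    """Assumed metric: L1 difference of tag counts divided by the total
    number of tags in both pages, giving a value in [0, 1]."""
    tags = set(a) | set(b)
    diff = sum(abs(a[t] - b[t]) for t in tags)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


if __name__ == "__main__":
    nested = "<div><p></p></div><span></span>"
    flat = "<div></div><p><span></span></p>"
    # Same tag counts, different nesting: the tag-vector distance is zero.
    print(tag_vector_distance(tag_vector(nested), tag_vector(flat)))
```

Because a tag vector discards nesting, two pages with the same tags arranged differently are indistinguishable under this sketch, whereas a tree edit distance over the DOM would report the structural change. This is one intuition for why the two families of methods can disagree on a given attack variation.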