Automatic Identification of Replicated Criminal Websites Using Combined Clustering

To succeed, cyber criminals must figure out how to scale their scams. They duplicate content on new websites, often staying one step ahead of defenders who shut down past schemes. For some scams, such as phishing and counterfeit-goods shops, the duplicated content remains nearly identical. In others, such as advance-fee fraud and online Ponzi schemes, the criminal must alter content so that it appears different in order to evade detection by victims and law enforcement. Nevertheless, similarities often remain in the website structure or content, since making truly unique copies does not scale well. In this paper, we present a novel combined clustering method that links together replicated scam websites, even when the criminal has taken steps to hide connections. We evaluate its performance on two collected datasets of scam websites: fake-escrow services and high-yield investment programs (HYIPs). We find that our method groups similar websites together more accurately than do existing general-purpose consensus clustering methods.
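The idea of linking sites that stay similar in structure or content, even when one of the two has been disguised, can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given in this abstract. It assumes that each site is described by two hypothetical attributes (a set of HTML tag names and a set of text snippets), that similarity is measured with Jaccard distance, that the per-attribute distances are combined by taking their minimum, and that sites are grouped by single linkage under an arbitrary threshold of 0.5. All site names and data are invented for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def combined_distance(site_a, site_b):
    """Smallest Jaccard distance over the attributes both sites have.

    Assumption: a match on any single attribute is enough to link two sites.
    """
    shared = site_a.keys() & site_b.keys()
    return min(
        (jaccard_distance(site_a[k], site_b[k]) for k in shared),
        default=1.0,  # no attribute in common: treat as maximally distant
    )


def cluster_sites(sites, threshold=0.5):
    """Single-linkage grouping with union-find over all site pairs."""
    parent = {name: name for name in sites}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sites, 2):
        if combined_distance(sites[a], sites[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in sites:
        groups.setdefault(find(name), set()).add(name)
    # Sort clusters by their member names for a deterministic result.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical data: the two escrow sites share a template but not their text.
sites = {
    "escrow-a.example": {
        "tags": {"html", "body", "div", "table"},
        "text": {"secure escrow for buyers", "contact us today"},
    },
    "escrow-b.example": {
        "tags": {"html", "body", "div", "table"},
        "text": {"trusted payments worldwide", "write to our team"},
    },
    "hyip-c.example": {
        "tags": {"html", "body", "span", "form", "ul"},
        "text": {"earn 3% daily", "join now"},
    },
}

clusters = cluster_sites(sites)
print(clusters)
```

In this toy data the two escrow sites have entirely different text, yet they are grouped because their tag sets match. The third site shares only two of seven tags with them, so it stays on its own. Taking the minimum across attributes is one simple way to combine the evidence, and other combination rules are equally plausible.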
