Knowing your enemies: leveraging data analysis to expose phishing patterns against a major US financial institution

Phishing attacks against financial institutions constitute a major concern, forcing these institutions to invest thousands of dollars annually in the prevention, detection, and takedown of such attacks. This effort is so large-scale and time-critical that there is usually no time to analyze attacks for patterns and correlations. In this work, we summarize our findings after applying data analysis and clustering analysis to the record of attacks registered against a major US financial institution. We analyze the HTML structure and content of the sites, together with their domain registration records and DNS RRSet information, to look for patterns and correlations between phishing attacks. We show that by understanding and clustering the different types of phishing sites, we can identify distinct strategies used by criminal organizations. Furthermore, the findings of this study provide valuable insight into who is targeting the institution and their modus operandi. This insight gives us a solid foundation for building more and better tools for detection and takedown, and eventually for supporting forensic analysts, who will be able to correlate cases and perform focused searches that speed up their investigations.
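The abstract does not specify which features or clustering algorithm were used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a page's HTML structure can be approximated by a frequency count of its start tags, that two pages are "structurally similar" when the cosine similarity of those counts reaches a hypothetical threshold of 0.9, and that sites sharing a name server (as would appear in their DNS NS RRSets) should also be linked. Pages are then grouped by single-linkage clustering implemented with a union-find structure. The function names (`tag_profile`, `cosine`, `cluster_pages`) and the sample data are hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags: a crude proxy for a page's HTML structure (assumption)."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tag_profile(html):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.tags


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_pages(pages, name_servers, threshold=0.9):
    """Single-linkage grouping (hypothetical rule): two sites join the same
    cluster if their tag profiles have cosine similarity >= threshold
    OR they share at least one name server."""
    ids = list(pages)
    parent = {i: i for i in ids}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    profiles = {i: tag_profile(pages[i]) for i in ids}
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            shared_ns = name_servers.get(a, set()) & name_servers.get(b, set())
            if shared_ns or cosine(profiles[a], profiles[b]) >= threshold:
                union(a, b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical sample data: two identical credential forms, plus two
    # structurally different pages hosted behind the same name server.
    pages = {
        "site-a": "<html><body><form><input><input><button></button></form></body></html>",
        "site-b": "<html><body><form><input><input><button></button></form></body></html>",
        "site-c": "<html><body><div><p></p><p></p><img></div></body></html>",
        "site-d": "<html><body><table><tr><td></td></tr></table></body></html>",
    }
    name_servers = {"site-c": {"ns1.example.net"}, "site-d": {"ns1.example.net"}}
    print(cluster_pages(pages, name_servers))
    # → [['site-a', 'site-b'], ['site-c', 'site-d']]
```

Single linkage is used here because it lets either signal (page structure or shared hosting infrastructure) connect two sites on its own; a real analysis would likely use richer content features and a tuned threshold.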