Detecting Malicious Web Links and Identifying Their Attack Types

Malicious URLs have been widely used to mount various cyber attacks including spamming, phishing and malware. Detection of malicious URLs and identification of threat types are critical to thwart these attacks. Knowing the type of a threat enables estimation of severity of the attack and helps adopt an effective countermeasure. Existing methods typically detect malicious URLs of a single attack type. In this paper, we propose method using machine learning to detect malicious URLs of all the popular attack types and identify the nature of attack a malicious URL attempts to launch. Our method uses a variety of discriminative features including textual properties, link structures, webpage contents, DNS information, and network traffic. Many of these features are novel and highly effective. Our experimental studies with 40,000 benign URLs and 32,000 malicious URLs obtained from real-life Internet sources show that our method delivers a superior performance: the accuracy was over 98% in detecting malicious URLs and over 93% in identifying attack types. We also report our studies on the effectiveness of each group of discriminative features, and discuss their evadability.

[1]  Felix C. Freiling,et al.  Measuring and Detecting Fast-Flux Service Networks , 2008, NDSS.

[2]  Fabrizio Silvestri,et al.  Know your neighbors: web spam detection using the web topology , 2007, SIGIR.

[3]  Nick Feamster,et al.  Understanding the network-level behavior of spammers , 2006, SIGCOMM.

[4]  Grigorios Tsoumakas,et al.  Random K-labelsets for Multilabel Classification , 2022 .

[5]  金田 重郎,et al.  C4.5: Programs for Machine Learning (書評) , 1995 .

[6]  Marc Najork,et al.  Detecting spam web pages through content analysis , 2006, WWW '06.

[7]  Zhi-Hua Zhou,et al.  ML-KNN: A lazy learning approach to multi-label learning , 2007, Pattern Recognit..

[8]  Zhi-Hua Zhou,et al.  A k-nearest neighbor based algorithm for multi-label classification , 2005, 2005 IEEE International Conference on Granular Computing.

[9]  Lawrence K. Saul,et al.  Beyond blacklists: learning to detect malicious web sites from suspicious URLs , 2009, KDD.

[10]  Grigorios Tsoumakas,et al.  Mining Multi-label Data , 2010, Data Mining and Knowledge Discovery Handbook.

[11]  Tie-Yan Liu,et al.  Detecting Link Spam Using Temporal Information , 2006, Sixth International Conference on Data Mining (ICDM'06).

[12]  Norman M. Sadeh,et al.  Learning to detect phishing emails , 2007, WWW '07.

[13]  Yiming Yang,et al.  An Evaluation of Statistical Approaches to Text Categorization , 1999, Information Retrieval.

[14]  Luca Becchetti,et al.  A reference collection for web spam , 2006, SIGF.

[15]  Jason I. Hong,et al.  A hybrid phish detection approach by identity discovery and keywords retrieval , 2009, WWW '09.

[16]  Brian Ryner,et al.  Large-Scale Automatic Classification of Phishing Pages , 2010, NDSS.

[17]  Hector Garcia-Molina,et al.  Link Spam Alliances , 2005, VLDB.

[18]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[19]  Ramana Rao Kompella,et al.  PhishNet: Predictive Blacklisting to Detect Phishing Attacks , 2010, 2010 Proceedings IEEE INFOCOM.

[20]  Xuxian Jiang,et al.  Automated Web Patrol with Strider HoneyMonkeys: Finding Web Sites That Exploit Browser Vulnerabilities , 2006, NDSS.

[21]  Lorrie Faith Cranor,et al.  Cantina: a content-based approach to detecting phishing web sites , 2007, WWW '07.

[22]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[23]  Brian D. Davison,et al.  Cloaking and Redirection: A Preliminary Study , 2005, AIRWeb.

[24]  Minaxi Gupta,et al.  Behind Phishing: An Examination of Phisher Modi Operandi , 2008, LEET.

[25]  Phillip A. Porras,et al.  Highly Predictive Blacklisting , 2008, USENIX Security Symposium.

[26]  Niels Provos,et al.  All Your iFRAMEs Point to Us , 2008, USENIX Security Symposium.

[27]  Niels Provos,et al.  A framework for detection and measurement of phishing attacks , 2007, WORM '07.

[28]  Geoff Hulten,et al.  Spamming botnets: signatures and characteristics , 2008, SIGCOMM '08.

[29]  Kyumin Lee,et al.  Uncovering social spammers: social honeypots + machine learning , 2010, SIGIR.

[30]  Hector Garcia-Molina,et al.  Web Spam Taxonomy , 2005, AIRWeb.

[31]  Lawrence K. Saul,et al.  Identifying suspicious URLs: an application of large-scale online learning , 2009, ICML '09.

[32]  Damien Deville,et al.  SpyProxy: Execution-based Detection of Malicious Web Content , 2007, USENIX Security Symposium.

[33]  Tyler Moore,et al.  Temporal Correlations between Spam and Phishing Websites , 2009, LEET.

[34]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[35]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[36]  Masaru Kitsuregawa,et al.  Identifying spam link generators for monitoring emerging web spam , 2010, WICOW '10.

[37]  Lior Rokach,et al.  Data Mining And Knowledge Discovery Handbook , 2005 .

[39]  Tsuhan Chen,et al.  Malicious web content detection by machine learning , 2010, Expert Syst. Appl..