JSPRE: A Large-Scale Detection of Malicious JavaScript Code Based on Pre-filter

Malicious web pages that use drive-by-download attacks or social engineering techniques have become a popular means of compromising hosts on the Internet. To find such pages, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis. Although these tools are quite precise, the analysis process is costly, so performing it on a large set of web pages can be prohibitive. In this paper, we present JSPRE, an approach to search the web more efficiently for pages that are likely malicious. JSPRE proposes a malicious page collection algorithm based on guided crawling, which starts from an initial set of URLs of known malicious web pages. In addition, JSPRE uses static analysis techniques to quickly examine a web page for malicious content. We have implemented our approach and evaluated it on a large-scale dataset. The results show that JSPRE identifies malicious web pages more efficiently than crawler-based approaches.
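
The idea of a static pre-filter placed in front of a costly dynamic analyzer can be illustrated with a minimal sketch. The abstract does not specify which static features JSPRE uses, so the pattern set, the scoring rule, the threshold, and the function names below are all assumptions made for illustration, not the authors' method.

```python
import re

# Hypothetical feature patterns: the abstract does not list JSPRE's real features.
SUSPICIOUS_PATTERNS = {
    "eval_call": re.compile(r"\beval\s*\("),
    "unescape_call": re.compile(r"\bunescape\s*\("),
    "document_write": re.compile(r"\bdocument\.write\s*\("),
    "from_char_code": re.compile(r"\bString\.fromCharCode\s*\("),
    "hidden_iframe": re.compile(
        r"<iframe[^>]*(?:width|height)\s*=\s*[\"']?0", re.IGNORECASE
    ),
}


def prefilter_score(html: str) -> int:
    """Count how many distinct suspicious patterns occur in the page source."""
    return sum(1 for pattern in SUSPICIOUS_PATTERNS.values() if pattern.search(html))


def needs_dynamic_analysis(html: str, threshold: int = 2) -> bool:
    """Forward a page to the expensive analyzer only if its score reaches
    the (assumed) threshold; all other pages are discarded cheaply."""
    return prefilter_score(html) >= threshold


benign = "<html><body><script>console.log('hi');</script></body></html>"
suspicious = (
    '<iframe src="http://x" width=0 height=0></iframe>'
    '<script>eval(unescape("%61"));</script>'
)
```

In this sketch only pages such as `suspicious` would be passed on to dynamic analysis, which is where the efficiency gain of a pre-filter would come from; a real system would need features and thresholds tuned on labeled data.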
