Targeted web crawling for building a malicious JavaScript collection

Malicious JavaScript frequently serves as the starting point of web-based attacks, in particular cross-site scripting. Detecting malicious JavaScript before it executes can therefore protect users from attacks such as malware infection and drive-by downloads, and can even keep their machines from taking part in denial-of-service attacks as part of a botnet. A large collection of malicious JavaScript would help with detector development, but by the time a crawler arrives at blacklisted domains, the attackers and their malicious scripts are often long gone. We use classifiers to direct a web crawler towards the more likely locations of malicious scripts, and show how this targeted crawler performs compared to a crawler seeded with blacklisted domains.
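The idea of letting a classifier steer the crawler can be illustrated with a best-first frontier, where the URL with the highest predicted score is visited next. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function `crawl_order`, the in-memory `links` graph (used in place of real HTTP fetching), and the `score` callable (standing in for a trained classifier) are all hypothetical names introduced for illustration.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Tuple


def crawl_order(
    seeds: Iterable[str],
    links: Dict[str, List[str]],
    score: Callable[[str], float],
    limit: int,
) -> List[str]:
    """Return the order in which a score-directed crawler visits URLs.

    `links` is a toy link graph (URL -> outgoing URLs) used instead of
    network access; `score` is a hypothetical stand-in for a classifier
    that estimates how likely a URL is to host malicious scripts.
    """
    frontier: List[Tuple[float, int, str]] = []  # min-heap of (-score, tie, url)
    seen = set()
    counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(url: str) -> None:
        nonlocal counter
        if url not in seen:
            seen.add(url)
            heapq.heappush(frontier, (-score(url), counter, url))
            counter += 1

    for url in seeds:
        push(url)

    visited: List[str] = []
    while frontier and len(visited) < limit:
        _, _, url = heapq.heappop(frontier)  # highest-scoring URL first
        visited.append(url)
        for nxt in links.get(url, []):
            push(nxt)
    return visited


if __name__ == "__main__":
    # Toy graph and made-up scores, purely for illustration.
    toy_links = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
    toy_scores = {"a": 0.1, "b": 0.2, "c": 0.9, "d": 0.5, "e": 0.8}
    print(crawl_order(["a"], toy_links, toy_scores.__getitem__, limit=4))
```

A crawler seeded only with blacklisted domains corresponds to running the same loop with a constant `score`, which reduces it to plain breadth-first order over the seeds.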
