A set of features to detect web security threats

The rapid growth of malicious websites, and of systems that distribute malware through websites, makes it urgent to adopt effective techniques for the timely detection of web security threats. Current mechanisms may exhibit some limitations, mainly the amount of resources they require and a low true positive rate on zero-day attacks. In this paper, we propose and validate a set of features, extracted from the content and the structure of webpages, that can be used as indicators of web security threats. The features are used to build a predictor, based on five machine learning algorithms, which is applied to classify unknown web applications. The experiments show that the proposed set of features correctly classifies malicious websites with high precision (0.84 in the best case) and high recall (0.89 in the best case). The classifiers also prove effective against zero-day attacks.
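The abstract does not list the individual features, so the following is only an illustrative sketch of the general pipeline it describes: parse a page, turn its structure and content into a numeric feature vector, and later score a classifier on the malicious class with precision and recall. The specific indicators below (hidden iframes, script counts, calls such as `eval` or `unescape`, longest inline script line) and the names `extract_features` and `precision_recall` are hypothetical assumptions, not the paper's feature set or code. The resulting vectors would be passed to whichever learning algorithms are used.

```python
from html.parser import HTMLParser
import re

# Hypothetical content indicator: calls often associated with obfuscated or injected JavaScript.
SUSPICIOUS_CALLS = re.compile(
    r"\b(eval|unescape|escape|document\.write|setTimeout|fromCharCode)\s*\("
)


def _tiny(value):
    """True if an HTML size attribute is a number <= 1 (e.g. "0", "1", "0px")."""
    if value is None:
        return False
    match = re.match(r"\s*(\d+)", value)
    return match is not None and int(match.group(1)) <= 1


class FeatureExtractor(HTMLParser):
    """Collects a few illustrative structural and content features from one HTML page."""

    def __init__(self):
        super().__init__()
        self.features = {
            "iframes": 0,
            "hidden_iframes": 0,     # tiny or CSS-hidden iframes
            "scripts": 0,
            "external_scripts": 0,   # <script src=...>
            "suspicious_calls": 0,   # matches of SUSPICIOUS_CALLS in inline scripts
            "max_script_line": 0,    # length of the longest inline script line
        }
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe":
            self.features["iframes"] += 1
            style = (attrs.get("style") or "").replace(" ", "").lower()
            if (_tiny(attrs.get("width")) or _tiny(attrs.get("height"))
                    or "display:none" in style or "visibility:hidden" in style):
                self.features["hidden_iframes"] += 1
        elif tag == "script":
            self.features["scripts"] += 1
            if attrs.get("src"):
                self.features["external_scripts"] += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only inline script bodies contribute to the content features.
        if self._in_script:
            self.features["suspicious_calls"] += len(SUSPICIOUS_CALLS.findall(data))
            longest = max((len(line) for line in data.splitlines()), default=0)
            self.features["max_script_line"] = max(self.features["max_script_line"], longest)


def extract_features(html):
    """Return the hypothetical feature dictionary for one page."""
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.features


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive ("malicious") class, labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, `extract_features('<iframe src="http://x.example/" width="0" height="0"></iframe><script>eval(unescape("%61"))</script>')` reports one hidden iframe and two suspicious calls. Precision is TP / (TP + FP) and recall is TP / (TP + FN), so a best-case precision of 0.84 means that 84% of the pages flagged as malicious actually were malicious.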
