A Comparison of Machine Learning Attributes for Detecting Malicious Websites

The number of Malicious Websites has grown severalfold in the past few years. At the start of 2018, 1 in every 13 URLs was malicious, amounting to 7.8% of URLs identified as malicious [1]. This figure has risen by 2.8%, indicating a growing trend of attacks delivered through Malicious Websites. These statistics highlight the need to detect Malicious Websites on the Internet. Many research works have proposed Machine Learning techniques to detect Malicious Websites, and research has also compared Machine Learning algorithms for this task. However, attribute selection for detecting Malicious Websites with Machine Learning has not been examined in detail. In Machine Learning techniques, attribute selection outweighs every other aspect of the process in importance. Thus, there is a need to compare and analyze the various attributes that can help find Malicious Websites faster and better. This paper addresses this research gap, so that a smaller, optimal set of attributes can do a better job.
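As an illustration of what comparing attributes can look like in practice, the following minimal sketch ranks discrete attributes by information gain, one common attribute selection criterion. It is not necessarily the method used in this paper. The attribute names (`has_hidden_iframe`, `long_url`, `uses_https`), the helper functions, and the eight-row dataset are all hypothetical and exist only to make the example self-contained.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on one discrete attribute."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the labels within each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted from most to least informative."""
    scores = {a: information_gain(rows, labels, a) for a in rows[0]}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: each row describes one website (not real measurements).
ROWS = [
    {"has_hidden_iframe": True, "uses_https": True, "long_url": True},
    {"has_hidden_iframe": True, "uses_https": True, "long_url": True},
    {"has_hidden_iframe": True, "uses_https": False, "long_url": True},
    {"has_hidden_iframe": True, "uses_https": False, "long_url": False},
    {"has_hidden_iframe": False, "uses_https": True, "long_url": False},
    {"has_hidden_iframe": False, "uses_https": True, "long_url": False},
    {"has_hidden_iframe": False, "uses_https": False, "long_url": False},
    {"has_hidden_iframe": False, "uses_https": False, "long_url": True},
]
LABELS = ["malicious"] * 4 + ["benign"] * 4

for name, gain in rank_attributes(ROWS, LABELS):
    print(f"{name}: {gain:.3f}")
```

In this toy data, an attribute that separates the two classes perfectly receives the highest score, while one that is evenly distributed across both classes scores zero. A ranking of this kind is one way to decide which attributes to keep when aiming for a smaller attribute set.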

[1] Masanori Hirotomo, et al. Efficient Method for Analyzing Malicious Websites by Using Multi-Environment Analysis System, 2017, 2017 12th Asia Joint Conference on Information Security (AsiaJCIS).

[2] Tansel Dökeroglu, et al. Context-sensitive and keyword density-based supervised machine learning techniques for malicious webpage detection, 2018, Soft Computing.

[3] Konrad Rieck, et al. Looking Back on Three Years of Flash-based Malware, 2017, EUROSEC.

[4] Rong Wang, et al. Detection of malicious web pages based on hybrid analysis, 2017, J. Inf. Secur. Appl.

[5] Mark Stamp, et al. Static Analysis of Malicious Java Applets, 2016, IWSPA@CODASPY.

[6] Iwao Sasase, et al. Obfuscated malicious javascript detection scheme using the feature based on divided URL, 2017, 2017 23rd Asia-Pacific Conference on Communications (APCC).

[7] Konrad Rieck, et al. Analyzing and Detecting Flash-based Malware using Lightweight MultiPath Exploration (Technical Report), 2016.

[8] Antonio Nucci, et al. Detecting malicious HTTP redirections using trees of user browsing activity, 2014, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.

[9] Zhenkai Liang, et al. Phishing-Alarm: Robust and Efficient Phishing Detection via Page Component Similarity, 2017, IEEE Access.

[10] Niels Provos, et al. All Your iFRAMEs Point to Us, 2008, USENIX Security Symposium.

[11] Niels Provos, et al. Trends and Lessons from Three Years Fighting Malicious Extensions, 2015, USENIX Security Symposium.

[12] Christopher Krügel, et al. Detection and analysis of drive-by-download attacks and malicious JavaScript code, 2010, WWW '10.

[13] Monther Aldwairi, et al. Malware detection using DNS records and domain name features, 2018, ICFNDS.

[14] Mahdi Abadi, et al. Detecting Obfuscated JavaScript Malware Using Sequences of Internal Function Calls, 2014, ACM Southeast Regional Conference.

[15] Lawrence K. Saul, et al. Beyond blacklists: learning to detect malicious web sites from suspicious URLs, 2009, KDD.

[16] Adrienne Porter Felt, et al. Measuring HTTPS Adoption on the Web, 2017, USENIX Security Symposium.

[17] A. K. Singh, et al. MalCrawler: A Crawler for Seeking and Crawling Malicious Websites, 2017, ICDCIT.

[18] Geoff Holmes, et al. Benchmarking Attribute Selection Techniques for Discrete Class Data Mining, 2003, IEEE Trans. Knowl. Data Eng.