Malicious URL detection with feature extraction based on machine learning

Many web applications are vulnerable to web attacks because of insufficient security awareness. Accurately detecting malicious URLs is therefore necessary to improve the reliability of web applications. Previous studies have typically relied on keyword matching to detect malicious URLs, but this method is not adaptive. This paper proposes a new machine-learning detection approach that combines statistical analysis based on gradient learning with feature extraction using a sigmoidal threshold level. The naive Bayes, decision tree and SVM classifiers are used to validate the accuracy and efficiency of the method. The experimental results show good detection performance, with an accuracy rate above 98.7%. The system has been deployed online and is used for large-scale detection, analysing approximately 2 TB of data every day.
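
The abstract does not specify which features are extracted or how the sigmoidal threshold is applied, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It computes a few lexical statistics from a URL and passes each through a sigmoid centred on a threshold, so that every feature lies in (0, 1) and equals 0.5 exactly at its threshold. The token list, the chosen statistics, and all threshold and scale values are hypothetical.

```python
import math
from typing import List
from urllib.parse import urlsplit, unquote

# Hypothetical token list; the paper's actual features are not given in the abstract.
SUSPICIOUS_TOKENS = ("<script", "select ", "union ", "../", "eval(")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squash(value: float, threshold: float, scale: float) -> float:
    """Map a raw statistic to (0, 1); the result is 0.5 when value == threshold."""
    return sigmoid((value - threshold) / scale)


def extract_features(url: str) -> List[float]:
    """Return four sigmoid-squashed lexical features of the URL path and query."""
    parts = urlsplit(url)
    # Decode percent-escapes so that encoded payloads are visible to the counters.
    payload = unquote(parts.path + "?" + parts.query).lower()

    length = len(payload)
    digits = sum(ch.isdigit() for ch in payload)
    # Characters that are neither alphanumeric nor common URL punctuation.
    specials = sum(not ch.isalnum() and ch not in "/?=&._-" for ch in payload)
    hits = sum(payload.count(tok) for tok in SUSPICIOUS_TOKENS)

    # Thresholds and scales below are assumed values chosen for illustration.
    return [
        squash(length, threshold=60, scale=20),
        squash(digits, threshold=10, scale=5),
        squash(specials, threshold=3, scale=2),
        squash(hits, threshold=0.5, scale=0.5),
    ]


benign = extract_features("http://example.com/index.html?id=1")
malicious = extract_features(
    "http://example.com/search?q=%3Cscript%3Ealert(1)%3C/script%3E"
)
```

Feature vectors of this kind could then be passed to any of the classifiers named in the abstract (naive Bayes, decision tree, SVM); in this sketch the last feature exceeds 0.5 only for the URL that contains a suspicious token.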
