Webshell Traffic Detection With Character-Level Features Based on Deep Learning

A Webshell is a kind of backdoor program based on Web services. Network-based detection can monitor request and response traffic to find abnormal behaviors and detect the presence of a Webshell. Some machine learning and deep learning methods have been applied in this field, but current methods still need further work, both in discovering new attacks and in detection performance. To detect large-scale unknown Webshell events, we propose a Webshell traffic detection model that combines the strengths of a convolutional neural network and a long short-term memory network. We also propose a character-level method for transforming traffic content into features. We apply this method in our proposed model and evaluate our approach on a Webshell detection testbed. The experimental results indicate that the model achieves high precision and recall and that its generalization ability is ensured.
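The abstract does not say how the character-level transformation works. The following is a minimal sketch of one common way to turn raw traffic content into a fixed-size input for a convolutional network. It is not the authors' method. It assumes a printable-ASCII alphabet, a fixed sequence length `max_len`, index 0 reserved for padding and index 1 for out-of-alphabet characters. The names `encode_payload` and `one_hot` are hypothetical.

```python
import string

# Assumption: printable ASCII as the character alphabet (100 distinct chars).
# This is an illustrative choice, not the vocabulary used in the paper.
ALPHABET = string.printable
PAD_INDEX = 0       # used to right-pad short payloads
UNKNOWN_INDEX = 1   # used for characters outside ALPHABET (e.g. non-ASCII)
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
VOCAB_SIZE = len(ALPHABET) + 2  # alphabet plus the padding and unknown slots


def encode_payload(payload: str, max_len: int = 64) -> list[int]:
    """Map each character of a request/response body to an integer index.

    Payloads longer than max_len are truncated; shorter ones are
    right-padded with PAD_INDEX so every sample has the same length.
    """
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN_INDEX) for ch in payload[:max_len]]
    indices.extend([PAD_INDEX] * (max_len - len(indices)))
    return indices


def one_hot(indices: list[int], vocab_size: int = VOCAB_SIZE) -> list[list[int]]:
    """Expand an index sequence into a max_len x vocab_size one-hot matrix.

    Each row has exactly one 1; padding positions set column PAD_INDEX.
    A matrix like this (or an embedding lookup on the raw indices) is the
    kind of 2-D input a character-level CNN layer can consume.
    """
    return [[1 if col == idx else 0 for col in range(vocab_size)] for idx in indices]


if __name__ == "__main__":
    encoded = encode_payload("cmd=whoami", max_len=16)
    print(encoded)
    matrix = one_hot(encoded)
    print(len(matrix), len(matrix[0]))
```

Fixing the length up front trades some information in long payloads for a uniform tensor shape. In a CNN and LSTM pipeline, the convolution would typically slide over the `max_len` axis to pick up local character patterns, and the LSTM would then model their order. The layer sizes and ordering are not given in the abstract.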