A Graph Representation Learning Algorithm for Low-Order Proximity Feature Extraction to Enhance Unsupervised IDS Preprocessing

Most existing studies on preprocessing for unsupervised intrusion detection systems (IDS) ignore the relationships among packets. According to the homophily hypothesis, packets that form a local proximity structure in the similarity relational graph should have similar embeddings after preprocessing. To improve IDS performance by modeling the relationships among packets, we propose packet2vec, a graph representation learning algorithm that adds a penalty to node2vec in order to extract accurate local proximity features. The algorithm constructs a relational graph G’ in which each packet is a node and the cosine similarity between two packets defines the edge between them, and then explores the low-order proximity of each packet via a penalty-based random walk in G’. We use this algorithm as a preprocessing method that improves the accuracy of unsupervised IDS by retaining as much of the packets' local proximity features as possible. The original features of each packet are combined with its local proximity features to form the input of a deep auto-encoder for IDS. Experiments on ISCX2012 show that the proposed method outperforms state-of-the-art algorithms by 11.6% in unsupervised IDS accuracy. This is the first work to introduce graph representation learning for packet-embedding preprocessing in the field of IDS.
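The graph construction and walk described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the penalty used by packet2vec, so the walk below falls back on the standard node2vec return and in-out parameters `p` and `q` as a stand-in. The function names, the similarity `threshold` used to sparsify the graph, and the toy feature vectors are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_graph(features, threshold=0.5):
    """Return a weighted adjacency matrix over packets.

    Each row of `features` is one packet's feature vector. The edge weight is
    the cosine similarity of the two packets. `threshold` (assumed positive)
    is a hypothetical cut-off that drops weak edges.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)      # no self-loops
    sim[sim < threshold] = 0.0      # keep only sufficiently similar pairs
    return sim


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """node2vec-style second-order walk (stand-in for the penalty-based walk).

    Stepping back to the previous node is scaled by 1/p; stepping to a node
    that is not adjacent to the previous node is scaled by 1/q.
    """
    rng = rng or np.random.default_rng()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = np.flatnonzero(adj[cur])
        if neighbors.size == 0:     # isolated node: stop early
            break
        weights = adj[cur, neighbors].astype(float)
        if len(walk) > 1:
            prev = walk[-2]
            for i, nxt in enumerate(neighbors):
                if nxt == prev:
                    weights[i] /= p
                elif adj[nxt, prev] == 0:
                    weights[i] /= q
        probs = weights / weights.sum()
        walk.append(int(rng.choice(neighbors, p=probs)))
    return walk


# Toy example: four "packets" with two features each (made-up values).
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
adj = cosine_similarity_graph(features, threshold=0.5)
walk = biased_walk(adj, start=0, length=5, rng=np.random.default_rng(0))
```

In node2vec, walks like these are fed to a skip-gram model to learn one embedding per node. Following the abstract, such an embedding would then be combined with the packet's original features before being passed to the auto-encoder.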
