Supervised contrastive learning over prototype-label embeddings for network intrusion detection

Abstract: Contrastive learning makes it possible to establish similarities between samples by comparing their distances in an intermediate representation space (embedding space) and using loss functions designed to attract similar samples and repel dissimilar ones. In conventional contrastive learning, the distance comparison is based exclusively on the sample features. We propose a novel contrastive learning scheme that embeds the labels in the same space as the features and performs the distance comparison between features and labels in this shared embedding space. Following this idea, the features of a sample should be close to its ground-truth (positive) label and far from the other (negative) labels. This scheme makes it possible to implement supervised classification based on contrastive learning. Each embedded label assumes the role of a class prototype in embedding space, with the sample features that share that label gathering around it. The aim is to separate the label prototypes while minimizing the distance between each prototype and its same-class samples. A novel set of loss functions is proposed with this objective. Loss minimization drives the placement of sample features and labels in embedding space. The loss functions and their associated training and prediction architectures are analyzed in detail, along with different strategies for label separation. Because samples are compared with label prototypes rather than with other samples, the proposed scheme drastically reduces the number of pairwise comparisons, thus improving model performance. To reduce the number of pairwise comparisons further, this initial scheme is extended by replacing the set of negative labels with its best single representative: either the negative label nearest to the sample features or the centroid of the cluster of negative labels. This idea creates a new subset of models, which are also analyzed in detail. The outputs of the proposed models are the distances (in embedding space) between each sample and the label prototypes. These distances can be used for classification (choosing the label at minimum distance), feature dimensionality reduction (using the distances and the embeddings instead of the original features), and data visualization (with 2D or 3D embeddings). Although the proposed models are generic, their application and performance evaluation are carried out here for network intrusion detection, which is characterized by noisy and unbalanced labels and by the challenging classification of the various types of attacks. Empirical results of the models applied to intrusion detection are presented in detail for two well-known intrusion detection datasets, and a thorough set of classification and clustering performance evaluation metrics is included.
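
The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the abstract does not give the loss functions, so the margin-based hinge used here, the `margin` parameter, the Euclidean distance, and the function names (`pairwise_distances`, `prototype_contrastive_loss`, `predict`) are all assumptions chosen for illustration. The sketch shows the three ways of treating negative labels mentioned above (all negatives, the nearest negative, the centroid of the negatives) and prediction by minimum distance to a label prototype.

```python
import numpy as np


def pairwise_distances(z, prototypes):
    """Euclidean distance from each embedded sample (n, d) to each label prototype (k, d)."""
    diff = z[:, None, :] - prototypes[None, :, :]
    return np.linalg.norm(diff, axis=-1)  # shape (n, k)


def prototype_contrastive_loss(z, prototypes, y, margin=1.0, negative="all"):
    """Illustrative (assumed) margin loss: pull each sample towards its own label
    prototype and push it at least `margin` away from the negative label(s)."""
    n, k = z.shape[0], prototypes.shape[0]
    dist = pairwise_distances(z, prototypes)
    rows = np.arange(n)
    pos = dist[rows, y]  # distance to the ground-truth (positive) label

    neg_mask = np.ones((n, k), dtype=bool)
    neg_mask[rows, y] = False  # True for negative labels only

    if negative == "all":
        # Hinge penalty against every negative label.
        neg_term = (np.maximum(0.0, margin - dist) * neg_mask).sum(axis=1)
    elif negative == "nearest":
        # Single representative: the negative label closest to the sample.
        nearest = np.where(neg_mask, dist, np.inf).min(axis=1)
        neg_term = np.maximum(0.0, margin - nearest)
    elif negative == "centroid":
        # Single representative: centroid of the negative label prototypes.
        centroids = (prototypes.sum(axis=0)[None, :] - prototypes[y]) / (k - 1)
        neg_term = np.maximum(0.0, margin - np.linalg.norm(z - centroids, axis=1))
    else:
        raise ValueError(f"unknown negative strategy: {negative!r}")

    return float(np.mean(pos + neg_term))


def predict(z, prototypes):
    """Classify each sample as the label whose prototype is at minimum distance."""
    return pairwise_distances(z, prototypes).argmin(axis=1)


# Toy example: three label prototypes in a 2D embedding space.
prototypes = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
z = np.array([[0.1, 0.0], [3.9, 0.2], [0.0, 4.1]])  # embedded sample features
y = np.array([0, 1, 2])                              # ground-truth labels

print(predict(z, prototypes))
for mode in ("all", "nearest", "centroid"):
    print(mode, prototype_contrastive_loss(z, prototypes, y, negative=mode))
```

In a trainable model both `z` and `prototypes` would be produced by learned embedding networks and the loss would be minimized by gradient descent; here they are fixed arrays so that only the distance comparisons are shown. Note that each sample is compared with at most k label prototypes (or with a single representative), rather than with the other samples in the batch.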
