A Novel Semi-supervised Adaboost Technique Based on Improved Tri-training

As networks grow, attacks on them become more frequent and more severe, making network security increasingly important. Machine learning is widely used for network traffic detection, but traditional supervised learning performs poorly when only a small amount of labeled data is available alongside a large amount of unlabeled data. This situation is common in practice, so research on semi-supervised algorithms is necessary. Tri-training is a semi-supervised learning algorithm with strong generalization ability that can effectively improve detection accuracy. In this paper, we improve the traditional Tri-training algorithm and combine it with ensemble learning, generating the final hypothesis by estimating the confidence of unlabeled data. Experiments show that the improved Tri-training is effective and achieves a better detection rate. The proposed system performs well in network traffic detection: even when the training set contains only a small amount of labeled data, it achieves a good detection rate and a low false positive rate. On the NSL-KDD data set, the system performs best in terms of both accuracy and running time. On the Kyoto data set, it achieves a good balance between accuracy and time cost.
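For orientation, the following is a minimal sketch of one round of classic Tri-training, not the improved variant proposed here: the abstract does not specify the confidence estimate or how the ensemble step is combined, so neither is shown. Three classifiers are trained on bootstrap samples of the labeled data, and each one is then retrained with the unlabeled examples on which the other two agree. The `NearestCentroid` base learner, the function names, and the toy data are assumptions made for illustration, and the error-rate conditions that the original algorithm uses to accept pseudo-labels are omitted for brevity.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in base learner (an assumption): predicts the class
    whose mean feature vector is closest to the input."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, v in enumerate(x):
                acc[j] += v
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict_one(self, x):
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist)


def bootstrap(X, y, rng):
    """Sample len(X) labeled examples with replacement."""
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


def tri_training_round(X_lab, y_lab, X_unlab, seed=0):
    """One simplified Tri-training round; returns three refined classifiers."""
    rng = random.Random(seed)
    # Step 1: train three initial classifiers on bootstrap samples.
    models = [NearestCentroid().fit(*bootstrap(X_lab, y_lab, rng))
              for _ in range(3)]
    # Step 2: for each classifier, pseudo-label the unlabeled examples
    # on which the other two classifiers agree, then retrain.
    refined = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        X_new, y_new = list(X_lab), list(y_lab)
        for x in X_unlab:
            pj, pk = models[j].predict_one(x), models[k].predict_one(x)
            if pj == pk:
                X_new.append(x)
                y_new.append(pj)
        refined.append(NearestCentroid().fit(X_new, y_new))
    return refined


def majority_vote(models, x):
    """Final hypothesis: the label predicted by most classifiers."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated traffic classes.
X_lab = [[i * 0.1, i * 0.1] for i in range(10)] + \
        [[10 + i * 0.1, 10 + i * 0.1] for i in range(10)]
y_lab = ["normal"] * 10 + ["attack"] * 10
X_unlab = [[0.3, 0.5], [0.8, 0.2], [10.2, 10.6], [10.9, 10.1]]

models = tri_training_round(X_lab, y_lab, X_unlab)
print(majority_vote(models, [0.4, 0.4]))
```

In this sketch the unlabeled examples influence each classifier only through the agreement of its two peers, which is the core idea that any confidence-based refinement of Tri-training builds on.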
