To protect valuable computer systems, network data must be analyzed and classified so that possible intrusions can be detected. Machine learning techniques have been used to classify network data. Supervised methods can achieve high accuracy at classifying network data as normal or malicious, but they require fully labeled data. Semi-supervised methods, by contrast, can train on a small number of labeled examples together with a large number of unlabeled examples. In this research, we explore the use of semi-supervised Random Forest for classifying network data and detecting intrusions. We used it to classify the Third International Knowledge Discovery and Data Mining Tools Competition dataset (KDD 1999) and compared the results with those of supervised Random Forest. We also compared the results with those of a ladder network, an approach that combines neural networks with unsupervised learning, on the same dataset.
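To make the semi-supervised idea concrete, the following is a minimal sketch of one common scheme, self-training, wrapped around a toy random forest. It is an illustration under stated assumptions, not the method evaluated in this work: the forest is built from depth-1 trees (stumps), the data are synthetic two-feature blobs standing in for KDD 1999 records, and every name (`make_blobs`, `fit_stump`, `fit_forest`, `predict_proba`, `self_train`) is hypothetical.

```python
import random

def make_blobs(n, rng):
    """Synthetic records (assumption, not KDD 1999): class 0 near (0, 0), class 1 near (4, 4)."""
    data = []
    for i in range(n):
        y = i % 2                      # alternate classes so both are equally represented
        c = 4.0 * y
        data.append(([rng.gauss(c, 1.0), rng.gauss(c, 1.0)], y))
    return data

def fit_stump(rows, rng):
    """Depth-1 tree on one randomly chosen feature, minimizing training errors."""
    f = rng.randrange(len(rows[0][0]))
    values = sorted({x[f] for x, _ in rows})
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in candidates:
        for hi in (0, 1):              # hi is the label predicted when x[f] > t
            err = sum((hi if x[f] > t else 1 - hi) != y for x, y in rows)
            if best is None or err < best[0]:
                best = (err, f, t, hi)
    return best[1:]

def fit_forest(rows, n_trees, rng):
    """Each stump is trained on a bootstrap sample of the rows."""
    return [fit_stump([rng.choice(rows) for _ in rows], rng) for _ in range(n_trees)]

def predict_proba(forest, x):
    """Fraction of stumps voting for class 1 ('malicious')."""
    return sum(hi if x[f] > t else 1 - hi for f, t, hi in forest) / len(forest)

def self_train(labeled, unlabeled, n_trees=21, threshold=0.9, rounds=5, seed=0):
    """Repeatedly pseudo-label confident unlabeled points and refit the forest."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(unlabeled)
    forest = fit_forest(labeled, n_trees, rng)
    for _ in range(rounds):
        remaining, added = [], 0
        for x in pool:
            p = predict_proba(forest, x)
            if p >= threshold:
                labeled.append((x, 1))
                added += 1
            elif p <= 1 - threshold:
                labeled.append((x, 0))
                added += 1
            else:
                remaining.append(x)    # not confident yet; keep it unlabeled
        pool = remaining
        if added == 0:
            break
        forest = fit_forest(labeled, n_trees, rng)
    return forest

rng = random.Random(42)
train, test = make_blobs(170, rng), make_blobs(200, rng)
labeled = train[:20]                       # small labeled set: 10 records per class
unlabeled = [x for x, _ in train[20:]]     # labels discarded for the remaining records
forest = self_train(labeled, unlabeled)
accuracy = sum((predict_proba(forest, x) >= 0.5) == bool(y) for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Only 20 of the 170 training records keep their labels here; the rest contribute through pseudo-labels assigned when at least 90% of the stumps agree. Other semi-supervised forests instead use the unlabeled data inside the node-splitting criterion, which this sketch does not attempt.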