Semi-supervised Random Forest for Intrusion Detection Network

To protect valuable computer systems, network data must be analyzed and classified so that possible network intrusions can be detected. Machine learning techniques have been used to classify network data. Supervised machine learning methods can achieve high accuracy at classifying network data as normal or malicious, but they require fully labeled data. Semi-supervised machine learning methods, in contrast, can train on a small number of labeled examples together with a large number of unlabeled examples. In this research, we explore the use of semi-supervised Random Forest for classifying network data and detecting intrusions. The method was used to classify the Third International Knowledge Discovery and Data Mining Tools Competition dataset (KDD 1999), and the results were compared with those of supervised Random Forest. The results were also compared with those of a ladder network, an approach that combines unsupervised learning with neural networks, in classifying KDD 1999.