Network Traffic Text Classification Based on Multi-instance Learning and Principal Component Analysis

Network traffic text classification plays an important role in network security. Traditional machine-learning classification methods, such as supervised and semi-supervised learning algorithms, have several shortcomings: the classification mode is too simple to adapt to diverse classification requirements; the text feature selection method is simple, so the classification lacks diversity and accuracy is low; and classification is slow, which makes these methods unsuitable for high-traffic, real-time environments. Multi-instance learning can describe the characteristics of a sample more accurately and comprehensively, and can therefore improve classification performance. In this paper, we combine multi-instance learning classification with principal component analysis (PCA) to select text features from the data sets and remove the redundant and uncorrelated features in the original data, obtaining better classification accuracy.
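
As a rough illustration of the general idea, and not the method evaluated in this paper, the following sketch shows one way PCA and a multi-instance representation could be combined. Everything in it is an assumption made for illustration: the toy data, the helper names (fit_pca, embed_bags, fit_centroids, predict), the choice to summarize each bag by the mean of its reduced instances, and the nearest-centroid classifier that stands in for a real multi-instance learner. Each "bag" plays the role of one traffic sample made of several instance feature vectors.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, components) for the top principal components of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def transform(X, mean, components):
    """Project instances onto the retained principal components."""
    return (X - mean) @ components.T

def embed_bags(bags, mean, components):
    """Represent each bag by the mean of its PCA-reduced instances."""
    return np.array([transform(b, mean, components).mean(axis=0) for b in bags])

def fit_centroids(Z, y):
    """Compute one centroid per class in the bag-embedding space."""
    labels = np.unique(y)
    return labels, np.array([Z[y == c].mean(axis=0) for c in labels])

def predict(Z, labels, centroids):
    """Assign each bag embedding to the class of its nearest centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

# Hypothetical toy data: 2 informative features plus 8 near-constant ones.
rng = np.random.default_rng(0)

def make_bag(label):
    n = rng.integers(3, 7)  # 3 to 6 instances per bag
    signal = rng.normal(loc=3.0 * label, scale=0.5, size=(n, 2))
    noise = rng.normal(scale=0.01, size=(n, 8))
    return np.hstack([signal, noise])

y_train = np.array([0, 1] * 20)
y_test = np.array([0, 1] * 10)
train_bags = [make_bag(label) for label in y_train]
test_bags = [make_bag(label) for label in y_test]

# Fit PCA on all training instances, keeping 2 of the 10 features' worth
# of variance directions, then classify bags in the reduced space.
mean, components = fit_pca(np.vstack(train_bags), n_components=2)
labels, centroids = fit_centroids(embed_bags(train_bags, mean, components), y_train)
test_embedded = embed_bags(test_bags, mean, components)
accuracy = float((predict(test_embedded, labels, centroids) == y_test).mean())
print(f"toy bag-level accuracy: {accuracy:.2f}")
```

In this sketch PCA is fitted on the pooled training instances so that low-variance, redundant dimensions are dropped before the bags are compared; a real system would replace the mean-pooling and nearest-centroid steps with an actual multi-instance learning algorithm.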