Optimizing IP Flow Classification Using Feature Selection

Identifying network applications is essential to numerous network activities. Unfortunately, traditional port-based classification and packet payload-based analysis both exhibit a number of shortfalls. An alternative is to use Machine Learning (ML) techniques to identify network applications from per-flow features. Because many flow features can be used for classification, and many of them are irrelevant or redundant, feature selection plays a vital role in performance optimization. In this paper, we propose a wrapper-based feature selection method for IP flow classification that combines a modified random-mutation hill-climbing (RMHC) search with the C4.5 algorithm (MRMHC-C4.5). Experiments show that our approach can greatly improve computational performance without a negative impact on classification accuracy.
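
As a rough illustration of the wrapper idea summarized above, the following sketch runs a plain random-mutation hill climb over feature subsets and scores each candidate subset by the hold-out accuracy of a classifier trained on it. It is not the MRMHC-C4.5 method itself: the modification to RMHC is not described in this abstract, the tie-break toward smaller subsets is an assumption, and a nearest-centroid classifier stands in for C4.5 so that the example needs only the standard library. The names `make_toy_flows`, `holdout_accuracy`, and `rmhc_select`, as well as the synthetic data, are hypothetical.

```python
import random


def make_toy_flows(n=120, n_features=6, seed=1):
    """Synthetic stand-in for flow records (an assumption, not real traffic):
    feature 0 separates the two classes, the other features are noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        x = [10.0 * label + rng.uniform(-1, 1)]
        x += [rng.uniform(-1, 1) for _ in range(n_features - 1)]
        rows.append((x, label))
    split = 2 * n // 3
    return rows[:split], rows[split:]


def holdout_accuracy(train, test, mask):
    """Wrapper fitness: accuracy of a nearest-centroid classifier
    (a stand-in for C4.5) using only the features enabled in mask."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(idx))
        for k, i in enumerate(idx):
            acc[k] += x[i]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Pick the class whose centroid is closest on the selected features.
        return min(
            centroids,
            key=lambda y: sum((x[i] - c) ** 2 for i, c in zip(idx, centroids[y])),
        )

    return sum(predict(x) == y for x, y in test) / len(test)


def rmhc_select(n_features, evaluate, iterations=300, seed=0):
    """Random-mutation hill climbing over feature bitmasks: flip one random
    bit per step and keep the candidate if it scores higher, or scores the
    same with fewer features (the tie-break is an assumption)."""
    rng = random.Random(seed)
    best = [rng.random() < 0.5 for _ in range(n_features)]
    best_score = evaluate(best)
    for _ in range(iterations):
        cand = best[:]
        i = rng.randrange(n_features)
        cand[i] = not cand[i]
        score = evaluate(cand)
        if score > best_score or (score == best_score and sum(cand) < sum(best)):
            best, best_score = cand, score
    return best, best_score


train, test = make_toy_flows()
mask, score = rmhc_select(6, lambda m: holdout_accuracy(train, test, m))
print("selected features:", [i for i, keep in enumerate(mask) if keep])
print("hold-out accuracy:", score)
```

Because the classifier is retrained and evaluated for every candidate subset, the cost of a wrapper search is dominated by the number of fitness evaluations; swapping a real decision-tree learner in for `holdout_accuracy` would leave `rmhc_select` unchanged.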