Metric learning with statistical features for network traffic classification

With the development of Internet technologies such as the Secure Sockets Layer and Transport Layer Security encryption protocols, traditional Internet traffic classification approaches based on port numbers, IP addresses and packet content struggle to identify traffic flows. Many studies have therefore introduced machine learning algorithms to address the problem, using statistical features extracted from the flows as input. However, these feature sets are often composed of features from different spaces, such as port IDs, packet counts, one-hot encodings and statistical properties. Traditional machine learning algorithms usually compute distances with the Euclidean metric, which cannot make the best use of hand-crafted features that describe such varied traffic flow attributes. Considering this, this paper proposes using metric learning algorithms to learn an adaptive distance metric over the combined features. As a result, the proposed algorithm can better exploit the hand-crafted features and the flow characteristics they capture. Finally, the method is evaluated on an encrypted website traffic dataset and compared with several state-of-the-art algorithms. The experimental results show that the proposed algorithm achieves the best performance, with accuracy 8% higher than that of Decision Tree, the second-best algorithm.
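The contrast between a fixed Euclidean metric and an adaptive one can be illustrated with the Mahalanobis-form distance $d_M(x, y) = \sqrt{(x - y)^\top M (x - y)}$, where $M$ is positive semidefinite and the Euclidean metric is the special case $M = I$. The sketch below is not the paper's algorithm. The flow feature values are hypothetical, and the diagonal inverse-variance matrix only stands in for a learned $M$. A real metric learning method would fit $M$ from labelled flows. Even so, it shows how a large-scale feature such as the port number can dominate Euclidean distances between mixed-space feature vectors.

```python
import numpy as np


def mahalanobis(x, y, M):
    """Distance d_M(x, y) = sqrt((x - y)^T M (x - y)); M must be positive semidefinite."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))


# Hypothetical flow features (illustrative values, not from the paper's dataset):
# [destination port, packet count, mean packet size in bytes, one-hot flag]
flows = np.array([
    [443.0, 120.0, 850.0, 1.0],
    [443.0, 118.0, 300.0, 0.0],
    [8443.0, 121.0, 845.0, 1.0],
])

# The Euclidean metric is the special case M = I.
I = np.eye(flows.shape[1])

# Stand-in for a learned metric (assumption): a diagonal M that rescales each
# feature by its inverse variance. A metric learning algorithm would instead
# optimise M so that same-class flows are close and different-class flows are far.
M_diag = np.diag(1.0 / flows.var(axis=0))

d_euc_01 = mahalanobis(flows[0], flows[1], I)
d_euc_02 = mahalanobis(flows[0], flows[2], I)
d_diag_01 = mahalanobis(flows[0], flows[1], M_diag)
d_diag_02 = mahalanobis(flows[0], flows[2], M_diag)

print("euclidean:", d_euc_01, d_euc_02)
print("rescaled: ", d_diag_01, d_diag_02)
```

Under the Euclidean metric, flow 0 appears far closer to flow 1 because the port difference in flow 2 dominates. After rescaling, flow 0 is closer to flow 2, whose packet count and size statistics it nearly matches. This kind of reweighting across heterogeneous features is what a learned metric is meant to provide.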