Unknown pattern extraction for statistical network protocol identification

Statistics-based network protocol identification using machine learning has been studied extensively over the past decade, and prior work reports high accuracy and fast classification. However, most of these studies implicitly assume that every protocol is known in advance and present in the training data. This assumption is unrealistic, because real-world networks constantly see emerging traffic patterns and unknown protocols in the wild. In this paper, we revisit the problem and propose a learning scheme with unknown pattern extraction for statistical protocol identification. The scheme targets a more realistic setting in which the training dataset contains labeled samples from only a limited number of protocols, and the goal is to distinguish these known protocols from one another and from potential unknown ones. Preliminary results derived from real-world traffic are presented to show the effectiveness of the scheme.
