Training traffic classifiers with arbitrary packet sets

Many existing machine-learning-based traffic classifiers require the first five packets of a flow to classify it. In this work, we investigate training traffic classifiers on arbitrary sets of packets instead. Such classifiers could serve as auxiliary classifiers when some packets of a flow are unavailable, for example because of packet loss or retransmission. They could also help counter the payload mutation techniques that some malicious applications use to evade classification. Experimental results show that, with some packet sets, our classifier achieves accuracy comparable to that of a classifier using the first five packets of each flow.
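The sketch below illustrates the idea of an arbitrary packet set. Instead of always reading packets 1 to 5, the feature vector is built from the packets at chosen positions of a flow. It rests on several assumptions that the abstract does not state. Per-packet payload sizes are used as features. The functions `packet_set_features`, `train_centroids` and `classify` are hypothetical names. A simple nearest-centroid classifier stands in for whatever learning algorithm is actually used. Flows that lack a requested packet are skipped during training and rejected at classification time. That rejection is the point where an auxiliary classifier trained on a different packet set could take over.

```python
from typing import Dict, List, Optional, Sequence

# Assumption: a flow is represented by its per-packet payload sizes (bytes),
# in arrival order. Real classifiers may use other per-packet features.
Flow = Sequence[int]


def packet_set_features(flow: Flow, packet_set: Sequence[int]) -> Optional[List[int]]:
    """Return the sizes of the packets at the given 1-based positions.

    Returns None if the flow does not contain every requested packet.
    """
    if not packet_set:
        raise ValueError("packet_set must not be empty")
    if max(packet_set) > len(flow):
        return None
    return [flow[i - 1] for i in packet_set]


def train_centroids(flows: Sequence[Flow], labels: Sequence[str],
                    packet_set: Sequence[int]) -> Dict[str, List[float]]:
    """Hypothetical trainer: mean feature vector per label (nearest centroid)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for flow, label in zip(flows, labels):
        feats = packet_set_features(flow, packet_set)
        if feats is None:  # flow too short for this packet set: skip it
            continue
        if label not in sums:
            sums[label] = [0.0] * len(packet_set)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(flow: Flow, centroids: Dict[str, List[float]],
             packet_set: Sequence[int]) -> Optional[str]:
    """Assign the label of the closest centroid (squared Euclidean distance).

    Returns None if the flow lacks a required packet, so a caller could fall
    back to an auxiliary classifier trained on a different packet set.
    """
    feats = packet_set_features(flow, packet_set)
    if feats is None:
        return None

    def sq_dist(centroid: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))
```

A packet set such as `[2, 4, 6]` swaps the conventional `[1, 2, 3, 4, 5]` for a different selection of positions without changing any other part of the pipeline. This keeps the comparison between packet sets a matter of configuration.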
