Performance evaluation of a machine learning algorithm for early application identification

The early identification of applications through the observation and fast analysis of the associated packet flows is a critical building block of intrusion detection and policy enforcement systems. The simple techniques currently used in practice, such as looking at the transport port numbers or at the application payload, are increasingly less effective for new applications using random port numbers and/or encryption. Therefore, there is increasing interest in machine learning techniques capable of identifying applications by examining features of the associated traffic process such as packet lengths and inter-arrival times. However, these techniques require that the classification algorithm is trained with examples of the traffic generated by the applications to be identified, possibly on the link where the the classifier will operate. In this paper we provide two new contributions. First, we apply the C4.5 decision tree algorithm to the problem of early application identification (i.e. looking at the first packets of the flow) and show that it has better performance than the algorithms proposed in the literature. Moreover, we evaluate the performance of the classifier when training is performed on a link different from the link where the classifier operates. This is an important issue, as a pre-trained portable classifier would greatly facilitate the deployment and management of the classification infrastructure.

[1]  Andrew W. Moore,et al.  Bayesian Neural Networks for Internet Traffic Classification , 2007, IEEE Transactions on Neural Networks.

[2]  A. Mena,et al.  An empirical study of real audio traffic , 2000, Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064).

[3]  Renata Teixeira,et al.  Early application identification , 2006, CoNEXT '06.

[4]  Andrew W. Moore,et al.  Internet traffic classification using bayesian analysis techniques , 2005, SIGMETRICS '05.

[5]  Matthew Roughan,et al.  Class-of-service mapping for QoS: a statistical signature-based approach to IP traffic classification , 2004, IMC '04.

[6]  Y. Ebihara Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies , 2000, Proceedings IEEE INFOCOM 2000. Conference on Computer Communications. Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies (Cat. No.00CH37064).

[7]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[8]  Michalis Faloutsos,et al.  BLINC: multilevel traffic classification in the dark , 2005, SIGCOMM '05.

[9]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[10]  Sebastian Zander,et al.  Automated traffic classification and application identification using machine learning , 2005, The IEEE Conference on Local Computer Networks 30th Anniversary (LCN'05)l.

[11]  Sebastian Zander,et al.  A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification , 2006, CCRV.

[12]  Renata Teixeira,et al.  Traffic classification on the fly , 2006, CCRV.

[13]  Maurizio Dusi,et al.  Traffic classification through simple statistical fingerprinting , 2007, CCRV.