Interest in traffic classification has grown dramatically in the past few years in both industry and academia. As more and more applications encrypt their payloads and avoid well-known ports, traditional classification methods, such as those based on transport-layer port numbers, can no longer classify these applications accurately and efficiently. In this paper, we investigate the problem of classifying traffic flows into application categories and propose a new traffic classification method based on the bag-of-words (BoW) model, which has been widely used in document classification and computer vision. In the proposed method, the application categories of interest represent the bags and centroids represent the words of the BoW model. We construct a representation vector for each application category, convert each flow under test into a new vector of the same form, and compute the cosine similarity between that vector and each category vector; this identifies the application category to which the tested flow belongs. Using real traffic traces, we demonstrate that the proposed approach achieves 93% overall accuracy and that its classification is not affected by packet arrival order (e.g., out-of-order arrivals). When out-of-order arrivals occur, the overall accuracy of the proposed approach is observed to be 10% higher than that of the widely used C4.5 algorithm in our experiment.
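The abstract only outlines the method, so the following is a minimal sketch of one way to read it, not the authors' implementation. Several details are our own assumptions: each packet is described by a numeric feature tuple (here just payload size), the "word" centroids come from plain k-means with a deterministic farthest-first start, a category's representation vector is the summed word histogram of its training flows, and a test flow is assigned to the category with the highest cosine similarity. All function names (`kmeans`, `bow_vector`, `train`, `classify`, etc.) and the toy data are hypothetical.

```python
import math
from collections import Counter

# Assumption: a flow is a list of per-packet feature tuples, e.g. (payload_size,).


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid ('word') closest to packet feature p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def bow_vector(flow, centroids):
    """Word-count histogram of a flow; packet order does not affect it."""
    counts = Counter(nearest(p, centroids) for p in flow)
    return [counts.get(j, 0) for j in range(len(centroids))]


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train(labelled_flows, k):
    """Learn k word centroids, then sum word histograms per category ('bag')."""
    all_packets = [p for flow, _ in labelled_flows for p in flow]
    centroids = kmeans(all_packets, k)
    bags = {}
    for flow, label in labelled_flows:
        acc = bags.setdefault(label, [0] * k)
        for j, count in enumerate(bow_vector(flow, centroids)):
            acc[j] += count
    return centroids, bags


def classify(flow, centroids, bags):
    """Return the category whose bag vector is most cosine-similar to the flow."""
    vec = bow_vector(flow, centroids)
    return max(bags, key=lambda label: cosine(vec, bags[label]))


if __name__ == "__main__":
    # Toy, made-up data: "bulk" flows carry mostly large packets, "interactive" mostly small.
    training = [
        ([(1500.0,), (1500.0,), (60.0,)], "bulk"),
        ([(1400.0,), (1500.0,), (1500.0,)], "bulk"),
        ([(80.0,), (100.0,), (90.0,)], "interactive"),
        ([(60.0,), (120.0,), (1500.0,)], "interactive"),
    ]
    centroids, bags = train(training, k=2)
    print(classify([(90.0,), (70.0,), (1480.0,)], centroids, bags))  # prints interactive
```

Because `bow_vector` only counts how many packets fall nearest to each centroid, reordering the packets of a flow cannot change its vector. This is consistent with the abstract's claim that classification is unaffected by out-of-order arrival, although the paper's actual feature choice and vector construction may differ from this sketch.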