Machine Learned Real-Time Traffic Classifiers

Network traffic classification plays an important role in many network activities. Because traditional port-based and payload-based methods have become ineffective, recent work has proposed using machine learning (ML) to classify flows based on their statistical characteristics. In this study, we evaluate the effectiveness of ML techniques on the real-time traffic classification problem. We identify the most suitable ML classifier for network traffic classification by comparing various ML schemes, including both supervised and unsupervised methods. We also apply feature selection to identify the significant features. Finally, we simulate real-time classification by using features derived from the first few packets of each flow. The results show that decision-tree-based classifiers outperform the others in both accuracy and performance, and that classifiers based on early flow properties can achieve high accuracy while reducing computational complexity.
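As an illustration of what "features derived from the first few packets of each flow" might look like, the following is a minimal sketch rather than the feature set used in the study. The packet representation (timestamp, size), the cutoff of five packets, and the chosen statistics are all assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical packet representation: (timestamp in seconds, size in bytes).
Packet = Tuple[float, int]


def early_flow_features(packets: List[Packet], n: int = 5) -> Dict[str, float]:
    """Compute simple statistics over the first n packets of one flow.

    The value of n and the statistics below are illustrative assumptions,
    not the features selected in the paper.
    """
    # Order by timestamp and keep only the head of the flow.
    head = sorted(packets)[:n]
    if not head:
        raise ValueError("flow has no packets")

    sizes = [size for _, size in head]
    times = [ts for ts, _ in head]
    # Inter-arrival times between consecutive packets in the head.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    return {
        "pkt_count": float(len(head)),
        "size_mean": float(mean(sizes)),
        "size_std": float(pstdev(sizes)),
        "size_min": float(min(sizes)),
        "size_max": float(max(sizes)),
        "iat_mean": float(mean(gaps)) if gaps else 0.0,
        "iat_max": float(max(gaps)) if gaps else 0.0,
    }


# Example flow with seven packets; only the first five contribute.
flow = [(0.00, 60), (0.02, 60), (0.03, 52), (0.10, 1500),
        (0.11, 1500), (0.50, 1500), (0.51, 40)]
features = early_flow_features(flow, n=5)
```

Because the feature vector is available after only n packets, a trained classifier could in principle label a flow while it is still in progress, which is the real-time setting the abstract describes.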
