Addressing the train-test gap on traffic classification combined subflow model with ensemble learning

Abstract Previous machine-learning-based network traffic classification approaches assume that the training and testing network environments are the same. In most real deployments this assumption does not hold, because traffic features change between environments, and this leads to the train-test gap: a model trained in the training environment performs poorly in the testing environment. In this paper, to address the gap, we propose CSA, a traffic classification approach based on packet-wise segmentation and aggregation. First, we observe that some specific fragments of network flows, which we call subflows, are robust against the gap. We therefore segment traffic flows into different subflows. Then, following a justified feature selection, we extract 26 statistical features from each subflow and feed them into its corresponding sub-classifier. Second, we develop a method that aggregates the sub-classifiers' results according to their classification accuracy to increase overall classification performance. We experiment on five real datasets: three collected from the Northwest Center of CERNET (China Education and Research Network) and two from public traces. Comparison with state-of-the-art baselines demonstrates the effectiveness of CSA against the gap.
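
The segment-then-aggregate idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`segment_flow`, `subflow_features`, `aggregate`), the fixed-length packet-wise split, the four example statistics (the abstract does not list the 26 features), and the accuracy-weighted vote (the abstract says only that aggregation is based on sub-classifier accuracy). It is not the authors' implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev


def segment_flow(packet_sizes, subflow_len):
    """Split a flow (a list of packet sizes) into consecutive subflows.

    Assumption: subflows are fixed-length runs of packets; the last one
    may be shorter.
    """
    return [packet_sizes[i:i + subflow_len]
            for i in range(0, len(packet_sizes), subflow_len)]


def subflow_features(subflow):
    """Return a few illustrative statistics for one subflow.

    Assumption: these four values stand in for the paper's 26 features,
    which the abstract does not enumerate.
    """
    return [min(subflow), max(subflow), mean(subflow), pstdev(subflow)]


def aggregate(predictions, accuracies):
    """Combine sub-classifier labels with an accuracy-weighted vote.

    Assumption: each sub-classifier's vote counts in proportion to its
    measured accuracy; the label with the highest total weight wins.
    """
    scores = defaultdict(float)
    for label, accuracy in zip(predictions, accuracies):
        scores[label] += accuracy
    return max(scores, key=scores.get)


if __name__ == "__main__":
    flow = [60, 1500, 1500, 52, 52, 1200]
    subflows = segment_flow(flow, 2)
    print([subflow_features(s) for s in subflows])
    # Hypothetical sub-classifier outputs and validation accuracies.
    print(aggregate(["web", "p2p", "p2p"], [0.9, 0.4, 0.4]))
```

In this sketch a single accurate sub-classifier can outvote two weaker ones, which is the intuition behind weighting by accuracy rather than using a plain majority vote.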
