On Addressing the Imbalance Problem: A Correlated KNN Approach for Network Traffic Classification

With the arrival of the big data era, Internet traffic is growing exponentially. A wide variety of applications now run on the Internet, and traffic classification has been introduced to help manage them for security monitoring and quality-of-service purposes. Many Machine Learning (ML) algorithms have been applied to traffic classification. A significant challenge to classification performance comes from the imbalanced distribution of data in traffic classification systems. In this paper, we propose an Optimised Distance-based Nearest Neighbor (ODNN) approach, which is capable of improving classification performance on imbalanced traffic data. We analyze the proposed ODNN approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments were conducted on a real-world traffic dataset. The results show that the performance of “small classes” can be improved significantly even with only a small amount of training data, while the performance of “large classes” remains stable.
