A multilevel taxonomy and requirements for an optimal traffic‐classification model

SUMMARY Identifying the applications that generate Internet traffic is essential for network security and management. The steady emergence of new Internet applications, together with the use of encryption and obfuscation techniques, keeps traffic classification an active research topic. Although the topic has received considerable attention over the last decade, an optimal traffic classification model has yet to be defined. Many techniques and formats have been described, but the literature lacks appropriate benchmarks expressed in a consistent terminology. Moreover, existing surveys are outdated and omit many recent advances in the field. In this article, we present a systematic multilevel taxonomy that covers a broad range of existing and recently proposed methods, together with examples of vendor classification techniques. Our taxonomy helps define a consistent terminology and could be useful in future benchmarking contexts by characterizing and comparing methods at three different levels. From this perspective, we describe key features and provide design hints for future classification models, while emphasizing the main requirements for promoting future research efforts. To motivate researchers and other interested parties, we collect and share data captured from real traffic, using two models to protect data privacy. Copyright © 2014 John Wiley & Sons, Ltd.
