Network traffic application identification based on message size analysis

Identifying network applications is central to many network management and security tasks. A large number of approaches exist in the literature, most of which are based on statistical and machine learning techniques. To protect user privacy, the majority of existing methods rely on discriminative traffic attributes at the network and transport layers, such as interaction schemes, packet sizes and inter-arrival times. In this work, we propose a novel blind, quintuple-centric approach that explores traffic attributes at the application level without inspecting the payloads. The identification model analyzes the first application-layer messages in a flow (quintuple) in terms of their sizes, directions and positions in the flow. The underlying idea is that the first messages of a flow usually carry application-level signaling and data transfer units (command, request, response, etc.) whose patterns of size and direction can be discriminative. A Gaussian mixture model is proposed to characterize the applications, based on a study of the common characteristics of application-level protocols. The blind classifier is based on Markov models with low complexity and reasonable computational requirements, and the training procedure consists of profiling each target application separately. Promising results were obtained for some popular protocols, including many peer-to-peer applications.
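
To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a flow is given as a list of (direction, payload size) pairs, that consecutive non-empty packets in the same direction form one application-layer message, and that each application profile holds one Gaussian mixture per message position. It scores positions independently, which is a simplification of the Markov-model classifier described above. All names (`first_messages`, `classify`, the profile layout, the numeric values) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Assumed packet representation: (direction, payload_bytes), where direction
# is +1 for client-to-server and -1 for server-to-client.
Packet = Tuple[int, int]
# Assumed mixture component: (weight, mean, standard deviation).
Component = Tuple[float, float, float]


def first_messages(packets: List[Packet], n: int = 4) -> List[int]:
    """Return the signed sizes of the first n application-layer messages.

    Consecutive non-empty packets in the same direction are merged into one
    message; the sign of each size encodes the message direction.
    """
    messages: List[int] = []
    last_dir = 0
    for direction, size in packets:
        if size == 0:  # skip empty segments such as pure ACKs
            continue
        if direction == last_dir:
            messages[-1] += direction * size
        else:
            if len(messages) == n:
                break  # the n-th message is complete
            messages.append(direction * size)
            last_dir = direction
    return messages


def gmm_log_likelihood(x: float, components: List[Component]) -> float:
    """Log-density of x under a one-dimensional Gaussian mixture."""
    density = sum(
        w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        for w, mu, sd in components
    )
    return math.log(max(density, 1e-300))  # floor avoids log(0)


def classify(messages: List[int],
             profiles: Dict[str, List[List[Component]]]) -> str:
    """Pick the profile with the highest summed per-position log-likelihood."""
    scores = {
        app: sum(gmm_log_likelihood(m, mixtures[i])
                 for i, m in enumerate(messages[:len(mixtures)]))
        for app, mixtures in profiles.items()
    }
    return max(scores, key=scores.get)


# Illustrative profiles with made-up parameters, one mixture per position.
profiles = {
    "appA": [[(1.0, 150.0, 10.0)], [(1.0, -300.0, 20.0)]],
    "appB": [[(1.0, 60.0, 10.0)], [(1.0, -1400.0, 50.0)]],
}
flow = [(1, 100), (1, 50), (-1, 0), (-1, 300), (1, 20), (-1, 10)]
label = classify(first_messages(flow, n=2), profiles)
```

Each application is profiled separately, so adding a new application under this sketch only means adding one more entry to `profiles`; the other entries are untouched.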
