Extending labeled mobile network traffic data by three levels traffic identification fusion

Abstract

Mobile traffic classification is critically important for network management decisions such as traffic shaping and traffic pricing. Labeled traffic data are a prerequisite for evaluating classification performance. However, most existing works acquired labeled traffic in a simulated environment, for example by running a single app at a time on a mobile device and collecting its traffic. This approach is slow and does not scale. This paper devises a scheme that automatically links ground truth to mobile traffic. First, a set of labeled traffic data is collected on the monitored mobile devices by mobilegt, our previously presented system for collecting mobile traffic and building its ground truth. However, this traffic is limited to the monitored nodes. We therefore present ELD (Extending Labeled Data), a method that identifies the labels of new, unknown mobile traffic so as to extend the labeled mobile traffic data. ELD performs traffic identification at three levels: packet header, packet payload, and flow statistics. These three levels are implemented by ServerTag, payload distribution inspection, and Random Forest, respectively. ELD can identify mobile traffic with encrypted payloads. Cross-validation results show that ELD achieves an average flow accuracy of 99% and byte accuracy of 95.4%, with flow and byte completeness of 86.5% and 65.5%, respectively. The results also show that ELD outperforms existing tools, nDPI and Libprotoident, at labeling mobile network traffic.
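The three-level identification described above can be pictured as a cascade over each flow. The Python sketch below is illustrative only and assumes details the abstract does not give: a dictionary lookup on the server (IP, port) endpoint stands in for ServerTag, Euclidean distance between 256-bin byte-frequency histograms (with a hypothetical 0.1 threshold) stands in for payload distribution inspection, and a toy rule stands in for the trained Random Forest (in practice this could be, e.g., scikit-learn's `RandomForestClassifier`). The first-match ordering is an assumption; ELD's actual fusion rule may differ. The `evaluate` function assumes the common definitions of the reported metrics: completeness is the share of flows (or bytes) that receive a label, and accuracy is the share of labeled flows (or bytes) whose label matches the ground truth. All endpoints, app names, and helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Endpoint = Tuple[str, int]  # (server IP, server port) taken from packet headers


@dataclass
class Flow:
    server: Endpoint
    payload: bytes         # leading payload bytes of the flow (may be empty)
    stats: List[float]     # flow-level statistics, e.g. mean packet size
    n_bytes: int           # total bytes in the flow, for byte-level metrics
    truth: Optional[str]   # ground-truth app label (e.g. from mobilegt)


def byte_distribution(data: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in `data`."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts[b] / total for b in range(256)]


def header_level(flow: Flow, server_tags: Dict[Endpoint, str]) -> Optional[str]:
    # Level 1: hypothetical stand-in for ServerTag, a lookup on the server endpoint.
    return server_tags.get(flow.server)


def payload_level(flow: Flow, profiles: Dict[str, List[float]],
                  max_dist: float = 0.1) -> Optional[str]:
    # Level 2: nearest per-app byte-distribution profile by Euclidean distance.
    # The distance measure and threshold are assumptions, not ELD's.
    if not flow.payload:
        return None
    dist = byte_distribution(flow.payload)
    best, best_d = None, float("inf")
    for label, profile in profiles.items():
        d = sqrt(sum((p - q) ** 2 for p, q in zip(dist, profile)))
        if d < best_d:
            best, best_d = label, d
    return best if best_d <= max_dist else None


def identify(flow: Flow, server_tags: Dict[Endpoint, str],
             profiles: Dict[str, List[float]],
             flow_classifier: Callable[[List[float]], Optional[str]]) -> Optional[str]:
    # First-match cascade: header level, then payload level, then flow statistics.
    label = header_level(flow, server_tags)
    if label is None:
        label = payload_level(flow, profiles)
    if label is None:
        # Level 3: ELD uses a Random Forest here; any callable works in this sketch.
        label = flow_classifier(flow.stats)
    return label


def evaluate(flows: Sequence[Flow], labels: Sequence[Optional[str]]) -> Dict[str, float]:
    identified = [(f, l) for f, l in zip(flows, labels) if l is not None]
    correct = [f for f, l in identified if l == f.truth]
    total_bytes = sum(f.n_bytes for f in flows)
    id_bytes = sum(f.n_bytes for f, _ in identified)
    ok_bytes = sum(f.n_bytes for f in correct)
    return {
        "flow_completeness": len(identified) / len(flows),
        "flow_accuracy": len(correct) / len(identified) if identified else 0.0,
        "byte_completeness": id_bytes / total_bytes if total_bytes else 0.0,
        "byte_accuracy": ok_bytes / id_bytes if id_bytes else 0.0,
    }


def toy_forest(stats: List[float]) -> Optional[str]:
    # Placeholder for a trained Random Forest: a single hypothetical threshold rule.
    return "qq" if stats[0] > 500 else None


# Hypothetical toy data: one known server, one payload profile.
server_tags = {("10.0.0.1", 443): "wechat"}
profiles = {"weibo": byte_distribution(b"GET /weibo HTTP/1.1")}
flows = [
    Flow(("10.0.0.1", 443), b"\x17\x03\x03", [100.0], 1000, "wechat"),
    Flow(("10.0.0.2", 80), b"GET /weibo HTTP/1.1", [200.0], 500, "weibo"),
    Flow(("10.0.0.3", 443), b"\x00\x01", [800.0], 2000, "taobao"),
    Flow(("10.0.0.4", 443), b"", [50.0], 500, "qq"),
]
labels = [identify(f, server_tags, profiles, toy_forest) for f in flows]
print(labels)
print(evaluate(flows, labels))
```

With this toy data the cascade labels the flows `['wechat', 'weibo', 'qq', None]`: the first flow is resolved at the header level, the second at the payload level, the third is mislabeled by the fallback classifier, and the fourth is left unknown. That gives a flow completeness of 0.75 and a flow accuracy of 2/3, while byte accuracy drops to 3/7 because the mislabeled flow carries the most bytes, which illustrates how flow-level and byte-level figures can diverge.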
