Mobile app traffic flow feature extraction and selection for improving classification robustness

Abstract: In machine-learning-based mobile app traffic classification, flow feature distributions can easily drift due to changes in network environments, user habits, and other factors. Unstable features can degrade classification robustness. To remedy this problem, this paper investigates how to obtain optimal feature sets that improve the robustness of mobile app traffic classification. Specifically, we develop a method that searches for stable and discriminative features by jointly analyzing mobile app traffic characteristics and assessing the degree of feature drift. We first analyze the in-flow behavior characteristics of traffic flows to extract a candidate feature set for mobile app traffic data. Next, we present two new metrics that assess the degree of drift of flow features from different perspectives, and we design a composite metric that scores each feature by treating its degree of drift as a penalty on its discrimination power. Based on these metrics, we further propose an algorithm that searches for optimal features with high discrimination power but a low degree of drift. Existing flow features and feature selection algorithms were implemented for comparison experiments. Experimental results on real mobile app traffic data demonstrate the effectiveness of our feature set and feature selection algorithm in improving classification robustness.
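The idea of scoring features by discrimination power penalized by drift can be illustrated with a minimal sketch. The abstract does not define the paper's two drift metrics, its composite metric, or its search algorithm, so everything below is an assumption chosen for illustration: drift is approximated by the total variation distance between a feature's histograms in two collection periods, discrimination power by the correlation ratio (between-class over total sum of squares), the composite score by their product with drift as a multiplicative penalty, and the search by a simple top-k ranking. The feature names and sample values are hypothetical.

```python
from collections import defaultdict


def histogram(values, lo, hi, bins=10):
    """Normalized histogram of values over the range [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def drift_degree(old, new, bins=10):
    """Assumed drift metric: total variation distance in [0, 1]
    between the feature's distributions in two periods."""
    lo, hi = min(old + new), max(old + new)
    p = histogram(old, lo, hi, bins)
    q = histogram(new, lo, hi, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def discrimination_power(values, labels):
    """Assumed discrimination metric: correlation ratio in [0, 1]
    (between-class sum of squares over total sum of squares)."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


def composite_score(old, new, labels):
    """Assumed composite: discrimination power scaled down by drift."""
    return discrimination_power(old, labels) * (1.0 - drift_degree(old, new))


def select_features(old_data, new_data, labels, k):
    """Rank features by composite score and keep the top k."""
    scores = {
        name: composite_score(old_data[name], new_data[name], labels)
        for name in old_data
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical flows: three from app 0, three from app 1, observed in two periods.
labels = [0, 0, 0, 1, 1, 1]
old_data = {
    "mean_pkt_size": [1, 2, 1, 9, 10, 9],  # discriminative and fairly stable
    "mean_iat": [1, 2, 1, 9, 10, 9],       # discriminative but shifts later
    "noise": [5, 5, 6, 5, 6, 5],           # stable but not discriminative
}
new_data = {
    "mean_pkt_size": [1, 2, 2, 9, 10, 10],
    "mean_iat": [21, 22, 21, 29, 30, 29],
    "noise": [5, 5, 6, 5, 6, 5],
}
selected = select_features(old_data, new_data, labels, k=1)
print(selected)
```

In this toy data, only the feature that is both class-separating and distributionally stable keeps a high score; a multiplicative penalty is one simple way to make a heavily drifting feature rank low regardless of how well it separates classes in the training period.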
