Adaptive learning on mobile network traffic data

ABSTRACT Machine learning-based mobile traffic classification has become a popular topic in recent years. Because mobile traffic data is dynamic in nature, a static model becomes ineffective at classifying future traffic. This is known as the concept drift problem in data streams. To address it, this paper presents an adaptive mobile traffic classification method. Specifically, a method based on the fuzzy competence model is devised to detect concept drift, and a dynamic learning method is presented to update the classification model at an appropriate time, so that it adapts to an ever-changing environment. The concept drift detection method relies on the data distribution rather than the classification error rate. When a concept drift is detected, the weights of flow samples are dynamically updated and the flow samples are resampled to train a new model. Moreover, recently trained models are saved and combined by weighted voting for classification. The weight of each model is updated according to its performance on the most recent flow samples. Experimental results on mobile traffic data show that the proposed method obtains a lower classification error rate and spends less time updating models than related methods designed to handle concept drift.
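The weighted-voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the weighting formula, so the sketch assumes each saved model is weighted by its plain accuracy on a window of recent labelled flows. The names `update_weights` and `weighted_vote`, the single-feature flows, and the toy threshold models are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence

# A "model" here is any callable mapping a flow's feature vector to a label.
Model = Callable[[Sequence[float]], Hashable]


def update_weights(models: List[Model],
                   recent_x: List[Sequence[float]],
                   recent_y: List[Hashable]) -> List[float]:
    """Assumed rule: weight = accuracy on the most recent labelled flows."""
    if not recent_y:
        return [0.0 for _ in models]
    weights = []
    for model in models:
        correct = sum(1 for x, y in zip(recent_x, recent_y) if model(x) == y)
        weights.append(correct / len(recent_y))
    return weights


def weighted_vote(models: List[Model],
                  weights: List[float],
                  x: Sequence[float]) -> Hashable:
    """Return the label with the largest total weight across saved models."""
    scores = defaultdict(float)
    for model, weight in zip(models, weights):
        scores[model(x)] += weight
    return max(scores, key=scores.get)


# Toy models standing in for classifiers trained at different times.
def old_model(x):       # trained before the drift
    return "video" if x[0] > 5 else "chat"

def new_model(x):       # trained after the drift
    return "video" if x[0] > 2 else "chat"

def constant_model(x):  # a weak model that always answers "chat"
    return "chat"


models = [old_model, new_model, constant_model]
recent_x = [[1], [3], [4], [6]]
recent_y = ["chat", "video", "video", "video"]

weights = update_weights(models, recent_x, recent_y)
prediction = weighted_vote(models, weights, [3])
```

In this toy run the model trained after the drift receives the largest weight, so it outvotes the two older models on the flow `[3]` even though they agree with each other. A real system would replace the accuracy rule with the paper's own weight update and the toy callables with trained classifiers.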
