Gradient descent evolved imbalanced data gravitation classification with an application on Internet video traffic identification

Abstract In the last decade, the growth of Internet video traffic, especially illegal videos, has brought major challenges for Internet management. Abnormal videos, such as illegal videos, generally account for only a small percentage of traffic, which makes detecting them a typical imbalanced classification problem. In this study, we propose a new imbalanced learning method, the imbalanced data gravitation classification model based on gradient descent (IDGC-GD), to handle imbalanced problems. In the IDGC-GD model, we use the gradient descent algorithm to optimize the feature weights of the imbalanced data gravitation classification (IDGC) model. We then apply IDGC-GD to build an accurate video traffic identification solution. We conduct a set of comparative experiments between IDGC-GD and seven imbalanced learning algorithms using 21 open data sets and four video traffic data sets collected from a real application. Experimental results show that our method is promising for solving imbalanced problems, including Internet video traffic identification.
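The abstract does not give the IDGC-GD formulation, so the following is only a minimal sketch of the general idea of tuning the feature weights of a gravitation-style classifier by gradient descent. Everything in it is an assumption made for illustration and not the authors' method: unit-mass particles, an inverse-square force over a feature-weighted distance, a class amplification factor of N / N_c, a leave-one-out log loss, finite-difference gradients, and the hypothetical names `class_gravitation`, `loss`, and `fit_weights`.

```python
import numpy as np

def class_gravitation(X, y, x, w, skip=None, eps=1e-9):
    """Summed gravitation that each class exerts on point x (illustrative only).

    Assumptions: unit-mass particles, force = 1 / weighted squared distance,
    and each class sum is amplified by N / N_c to offset class imbalance.
    """
    d2 = ((X - x) ** 2) @ w + eps            # feature-weighted squared distances
    force = 1.0 / d2
    if skip is not None:
        force[skip] = 0.0                    # leave-one-out: drop the point itself
    classes = np.unique(y)
    return np.array([force[y == c].sum() * len(y) / (y == c).sum() for c in classes])

def loss(X, y, w):
    """Mean negative log of the true class's share of the total gravitation."""
    classes = np.unique(y)
    total = 0.0
    for i in range(len(y)):
        g = class_gravitation(X, y, X[i], w, skip=i)
        p = g / g.sum()
        total -= np.log(p[np.searchsorted(classes, y[i])] + 1e-12)
    return total / len(y)

def fit_weights(X, y, steps=30, lr=0.5, h=1e-4):
    """Gradient descent on the feature weights with central-difference gradients.

    A step is accepted only if it lowers the loss; otherwise the step size is halved.
    """
    w = np.ones(X.shape[1])
    best = loss(X, y, w)
    for _ in range(steps):
        grad = np.zeros_like(w)
        for j in range(len(w)):
            e = np.zeros_like(w)
            e[j] = h
            grad[j] = (loss(X, y, w + e) - loss(X, y, w - e)) / (2 * h)
        cand = np.clip(w - lr * grad, 1e-3, None)   # keep weights positive
        cand_loss = loss(X, y, cand)
        if cand_loss < best:
            w, best = cand, cand_loss
        else:
            lr *= 0.5
    return w

# Synthetic imbalanced data: feature 0 separates the classes, feature 1 is noise.
rng = np.random.default_rng(0)
X = np.vstack([
    np.column_stack([rng.normal(0.0, 0.5, 40), rng.normal(0.0, 3.0, 40)]),  # majority
    np.column_stack([rng.normal(3.0, 0.5, 8), rng.normal(0.0, 3.0, 8)]),    # minority
])
y = np.array([0] * 40 + [1] * 8)
w = fit_weights(X, y)
print("learned weights:", w)
print("loss before:", loss(X, y, np.ones(2)), "after:", loss(X, y, w))
```

Because a step is kept only when it lowers the loss, the loss at the learned weights is never higher than at the initial uniform weights. An analytic gradient would replace the finite differences in a real implementation.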
