Handling Minority Class Problem in Threats Detection Based on Heterogeneous Ensemble Learning Approach

Multiclass problem, such as detecting multi-steps behaviour of Advanced Persistent Threats (APTs) have been a major global challenge, due to their capability to navigates around defenses and to evade detection for a prolonged period of time. Targeted APT attacks present an increasing concern for both cyber security and business continuity. Detecting the rare attack is a classification problem with data imbalance. This paper explores the applications of data resampling techniques, together with heterogeneous ensemble approach for dealing with data imbalance caused by unevenly distributed data elements among classes with our focus on capturing the rare attack. It has been shown that the suggested algorithms provide not only detection capability, but can also classify malicious data traffic corresponding to rare APT attacks.

[1]  John P. Isaacs,et al.  Evaluating Awareness and Perception of Botnet Activity within Consumer Internet-of-Things (IoT) Networks , 2019, Informatics.

[2]  Alan Wee-Chung Liew,et al.  A weighted multiple classifier framework based on random projection , 2019, Inf. Sci..

[3]  William B. Langdon,et al.  Data Fusion by Intelligent Classifier Combination , 2001 .

[4]  Gianluca Bontempi,et al.  Learned lessons in credit card fraud detection from a practitioner perspective , 2014, Expert Syst. Appl..

[5]  Bartosz Krawczyk,et al.  Analyzing the oversampling of different classes and types of examples in multi-class imbalanced datasets , 2016, Pattern Recognit..

[6]  Witold Pedrycz,et al.  Combining heterogeneous classifiers via granular prototypes , 2018, Appl. Soft Comput..

[7]  Liu Xiao,et al.  Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data , 2016 .

[8]  Francisco Herrera,et al.  An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics , 2013, Inf. Sci..

[9]  Gary M. Weiss Mining with rarity: a unifying framework , 2004, SKDD.

[10]  Gustavo E. A. P. A. Batista,et al.  A study of the behavior of several methods for balancing machine learning training data , 2004, SKDD.

[11]  Yijing Li,et al.  Learning from class-imbalanced data: Review of methods and applications , 2017, Expert Syst. Appl..

[12]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[13]  Chun-Xia Zhang,et al.  An experimental study of one- and two-level classifier fusion for different sample sizes , 2011, Pattern Recognit. Lett..

[14]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[15]  Chong Peng,et al.  A Supervised Learning Model for High-Dimensional and Large-Scale Data , 2016, ACM Trans. Intell. Syst. Technol..

[16]  R. Polikar,et al.  Ensemble based systems in decision making , 2006, IEEE Circuits and Systems Magazine.

[17]  Khaled M. Rabie,et al.  Detection of advanced persistent threat using machine-learning correlation analysis , 2018, Future Gener. Comput. Syst..

[18]  Zhi-Hua Zhou,et al.  Ieee Transactions on Knowledge and Data Engineering 1 Training Cost-sensitive Neural Networks with Methods Addressing the Class Imbalance Problem , 2022 .

[19]  Bidyut Baran Chaudhuri,et al.  Handling data irregularities in classification: Foundations, trends, and future challenges , 2018, Pattern Recognit..

[20]  Francisco Herrera,et al.  Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics , 2012, Expert Syst. Appl..

[21]  K. P. Soman,et al.  Evaluation of Recurrent Neural Network and its Variants for Intrusion Detection System (IDS) , 2017, Int. J. Inf. Syst. Model. Des..

[22]  Esteban J. Palomo,et al.  Use of an ANN to Value MTF and Melatonin Effect on ADHD Affected Children , 2019, IEEE Access.

[23]  ShangJennifer,et al.  Learning from class-imbalanced data , 2017 .

[24]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[25]  Kevin C. Desouza,et al.  Strategically-motivated advanced persistent threat: Definition, process, tactics and a disinformation model of counterattack , 2019, Comput. Secur..

[26]  Michael R. Chernick,et al.  Resampling Methods , 2002 .

[27]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[28]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[29]  Pei-Chann Chang,et al.  A population-based incremental learning approach with artificial immune system for network intrusion detection , 2016, Eng. Appl. Artif. Intell..

[30]  Thomas M. Chen,et al.  Lessons from Stuxnet , 2011, Computer.

[31]  Andrew K. C. Wong,et al.  Classification of Imbalanced Data: a Review , 2009, Int. J. Pattern Recognit. Artif. Intell..