Genetic programming for borderline instance detection in high-dimensional unbalanced classification

In classification, when class overlap is compounded by class imbalance, useful patterns are often hard to discover because the boundary between the majority class and the minority class is ambiguous. The difficulty grows when the data is high-dimensional. To date, few studies have investigated how class overlap can be effectively addressed or alleviated in classification with high-dimensional unbalanced data. In this paper, we propose a new genetic programming based method that automatically and directly detects borderline instances in order to address class overlap in such data. In the proposed method, each individual has two trees, which are trained together using different classification rules. The proposed method is compared with baseline methods on high-dimensional unbalanced datasets. Experimental results show that it achieves better classification performance than the baseline methods in almost all cases.
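
The abstract does not state the two classification rules or how the trees interact, so the following is only an illustrative sketch of what a two-tree individual could look like, not the method proposed here. It assumes, hypothetically, that each tree maps a feature vector to a real value, that each tree applies its own threshold rule, and that an instance is flagged as borderline when the two trees disagree. The names `TwoTreeIndividual`, `predict_a`, `predict_b`, and `borderline_indices` are invented for this example, and plain Python functions stand in for evolved GP trees.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Stand-in for an evolved GP tree: maps a feature vector to a real value.
Tree = Callable[[Sequence[float]], float]


@dataclass
class TwoTreeIndividual:
    """Hypothetical individual holding two trees (illustrative only)."""
    tree_a: Tree
    tree_b: Tree

    def predict_a(self, x: Sequence[float]) -> int:
        # Assumed rule A: non-negative output means minority class (1).
        return 1 if self.tree_a(x) >= 0 else 0

    def predict_b(self, x: Sequence[float]) -> int:
        # Assumed rule B: same form, applied to the second tree's output.
        return 1 if self.tree_b(x) >= 0 else 0

    def is_borderline(self, x: Sequence[float]) -> bool:
        # Assumption: disagreement between the two trees marks a borderline instance.
        return self.predict_a(x) != self.predict_b(x)


def borderline_indices(ind: TwoTreeIndividual,
                       X: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of instances flagged as borderline."""
    return [i for i, x in enumerate(X) if ind.is_borderline(x)]


# Toy usage with hand-written "trees" on one feature.
individual = TwoTreeIndividual(
    tree_a=lambda x: x[0] - 1.0,  # positive when x[0] > 1
    tree_b=lambda x: x[0] - 2.0,  # positive when x[0] > 2
)
X = [[0.5], [1.5], [2.5]]
flagged = borderline_indices(individual, X)
```

In this toy setting only the middle instance falls between the two decision thresholds, so it is the only one flagged. An actual GP system would evolve both trees jointly under a fitness function, which this sketch leaves out.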
