A Threshold-free Classification Mechanism in Genetic Programming for High-dimensional Unbalanced Classification

Class imbalance is an unavoidable issue in many real-world applications. When learning from unbalanced data, classifiers are often biased toward the majority class, even though the minority class is also important (and in many cases more important). Addressing class imbalance becomes even more challenging when a classification task also suffers from high dimensionality. This paper proposes a new genetic programming (GP) approach to high-dimensional unbalanced classification. A new classification mechanism is proposed for GP to improve its classification performance. The new mechanism does not depend on a classification threshold to separate the majority class from the minority class. The effectiveness of the proposed method is examined on seven high-dimensional unbalanced datasets. Experimental results indicate that the proposed GP method often outperforms other GP methods that rely on a fitness function to address class imbalance, in terms of both classification performance and training time.
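The abstract does not describe how the threshold-free mechanism works, so the sketch below illustrates only the conventional threshold-based scheme it is contrasted with, and why plain accuracy is a misleading fitness measure on unbalanced data. It assumes a common GP convention in which a non-negative program output is assigned to the minority class, and uses the average of per-class accuracies as an example of an imbalance-aware fitness function. The names `Program`, `classify`, `accuracy`, and `balanced_accuracy`, and the toy data, are hypothetical and for illustration only; this is not the paper's method.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for an evolved GP program: maps a feature vector to a real number.
Program = Callable[[Sequence[float]], float]

MINORITY, MAJORITY = 1, 0


def classify(program: Program, x: Sequence[float], threshold: float = 0.0) -> int:
    """Assumed threshold-based rule: output >= threshold -> minority class."""
    return MINORITY if program(x) >= threshold else MAJORITY


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Overall accuracy: dominated by the majority class on unbalanced data."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Average of per-class accuracies (minority TPR and majority TNR)."""
    per_class = []
    for c in (MINORITY, MAJORITY):
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(y_pred[i] == c for i in idx)
        per_class.append(correct / len(idx))
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Toy data (illustrative only): 9 majority instances, 1 minority instance.
    X = [[float(i)] for i in range(10)]
    y = [MAJORITY] * 9 + [MINORITY]

    # A degenerate program that always outputs a negative value predicts the majority class everywhere.
    always_negative: Program = lambda x: -1.0
    preds = [classify(always_negative, x) for x in X]
    print(accuracy(y, preds))           # 0.9
    print(balanced_accuracy(y, preds))  # 0.5

    # A program that does separate the classes is only useful if the threshold is placed correctly.
    shifted: Program = lambda x: x[0] - 8.5
    print(balanced_accuracy(y, [classify(shifted, x) for x in X]))                 # 1.0
    print(balanced_accuracy(y, [classify(shifted, x, threshold=1.0) for x in X]))  # 0.5
```

The second half of the usage shows the dependence that a threshold-free mechanism aims to remove: the same program output yields a perfect or a trivial classifier depending solely on where the threshold is set.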
