A Genetic-Based Ensemble Learning Applied to Imbalanced Data Classification

Imbalanced data classification remains a focus of intense research because of its ever-growing presence in real-life decision tasks. In this article, we focus on a classifier ensemble for imbalanced data classification. The ensemble is formed from individual classifiers, each trained on a feature subset selected in a supervised manner. Several methods use feature subsets to ensure a highly diverse ensemble; nevertheless, most of them, such as Random Subspace or Random Forest, select the attributes for a particular classifier randomly. The main drawback of these methods is that they offer no way to supervise and control the selection. In this work, we apply a genetic algorithm to the problem. The proposed method formulates an original learning criterion that takes into account not only the overall classification performance but also the diversity of the trained ensemble. The experimental study confirmed the high efficiency of the proposed algorithm and its superiority to other ensemble-forming methods based on random feature selection.
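
The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the learning criterion, so the weighted sum of balanced accuracy and mask diversity, the weight `alpha`, the nearest-centroid base classifier, the majority vote, and all function names below are assumptions chosen only to keep the example self-contained.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, a common metric for imbalanced data."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def centroid_predict(X_train, y_train, X_test, mask):
    """Nearest-centroid classifier restricted to features where mask == 1."""
    feats = [j for j, bit in enumerate(mask) if bit]
    centroids = {}
    for c in sorted(set(y_train)):
        rows = [x for x, y in zip(X_train, y_train) if y == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    preds = []
    for x in X_test:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, cen))
                for c, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds

def ensemble_predict(X_train, y_train, X_test, masks):
    """Majority vote over one base classifier per feature mask."""
    votes = [centroid_predict(X_train, y_train, X_test, m) for m in masks]
    return [max(sorted(set(col)), key=col.count) for col in zip(*votes)]

def diversity(masks):
    """Mean pairwise normalised Hamming distance between feature masks."""
    pairs = [(a, b) for i, a in enumerate(masks) for b in masks[i + 1:]]
    return sum(sum(p != q for p, q in zip(a, b)) / len(a)
               for a, b in pairs) / len(pairs)

def evolve(X, y, n_members=3, pop_size=20, generations=15,
           alpha=0.7, p_mut=0.05, seed=0):
    """Search for n_members feature masks with a simple genetic algorithm."""
    rng = random.Random(seed)
    n_feat = len(X[0])

    def repair(ind):
        # Every ensemble member must keep at least one feature.
        for m in ind:
            if not any(m):
                m[rng.randrange(n_feat)] = 1
        return ind

    def fitness(ind):
        # Assumed criterion: weighted sum of performance and diversity,
        # evaluated on the training data for brevity.
        pred = ensemble_predict(X, y, X, ind)
        return alpha * balanced_accuracy(y, pred) + (1 - alpha) * diversity(ind)

    def make_child(p1, p2):
        child = []
        for m1, m2 in zip(p1, p2):
            mask = []
            for a, b in zip(m1, m2):
                bit = a if rng.random() < 0.5 else b   # uniform crossover
                if rng.random() < p_mut:               # bit-flip mutation
                    bit = 1 - bit
                mask.append(bit)
            child.append(mask)
        return repair(child)

    pop = [repair([[rng.randint(0, 1) for _ in range(n_feat)]
                   for _ in range(n_members)]) for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [max(pop, key=fitness)]                  # elitism
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            nxt.append(make_child(p1, p2))
        pop = nxt
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Calling `evolve(X, y)` on a list of numeric feature rows `X` and labels `y` returns the best set of feature masks found and its fitness. In practice the fitness would be estimated on held-out data, and a stronger base classifier would replace the nearest-centroid rule.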
