A Generalized Optimization Embedded Framework of Undersampling Ensembles for Imbalanced Classification

Imbalanced classification is common in practical applications and has long been a challenging problem. Traditional classification methods perform poorly on imbalanced data, especially on the minority class. However, the minority class is usually the class of interest, and misclassifying it incurs a higher cost. A critical factor is the intrinsically complicated distribution characteristics of imbalanced data itself. Resampling ensemble learning has achieved promising results and has recently become a research focus. However, some resampling ensembles do not account for these complicated distribution characteristics, which limits their performance. In this paper, a generalized optimization embedded framework (GOEF) based on undersampling bagging is proposed. The GOEF aims to pay more attention to learning local regions in order to handle the complicated distribution characteristics. Specifically, the GOEF utilizes out-of-bag data to explore heterogeneous local areas and chooses misclassified examples to optimize the base classifiers. The optimization can focus on a single class or on both classes. Extensive experiments on synthetic and real datasets demonstrate that the GOEF with minority-class optimization performs best in terms of AUC, G-mean, and sensitivity, compared with five resampling ensemble methods.
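The mechanism summarized above — undersampling bagging in which each base classifier is then refit on out-of-bag (OOB) examples it misclassifies — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`OneNN`, `goef_like_fit`, `vote`), the 1-nearest-neighbour base learner, and the exact bag construction are all assumptions, and only the minority-class variant of the optimization is shown.

```python
import random
from collections import Counter

# Hypothetical sketch: undersampling bagging where each base classifier is
# refit on misclassified out-of-bag (OOB) minority examples. The 1-NN base
# learner is a stand-in for any classifier; all names are illustrative.

class OneNN:
    """Tiny 1-nearest-neighbour base classifier."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [self.y[min(range(len(self.X)),
                           key=lambda i: dist2(self.X[i], x))] for x in X]

def goef_like_fit(X, y, n_estimators=5, seed=0):
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, lab in enumerate(y) if lab == minority]
    maj_idx = [i for i, lab in enumerate(y) if lab != minority]
    models = []
    for _ in range(n_estimators):
        # Undersampling bagging: bootstrap the minority class and undersample
        # the majority class down to the same size, giving a balanced bag.
        bag = [rng.choice(min_idx) for _ in min_idx]
        bag += rng.sample(maj_idx, len(min_idx))
        clf = OneNN().fit([X[i] for i in bag], [y[i] for i in bag])
        # Minority-class optimization: find OOB minority examples that this
        # base classifier misclassifies and refit with them added to the bag.
        oob_min = [i for i in min_idx if i not in set(bag)]
        wrong = [i for i in oob_min if clf.predict([X[i]])[0] != y[i]]
        if wrong:
            bag += wrong
            clf = OneNN().fit([X[i] for i in bag], [y[i] for i in bag])
        models.append(clf)
    return models

def vote(models, X):
    # Aggregate the base classifiers by majority vote.
    cols = zip(*(m.predict(X) for m in models))
    return [Counter(col).most_common(1)[0][0] for col in cols]
```

Optimizing both classes would, under the same sketch, simply extend the `wrong` set to misclassified OOB examples of either class.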