An Over-sampling Expert System for Learning from Imbalanced Data Sets

Learning from imbalanced data sets has become an important branch of machine learning. A relatively simple and effective way to address the imbalance problem is re-sampling, which comprises under-sampling and over-sampling. A representative over-sampling approach is SMOTE (synthetic minority over-sampling technique). However, when SMOTE is applied to an imbalanced training set, it is not easy to decide the best ratio of minority to majority samples. This paper presents an over-sampling expert system that builds an ensemble of classifiers trained on data sets over-sampled at different rates. Applied to several highly and moderately imbalanced data sets, the proposed combination method, C-SMOTE, automatically obtains an optimal SMOTE rate and improves both prediction accuracy and overall F-measure on the minority class.
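
To make the idea of over-sampling at different rates concrete, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function name `smote`, the parameters `rate`, `k` and `seed`, and the toy `minority` points are all hypothetical, and the C-SMOTE combination step itself is not shown.

```python
import math
import random

def smote(minority, rate, k=3, seed=0):
    """Return round(rate * len(minority)) synthetic minority points.

    Each synthetic point lies on the line segment between a minority
    sample and one of its k nearest minority neighbours (Euclidean).
    Names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_new = round(rate * len(minority))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding itself
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Toy minority class (hypothetical data, two features per sample).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]

# One over-sampled minority set per candidate rate (100%, 200%, 300%).
oversampled = {r: minority + smote(minority, r) for r in (1.0, 2.0, 3.0)}
```

Each entry of `oversampled` would be merged with the majority samples to train one classifier, giving one ensemble member per candidate rate; how those members are combined is the part the proposed method addresses.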