Fuzzy distance-based undersampling technique for imbalanced flood data

Classifier performance suffers on imbalanced data because instances in the minority class are often ignored. Imbalanced data occur in many application domains, including flood prediction, where misclassifying a flood case has a greater impact than misclassifying a non-flood case. Numerous resampling techniques, such as undersampling and oversampling, have been used to overcome the misclassification of imbalanced data. However, undersampling can eliminate relevant data and oversampling can cause overfitting, either of which may lead to poor classification results. This paper proposes a Fuzzy Distance-based Undersampling (FDUS) technique to increase classification accuracy. Entropy estimation is used to generate fuzzy thresholds, which are used to categorise the instances in the majority and minority classes into membership functions. The performance of FDUS was compared with that of three other techniques on flood data, using F-measure and G-mean. FDUS achieved a better F-measure and G-mean than the other techniques, which indicates that FDUS reduces the elimination of relevant data.
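
The two evaluation measures named above can be illustrated with a minimal sketch. It assumes the commonly used definitions (F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of recall and specificity) and treats flood as the positive class; the abstract itself does not state the formulas. The function name `f_measure_and_g_mean` and the confusion-matrix counts are hypothetical and are not taken from the paper's experiments.

```python
import math


def f_measure_and_g_mean(tp, fn, fp, tn):
    """Return (F-measure, G-mean) for the positive (minority) class.

    Assumes the standard definitions with equal weight on precision
    and recall; these are not quoted from the paper.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return f_measure, g_mean


# Hypothetical counts: 10 flood cases and 90 non-flood cases.
f, g = f_measure_and_g_mean(tp=8, fn=2, fp=4, tn=86)
print(f"F-measure = {f:.3f}, G-mean = {g:.3f}")

# A classifier that predicts "non-flood" for every instance is 90% accurate
# on these counts, yet both measures drop to zero.
f0, g0 = f_measure_and_g_mean(tp=0, fn=10, fp=0, tn=90)
print(f"F-measure = {f0:.3f}, G-mean = {g0:.3f}")
```

Under these definitions, both measures penalise a classifier that ignores the minority class, which plain accuracy does not; this is the usual reason for preferring them on imbalanced data.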