Fuzzy-rough k-nearest neighbor algorithm for imbalanced data sets learning

Learning from imbalanced data sets presents a new challenge to the machine learning community, because traditional methods are biased toward the majority classes and yield poor detection rates for the minority classes. This paper presents a new approach, a fuzzy-rough k-nearest neighbor algorithm for learning from imbalanced data sets, that aims to improve the classification performance on the minority class. The approach defines a fuzzy membership function that favors the minority class and constructs a fuzzy equivalence relation between the unlabeled instance and its k nearest neighbors. It takes both the fuzziness and the roughness of an instance's nearest neighbors into consideration, and can reduce the interference of the majority class with the minority class. Experiments show that, compared with other methods, the new approach improves the classification performance on the minority class more effectively and also improves the classification performance on the whole data set.
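To make the idea concrete, the following is a minimal sketch of a fuzzy-rough nearest-neighbor classifier with a class membership that favors the minority class. It is not the formulation used in the paper, which the abstract does not specify. The following choices are assumptions made only for illustration: the function names, the membership weight sqrt(n_min / n_c), the similarity 1 / (1 + d) used in place of the fuzzy equivalence relation, the Kleene-Dienes implicator and minimum t-norm, and the averaging of the lower and upper approximations.

```python
import numpy as np

def minority_favoring_memberships(y):
    """Hypothetical membership weights (an assumption, not from the paper):
    the rarest class gets weight 1.0, larger classes get sqrt(n_min / n_c)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = np.sqrt(counts.min() / counts)
    return classes, weights

def fuzzy_rough_knn_predict(X_train, y_train, x, k=5):
    """Classify x from the fuzzy-rough lower/upper approximations
    computed over its k nearest training instances."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    classes, weights = minority_favoring_memberships(y_train)

    # k nearest neighbors by Euclidean distance
    dist = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(dist)[:k]

    # Assumed fuzzy similarity between x and each neighbor, in (0, 1]
    sim = 1.0 / (1.0 + dist[nn])

    scores = []
    for c, w in zip(classes, weights):
        # Fuzzy membership of each neighbor in class c (0 if other class)
        member = w * (y_train[nn] == c)
        # Lower approximation: Kleene-Dienes implicator max(1 - R, mu)
        lower = np.min(np.maximum(1.0 - sim, member))
        # Upper approximation: minimum t-norm min(R, mu)
        upper = np.max(np.minimum(sim, member))
        scores.append((lower + upper) / 2.0)

    return classes[int(np.argmax(scores))]

if __name__ == "__main__":
    # Toy 1-D data: 8 majority instances (class 0), 2 minority (class 1)
    X = np.array([[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7],
                  [2.0], [2.1]])
    y = np.array([0] * 8 + [1] * 2)
    print(fuzzy_rough_knn_predict(X, y, [2.05], k=3))
    print(fuzzy_rough_knn_predict(X, y, [0.3], k=3))
```

In this sketch the majority class can never reach a membership above its weight, so a majority neighbor that falls among the k nearest contributes less to the upper approximation of its class than a minority neighbor at the same distance would. Whether this matches the weighting used in the paper cannot be determined from the abstract.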