Paper information - Research based on unbalanced data set classification algorithm

Research based on unbalanced data set classification algorithm

Many studies have shown that, when classifying unbalanced data, traditional classifiers tend to be biased toward the majority class, which causes minority class samples to be misclassified as majority class samples. This paper proposes a new fusion-based equal-proportion sampling method that takes full account of the clustering structure of the majority class samples. The majority class samples are first grouped into clusters; samples are then drawn at random from each cluster in equal proportion and combined with the minority class samples to form a new data set. Finally, the classical KNN algorithm is used for classification. Experiments show that this method can find an appropriate number of clusters for the majority class samples and that merging the sampled data with the minority class yields higher classification accuracy.
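The pipeline described in the abstract (cluster the majority class, sample each cluster in equal proportion, merge with the minority class, classify with KNN) can be illustrated with a minimal sketch. The abstract does not give the implementation details, so several choices here are assumptions: k-means is used as the clustering step, the sampling proportion is set to the minority/majority size ratio, and the function names `kmeans`, `equal_proportion_sample`, and `knn_predict` are hypothetical. The demo data is synthetic and is not taken from the paper's experiments.

```python
import math
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed clustering step); returns non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old center if empty.
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]

def equal_proportion_sample(majority, minority, k, seed=0):
    """Draw the same fraction of points at random from every majority cluster."""
    rng = random.Random(seed)
    # Assumption: the fraction is chosen so the sample is about minority-sized.
    ratio = min(1.0, len(minority) / len(majority))
    sampled = []
    for cl in kmeans(majority, k, seed=seed):
        n = max(1, round(ratio * len(cl)))
        sampled.extend(rng.sample(cl, n))
    return sampled

def knn_predict(train, query, k=3):
    """Classical KNN: majority vote among the k nearest labelled points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Synthetic demo: 200 majority points (label 0) and 20 minority points (label 1).
rng = random.Random(1)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
minority = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(20)]

balanced = equal_proportion_sample(majority, minority, k=4)
train = [(p, 0) for p in balanced] + [(p, 1) for p in minority]
print(len(balanced), knn_predict(train, (4.0, 4.0)))
```

The number of clusters `k` is a free parameter in this sketch. One plausible way to look for the "appropriate number of clusters" mentioned in the abstract would be to repeat the procedure for several values of `k` and compare accuracy on held-out data, though the abstract does not state how the selection is actually made.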

Peng Wang | Xiaojian Liu | Junming Li
