Learning from Imbalanced Data in Presence of Noisy and Borderline Examples

In this paper we studied re-sampling methods for learning classifiers from imbalanced data. We carried out a series of experiments on artificial data sets to explore the impact of noisy and borderline examples from the minority class on classifier performance. The results showed that if the data were sufficiently disturbed by these factors, then the focused re-sampling methods (NCR and our SPIDER2) strongly outperformed the oversampling methods. They also performed better on real-life data sets, where PCA visualizations suggested the possible presence of noisy examples and large overlapping areas between the classes.
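As an illustration only (not the paper's experimental setup), the minimal Python sketch below shows how a focused cleaning re-sampler such as NCR can be compared with plain random oversampling on a synthetic imbalanced data set with class overlap and label noise. The use of scikit-learn and imbalanced-learn, the parameter values, and the evaluation measure are assumptions; SPIDER2 is not included because it has no widely available reference implementation in these libraries.

```python
# Hypothetical sketch: compare NCR-style cleaning with random oversampling on
# synthetic imbalanced data containing noisy/borderline minority examples.
# Library choices and parameters are illustrative assumptions, not the authors' code.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import NeighbourhoodCleaningRule

# Synthetic imbalanced data with moderate class overlap and a small fraction of
# flipped labels, mimicking noisy and borderline minority-class examples.
X, y = make_classification(
    n_samples=1000,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    weights=[0.9, 0.1],   # roughly 10% minority class
    class_sep=0.8,        # moderate overlap between classes
    flip_y=0.05,          # a small proportion of noisy labels
    random_state=0,
)
print("Original class distribution:", Counter(y))

# Each re-sampling step is followed by the same tree learner so that only the
# re-sampling strategy differs between the two pipelines.
pipelines = {
    "NCR (focused cleaning)": Pipeline([
        ("resample", NeighbourhoodCleaningRule()),
        ("clf", DecisionTreeClassifier(random_state=0)),
    ]),
    "Random oversampling": Pipeline([
        ("resample", RandomOverSampler(random_state=0)),
        ("clf", DecisionTreeClassifier(random_state=0)),
    ]),
}

for name, pipe in pipelines.items():
    # Balanced accuracy is used here as a simple imbalance-aware measure.
    scores = cross_val_score(pipe, X, y, cv=5, scoring="balanced_accuracy")
    print(f"{name}: mean balanced accuracy = {scores.mean():.3f}")
```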