A diversity-based method for class-imbalanced cost-sensitive learning

Real-world datasets are often class-imbalanced. In this situation, the primary goal of a classification algorithm is to minimize misclassification costs rather than to maximize classification accuracy. Sampling is widely employed to tackle this problem and improve classifier performance. In this paper, we propose a new diversity-based under-sampling technique for class-imbalanced datasets. The key idea is to balance a dataset by retaining only the potentially informative samples of the majority class, selected according to the diversity of their class-probability estimates. Experimental results on five class-imbalanced datasets show that our method outperforms two existing sampling techniques in terms of total misclassification cost.
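The abstract does not specify how the diversity of the class-probability estimates is computed, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that diversity is measured as the variance of the probability estimates produced by a small bagged ensemble of decision trees; the function name diversity_undersample, the ensemble size, and the scikit-learn components are illustrative choices.

```python
# A minimal sketch of diversity-based under-sampling (an assumed reading of
# the paper's idea, not the authors' exact procedure). "Diversity" is taken
# here to be the variance of class-probability estimates across a bagged
# ensemble; majority-class samples with the highest variance are treated as
# the most informative and are the ones retained.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


def diversity_undersample(X, y, majority_label, n_estimators=10, random_state=0):
    """Balance (X, y) by keeping only the majority-class samples whose
    ensemble class-probability estimates disagree the most."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.RandomState(random_state)
    maj_idx = np.where(y == majority_label)[0]
    min_idx = np.where(y != majority_label)[0]

    # Train a small bagged ensemble and record each member's estimate of
    # P(minority class | x) for every majority-class sample.
    probas = np.zeros((n_estimators, len(maj_idx)))
    for i in range(n_estimators):
        Xb, yb = resample(X, y, stratify=y, random_state=rng.randint(1 << 30))
        clf = DecisionTreeClassifier(random_state=i).fit(Xb, yb)
        minority_col = int(np.where(clf.classes_ != majority_label)[0][0])
        probas[i] = clf.predict_proba(X[maj_idx])[:, minority_col]

    # Diversity score: variance of the probability estimates across the
    # ensemble members (higher variance means more disagreement).
    diversity = probas.var(axis=0)

    # Keep the most diverse majority samples, as many as there are minority
    # samples, so the returned training set is balanced.
    keep = maj_idx[np.argsort(diversity)[::-1][:len(min_idx)]]
    selected = np.concatenate([keep, min_idx])
    return X[selected], y[selected]
```

As a quick check of the sketch, a synthetic binary problem from sklearn.datasets.make_classification with a 9:1 class ratio can be passed to diversity_undersample(X, y, majority_label=0); the returned training set contains all minority samples and an equal number of high-disagreement majority samples.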
