Adaptive Weight Optimization for Classification of Imbalanced Data

One popular approach to learning from imbalanced data is to weight samples of the rare classes with a high cost and then apply cost-sensitive learning methods to handle the class imbalance. The weight of a class is usually determined by the proportion of its samples in the training set. This paper analyzes how the class proportions of the training set and the testing set may differ within some range, which compromises the performance of the learned classifier; the problem becomes serious when the class distribution is extremely imbalanced. Based on this analysis, an adaptive weighting approach that aims to find a proper set of class weights is proposed. We employ an evolutionary algorithm to optimize the weight configuration so that the classifier performs well on both the training set and possible testing sets. Experimental results on a wide variety of datasets demonstrate that our approach achieves better performance.
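The abstract does not fix the base classifier, the evolutionary algorithm, or the fitness measure, so the following is only a minimal sketch of the adaptive-weighting idea under assumed choices: scikit-learn's LogisticRegression with a per-class weight dictionary as the cost-sensitive learner, balanced accuracy averaged over validation subsets with randomly perturbed minority proportions as the fitness (to mimic a shifted testing distribution), and a simple (mu + lambda)-style mutation search over the minority-class weight. All function names and parameters here are illustrative, not the paper's method.

```python
# Sketch: evolutionary search for a minority-class weight that stays robust
# when the class proportion of the evaluation data varies (binary labels 0/1).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def fitness(weight_pos, X_tr, y_tr, X_val, y_val, n_shifts=5):
    """Average balanced accuracy over validation subsets whose minority
    proportion is randomly perturbed (a stand-in for possible testing sets)."""
    clf = LogisticRegression(max_iter=1000,
                             class_weight={0: 1.0, 1: float(weight_pos)})
    clf.fit(X_tr, y_tr)
    scores = []
    pos_idx = np.flatnonzero(y_val == 1)
    neg_idx = np.flatnonzero(y_val == 0)
    for _ in range(n_shifts):
        # Keep a random fraction of the minority class to vary its proportion.
        n_keep = max(1, int(len(pos_idx) * rng.uniform(0.3, 1.0)))
        keep = rng.choice(pos_idx, size=n_keep, replace=False)
        idx = np.concatenate([neg_idx, keep])
        scores.append(balanced_accuracy_score(y_val[idx], clf.predict(X_val[idx])))
    return float(np.mean(scores))

def evolve_weight(X, y, generations=20, pop_size=10):
    """(mu + lambda)-style search for a robust minority-class weight."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    # Start the population around the usual proportion-based weight.
    base = (y_tr == 0).sum() / max(1, (y_tr == 1).sum())
    pop = base * rng.uniform(0.5, 2.0, size=pop_size)
    for _ in range(generations):
        scores = np.array([fitness(w, X_tr, y_tr, X_val, y_val) for w in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]            # keep best half
        children = parents * rng.uniform(0.8, 1.25, size=parents.shape)  # mutate
        pop = np.concatenate([parents, children])
    scores = np.array([fitness(w, X_tr, y_tr, X_val, y_val) for w in pop])
    return float(pop[np.argmax(scores)])
```

Given a feature matrix X and binary labels y, evolve_weight(X, y) returns the searched minority-class weight, which can then be passed as the class_weight of the final classifier fit on the full training set; the fixed cost based purely on the training-set proportion (the variable base above) serves as the baseline this search is meant to improve on.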
