Class Imbalance Oriented Logistic Regression

Class imbalance is common in real-world data. Traditional state-of-the-art classifiers do not work well on data sets with an imbalanced class distribution. In this paper, we apply the logistic regression model to the class-imbalance problem and propose a novel algorithm called CILR (Class Imbalance oriented Logistic Regression) to tackle imbalanced data sets. Unlike traditional logistic regression, which optimizes the MLE (Maximum Likelihood Estimation) function, CILR optimizes the objective function proposed in this paper, which is based on both MLE and the recall metric. The loss function makes full use of the characteristics of the majority class and the minority class simultaneously, which guarantees that CILR enhances the classification performance of logistic regression on the rare class without decreasing overall accuracy. Experimental results on 16 data sets show that CILR performs significantly better than traditional logistic regression, under-sampled logistic regression, and over-sampled logistic regression.
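The abstract does not state the exact form of the CILR objective, so the following is only a minimal sketch of the general idea of combining a likelihood term with a recall term. It assumes, purely for illustration, an objective equal to the mean negative log-likelihood minus `lam` times a differentiable "soft recall" (the mean predicted probability over minority-class samples), minimized by plain gradient descent. The names `objective_and_grad`, `fit`, `recall`, and the weight `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective_and_grad(w, X, y, lam):
    """Assumed objective (not the paper's): mean NLL - lam * soft recall."""
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    pos = y == 1  # minority (positive) class mask
    # Soft recall: average predicted probability on the positive samples.
    soft_recall = p[pos].mean()
    grad_nll = X.T @ (p - y) / len(y)
    # d(sigmoid)/dz = p * (1 - p), averaged over the positive samples.
    grad_recall = X[pos].T @ (p[pos] * (1 - p[pos])) / pos.sum()
    return nll - lam * soft_recall, grad_nll - lam * grad_recall

def fit(X, y, lam=1.0, lr=0.5, steps=2000):
    """Plain gradient descent on the assumed objective."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        _, g = objective_and_grad(w, X, y, lam)
        w -= lr * g
    return w

def recall(w, X, y, threshold=0.5):
    """Fraction of positive samples predicted positive."""
    pred = sigmoid(X @ w) >= threshold
    return pred[y == 1].mean()

if __name__ == "__main__":
    # Synthetic imbalanced data (roughly 5% positives), for illustration only.
    rng = np.random.default_rng(0)
    n = 2000
    y = (rng.random(n) < 0.05).astype(float)
    X = rng.normal(size=(n, 2)) + 1.5 * y[:, None]
    X = np.hstack([X, np.ones((n, 1))])  # bias column
    for lam in (0.0, 1.0):
        w = fit(X, y, lam=lam)
        print(f"lam={lam}: recall on minority class = {recall(w, X, y):.3f}")
```

Setting `lam=0.0` reduces the sketch to ordinary maximum-likelihood logistic regression, which gives a baseline for comparing minority-class recall. The recall term here is a smooth surrogate chosen for differentiability; the paper's actual formulation may differ.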
