F-measure Maximizing Logistic Regression

Logistic regression is widely used in many fields. When it is applied to imbalanced data, in which majority classes dominate minority classes, all class labels tend to be estimated as the majority class. In this article, we use an F-measure optimization method to improve the performance of logistic regression on imbalanced data. Many F-measure optimization methods approximate the F-measure with a ratio of estimators, but a ratio of estimators tends to have more bias than a direct approximation of the ratio. Therefore, we employ an approximate F-measure for estimating the relative density ratio. In addition, we define a relative F-measure and derive an approximation of it. We present an algorithm for logistic regression weighted by the approximated relative F-measure. Experimental results on real-world data demonstrate that the proposed method efficiently improves the performance of logistic regression on imbalanced data.
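To make the motivation concrete, the following minimal sketch computes accuracy and the standard F-measure on a hypothetical toy label set (95 majority, 5 minority examples). The data, the function names `f_measure` and `accuracy`, and the two prediction vectors are assumptions made for illustration; this is not the proposed algorithm, only a demonstration of why accuracy can hide the failure described above.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """Standard F_beta for the positive (minority) class; labels are 0 or 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    # Convention assumed here: return 0 when there are no positives at all.
    return (1 + beta**2) * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5

# A classifier that labels everything as the majority class.
all_majority = [0] * 100

# A classifier that recovers 4 of 5 minority examples with 6 false alarms.
minority_aware = [0] * 89 + [1] * 6 + [1] * 4 + [0] * 1

acc_majority = accuracy(y_true, all_majority)
f1_majority = f_measure(y_true, all_majority)
acc_aware = accuracy(y_true, minority_aware)
f1_aware = f_measure(y_true, minority_aware)

print(f"all-majority:   accuracy={acc_majority:.2f}, F1={f1_majority:.2f}")
print(f"minority-aware: accuracy={acc_aware:.2f}, F1={f1_aware:.2f}")
```

On this toy data the all-majority classifier has the higher accuracy but an F-measure of zero, which is the kind of behavior that motivates optimizing the F-measure rather than the likelihood or accuracy alone.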
