Paper Information - F-measure Maximizing Logistic Regression

F-measure Maximizing Logistic Regression

Logistic regression is a widely used method in several fields. When logistic regression is applied to imbalanced data, in which majority classes dominate minority classes, all class labels are estimated as "majority class." In this article, we use an F-measure optimization method to improve the performance of logistic regression on imbalanced data. Many F-measure optimization methods approximate the F-measure with a ratio of estimators, but a ratio of estimators tends to have more bias than a direct approximation of the ratio. Therefore, we employ an approximate F-measure for estimating the relative density ratio. In addition, we define a relative F-measure and approximate it. We present an algorithm for logistic regression weighted by the approximated relative F-measure. Experimental results on real-world data demonstrate that the proposed method is an efficient algorithm for improving the performance of logistic regression applied to imbalanced data.
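The motivation in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it only computes the standard F-measure, F_beta = (1 + beta^2) TP / ((1 + beta^2) TP + beta^2 FN + FP), on a made-up toy data set of 95 majority and 5 minority labels, and compares it with accuracy. The function names `f_measure` and `accuracy` and all data values are assumptions chosen for illustration.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """Standard F_beta for the positive (minority) class, labelled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    # Convention used here: F is 0 when there are no positives at all.
    return (1 + beta ** 2) * tp / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced data: 95 majority (0) and 5 minority (1) labels.
y_true = [0] * 95 + [1] * 5

# Classifier A labels everything as the majority class.
all_majority = [0] * 100

# Classifier B raises 6 false alarms but finds 4 of the 5 minority cases.
minority_aware = [1] * 6 + [0] * 89 + [1] * 4 + [0] * 1

acc_majority = accuracy(y_true, all_majority)
f_majority = f_measure(y_true, all_majority)
acc_aware = accuracy(y_true, minority_aware)
f_aware = f_measure(y_true, minority_aware)

print(f"all-majority:   accuracy={acc_majority:.3f}  F1={f_majority:.3f}")
print(f"minority-aware: accuracy={acc_aware:.3f}  F1={f_aware:.3f}")
```

In this toy setting the all-majority classifier has the higher accuracy but an F-measure of zero, while the classifier that recovers minority cases has lower accuracy and a clearly higher F-measure. That gap is why an F-measure-based criterion, rather than plain likelihood or accuracy, is of interest for imbalanced data.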

Hiroshi Yadohisa | Jun Tsuchida | Masaaki Okabe
