Paper information - Optimizing F-Measures by Cost-Sensitive Classification

Optimizing F-Measures by Cost-Sensitive Classification

We present a theoretical analysis of F-measures for binary, multiclass and multilabel classification. These performance measures are non-linear, but in many scenarios they are pseudo-linear functions of the per-class false negative and false positive rates. Based on this observation, we present a general reduction of F-measure maximization to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F-measure by solving a series of cost-sensitive classification problems. The strength of our analysis is that it holds for any dataset and any class of classifiers, extending the existing theoretical results on F-measures, which are asymptotic in nature. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F-measure optimization tasks.
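The reduction described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the classifier family (one-dimensional threshold rules), the uniform grid over the cost parameter, the toy data, and the names `f_beta`, `fit_cost_sensitive_stump` and `maximize_f_beta`. The cost pair follows from rewriting the condition F_beta <= t for the standard F_beta definition, which gives a false negative cost of 1 + beta^2 - t and a false positive cost of t, with t ranging over (0, 1 + beta^2).

```python
def f_beta(y_true, y_pred, beta=1.0):
    """F_beta computed from true positive, false positive and false negative counts."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return (1 + beta ** 2) * tp / denom if denom else 0.0


def fit_cost_sensitive_stump(xs, ys, cost_fn, cost_fp):
    """Exactly minimise cost_fn * FN + cost_fp * FP over rules 'predict 1 iff x >= thr'.

    The stump family is a stand-in (assumption) for any cost-sensitive learner.
    """
    candidates = sorted(set(xs)) + [float("inf")]  # inf means "always predict 0"
    best_thr, best_cost = None, float("inf")
    for thr in candidates:
        cost = 0.0
        for x, y in zip(xs, ys):
            pred = 1 if x >= thr else 0
            if y == 1 and pred == 0:
                cost += cost_fn
            elif y == 0 and pred == 1:
                cost += cost_fp
        if cost < best_cost:
            best_thr, best_cost = thr, cost
    return best_thr


def maximize_f_beta(xs, ys, beta=1.0, n_grid=20):
    """Solve one cost-sensitive problem per grid value of t; keep the best F_beta."""
    top = 1 + beta ** 2
    best_score, best_t, best_thr = -1.0, None, None
    for i in range(1, n_grid):
        t = top * i / n_grid
        thr = fit_cost_sensitive_stump(xs, ys, cost_fn=top - t, cost_fp=t)
        preds = [1 if x >= thr else 0 for x in xs]
        score = f_beta(ys, preds, beta)
        if score > best_score:
            best_score, best_t, best_thr = score, t, thr
    return best_score, best_t, best_thr


# Toy data (hypothetical): one feature, four positives among ten examples.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
print(maximize_f_beta(xs, ys, beta=1.0, n_grid=20))
```

In this sketch the score is evaluated on the training data for brevity; a held-out validation set would be the natural choice in practice. A finer grid trades more cost-sensitive fits for a closer approximation of the best achievable F-measure within the chosen classifier family.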
