Paper Information - Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization

We compare the plug-in rule approach for optimizing the Fβ-measure in multi-label classification with an approach based on structured loss minimization, such as the structured support vector machine (SSVM). Whereas the former derives an optimal prediction from a probabilistic model in a separate inference step, the latter seeks to optimize the Fβ-measure directly during the training phase. We introduce a novel plug-in rule algorithm that estimates all parameters required for a Bayes-optimal prediction via a set of multinomial regression models, and we compare this algorithm with SSVMs in terms of computational complexity and statistical consistency. As a main theoretical result, we show that our plug-in rule algorithm is consistent, whereas the SSVM approaches are not. Finally, we present results of a large experimental study showing the benefits of the introduced algorithm.
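To make the plug-in idea concrete, the following is a minimal sketch of the separate inference step: given an estimated probability distribution over label vectors, pick the prediction that maximizes the expected Fβ. It is not the algorithm introduced in the paper, which estimates the required parameters with multinomial regression models; this sketch simply enumerates all 2^m candidate predictions, so it is only feasible for a handful of labels. The function names `f_beta` and `plug_in_prediction`, the dictionary representation of the distribution, and the convention that Fβ equals 1 when both vectors are all-zero are assumptions made for illustration.

```python
from itertools import product


def f_beta(y_true, y_pred, beta=1.0):
    """F_beta between two binary label vectors of equal length.

    Assumed convention: the score is 1.0 when both vectors are all-zero.
    """
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    denom = beta ** 2 * sum(y_true) + sum(y_pred)
    if denom == 0:
        return 1.0
    return (1 + beta ** 2) * tp / denom


def plug_in_prediction(joint, beta=1.0):
    """Brute-force inference step of a plug-in rule (illustrative only).

    `joint` is a hypothetical stand-in for a fitted probabilistic model:
    a dict mapping each label vector (a tuple of 0/1) to its probability.
    Returns the candidate with the highest expected F_beta and that value.
    """
    m = len(next(iter(joint)))
    best, best_val = None, -1.0
    for h in product((0, 1), repeat=m):  # all 2^m candidate predictions
        val = sum(p * f_beta(y, h, beta) for y, p in joint.items())
        if val > best_val:
            best, best_val = h, val
    return best, best_val


# Toy distribution over two labels (made-up numbers, not from the paper).
toy_joint = {(1, 0): 0.35, (0, 1): 0.35, (0, 0): 0.30}
prediction, expected_f1 = plug_in_prediction(toy_joint, beta=1.0)
print(prediction, round(expected_f1, 3))
```

In this toy case both marginal probabilities are 0.35, so thresholding each label at 0.5 would predict no labels at all, whereas the expected-F1 maximizer selects both labels. This illustrates why the inference step needs more than per-label thresholds.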
