Optimizing the F-Measure in Multi-Label Classification: Plug-in Rule Approach versus Structured Loss Minimization

We compare the plug-in rule approach for optimizing the Fβ-measure in multi-label classification with an approach based on structured loss minimization, such as the structured support vector machine (SSVM). Whereas the former derives an optimal prediction from a probabilistic model in a separate inference step, the latter seeks to optimize the Fβ-measure directly during the training phase. We introduce a novel plug-in rule algorithm that estimates all parameters required for a Bayes-optimal prediction via a set of multinomial regression models, and we compare this algorithm with SSVMs in terms of computational complexity and statistical consistency. As our main theoretical result, we show that the plug-in rule algorithm is consistent, whereas the SSVM approaches are not. Finally, we present the results of a large experimental study showing the benefits of the introduced algorithm.
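To make the separate inference step of the plug-in approach concrete, the following is a minimal sketch and not the algorithm introduced in the paper. It assumes that an estimate of the joint conditional distribution P(y | x) is already available as a dictionary, and it finds the prediction with the highest expected Fβ by brute force over all 2^m label vectors, which is feasible only for a handful of labels. The function names `f_beta` and `plug_in_prediction` are hypothetical, and the convention that Fβ equals 1 when both vectors are all-zero is an assumption of this sketch.

```python
from itertools import product


def f_beta(y_true, y_pred, beta=1.0):
    """F-beta between two binary label vectors.

    Assumption of this sketch: the value is 1 when both vectors are all-zero.
    """
    tp = sum(t * p for t, p in zip(y_true, y_pred))
    denom = beta ** 2 * sum(y_true) + sum(y_pred)
    return 1.0 if denom == 0 else (1 + beta ** 2) * tp / denom


def plug_in_prediction(joint, beta=1.0):
    """Return (prediction, expected F-beta) maximizing expected F-beta.

    `joint` maps binary label tuples to estimated probabilities P(y | x).
    Brute force over all 2^m candidate predictions (illustration only).
    """
    m = len(next(iter(joint)))
    best, best_val = None, -1.0
    for h in product((0, 1), repeat=m):
        # Expected F-beta of candidate h under the estimated distribution.
        val = sum(p * f_beta(y, h, beta) for y, p in joint.items())
        if val > best_val:
            best, best_val = h, val
    return best, best_val


# Toy distribution over three labels: exactly one label is relevant.
joint = {(1, 0, 0): 0.4, (0, 1, 0): 0.3, (0, 0, 1): 0.3}
prediction, expected_f1 = plug_in_prediction(joint, beta=1.0)
print(prediction, expected_f1)
```

In this toy case the most probable label vector is (1, 0, 0), whose expected F1 is 0.4, while predicting all three labels yields an expected F1 of 0.5. The example illustrates why the F-optimal prediction has to be derived in its own inference step rather than read off as the mode of the distribution.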
