Optimal Classification with Multivariate Losses

Multivariate loss functions are used extensively in several prediction tasks arising in Information Retrieval. Often, the goal in these tasks is to minimize the expected loss when retrieving relevant items from a presented set of items, where the expectation is with respect to the joint distribution over item sets. Our key result is that for most multivariate losses, the expected loss is provably optimized by sorting the items by the conditional probability of the label being positive and then selecting the top k items. Such a result was previously known only for the F-measure. Leveraging this optimality characterization, we give an algorithm for estimating optimal predictions in practice, with runtime quadratic in the size of the item sets for many losses. We provide empirical results on benchmark datasets, comparing the proposed algorithm to state-of-the-art methods for optimizing multivariate losses.
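A minimal sketch of the top-k characterization follows. It assumes, for illustration only, that item labels are independent Bernoulli variables with known probabilities `probs`, and it uses 1 − F1 as the example loss. The helper names (`f1_loss`, `expected_loss`, `top_k_predict`, `brute_force_predict`) are hypothetical. The expectation is computed by enumerating all 2^n label vectors, so this is not the quadratic-time algorithm from the abstract and is only practical for a handful of items. It compares the n + 1 candidates obtained by sorting and taking the top k against an exhaustive search over all 2^n predictions.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

# A loss maps (prediction vector, label vector) to a real number.
Loss = Callable[[Sequence[int], Sequence[int]], float]


def f1_loss(pred: Sequence[int], labels: Sequence[int]) -> float:
    """1 - F1 of a 0/1 prediction vector against a 0/1 label vector."""
    tp = sum(p * y for p, y in zip(pred, labels))
    denom = sum(pred) + sum(labels)
    # Convention (an assumption): F1 = 1 when nothing is predicted or relevant.
    return 0.0 if denom == 0 else 1.0 - 2.0 * tp / denom


def expected_loss(pred: Sequence[int], probs: Sequence[float], loss: Loss) -> float:
    """Exact expected loss, assuming independent labels y_i ~ Bernoulli(probs[i]).

    Enumerates all 2^n label vectors, so it is only practical for small n.
    """
    total = 0.0
    for labels in itertools.product((0, 1), repeat=len(probs)):
        weight = 1.0
        for y, p in zip(labels, probs):
            weight *= p if y else 1.0 - p
        total += weight * loss(pred, labels)
    return total


def top_k_predict(probs: Sequence[float], loss: Loss) -> Tuple[List[int], float]:
    """Sort items by probability and keep the best of the n + 1 top-k sets."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    best_pred, best_val = [0] * n, float("inf")
    for k in range(n + 1):
        pred = [0] * n
        for i in order[:k]:
            pred[i] = 1  # predict the k most probable items as positive
        val = expected_loss(pred, probs, loss)
        if val < best_val:
            best_pred, best_val = pred, val
    return best_pred, best_val


def brute_force_predict(probs: Sequence[float], loss: Loss) -> Tuple[List[int], float]:
    """Exhaustive search over all 2^n prediction vectors, for comparison."""
    best_pred, best_val = [0] * len(probs), float("inf")
    for pred in itertools.product((0, 1), repeat=len(probs)):
        val = expected_loss(pred, probs, loss)
        if val < best_val:
            best_pred, best_val = list(pred), val
    return best_pred, best_val


probs = [0.9, 0.2, 0.6, 0.05, 0.4]
print(top_k_predict(probs, f1_loss))
print(brute_force_predict(probs, f1_loss))
```

Under the independence assumption, the two searches should reach the same minimum expected F1 loss, while the top-k search evaluates only 6 candidate predictions instead of 32.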
