Optimal Decision-Theoretic Classification Using Non-Decomposable Performance Metrics

We provide a general theoretical analysis of expected out-of-sample utility, also referred to as decision-theoretic classification, for non-decomposable binary classification metrics such as F-measure and Jaccard coefficient. Our key result is that the expected out-of-sample utility for many performance metrics is provably optimized by a classifier which is equivalent to a signed thresholding of the conditional probability of the positive class. Our analysis bridges a gap in the literature on binary classification, revealed in light of recent results for non-decomposable metrics in population utility maximization style classification. Our results identify checkable properties of a performance metric which are sufficient to guarantee a probability ranking principle. We propose consistent estimators for optimal expected out-of-sample classification. As a consequence of the probability ranking principle, computational requirements can be reduced from exponential to cubic complexity in the general case, and further reduced to quadratic complexity in special cases. We provide empirical results on simulated and benchmark datasets evaluating the performance of the proposed algorithms for decision-theoretic classification and comparing them to baseline and state-of-the-art methods in population utility maximization for non-decomposable metrics.

[1]  Oluwasanmi Koyejo,et al.  Consistent Binary Classification with Generalized Performance Metrics , 2014, NIPS.

[2]  Thorsten Joachims,et al.  A support vector method for multivariate performance measures , 2005, ICML.

[3]  Eyke Hüllermeier,et al.  Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains , 2010, ICML.

[4]  Ingo Steinwart How to Compare Different Loss Functions and Their Risks , 2007 .

[5]  Nan Ye,et al.  Optimizing F-measure: A Tale of Two Approaches , 2012, ICML.

[6]  Mark D. Reid,et al.  Surrogate regret bounds for proper losses , 2009, ICML '09.

[7]  Kian Ming Adam Chai,et al.  Expectation of f-measures: tractable exact computation and some empirical observations of its properties , 2005, SIGIR '05.

[8]  Harikrishna Narasimhan,et al.  On the Statistical Consistency of Plug-in Classifiers for Non-decomposable Performance Measures , 2014, NIPS.

[9]  Robert C. Holte,et al.  Severe Class Imbalance: Why Better Algorithms Aren't the Answer , 2005, ECML.

[10]  William A. Gale,et al.  A sequential algorithm for training text classifiers , 1994, SIGIR '94.

[11]  Eyke Hüllermeier,et al.  An Exact Algorithm for F-Measure Maximization , 2011, NIPS.

[12]  David D. Lewis,et al.  Evaluating and optimizing autonomous text classification systems , 1995, SIGIR '95.

[13]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[14]  Martin Jansche,et al.  Maximum Expected F-Measure Training of Logistic Regression Models , 2005, HLT.

[15]  Zhihua Cai,et al.  Evaluation Measures of the Classification Performance of Imbalanced Data Sets , 2009 .

[16]  Haibo He,et al.  Learning from Imbalanced Data , 2009, IEEE Transactions on Knowledge and Data Engineering.

[17]  Yves Grandvalet,et al.  Optimizing F-Measures by Cost-Sensitive Classification , 2014, NIPS.