Rethinking and Reweighting the Univariate Losses for Multi-Label Ranking: Consistency and Generalization

The (partial) ranking loss is a commonly used evaluation measure for multi-label classification, and it is usually optimized through convex surrogates for computational efficiency. Prior theoretical work on multi-label ranking focuses mainly on (Fisher) consistency analyses. However, there is a gap between existing theory and practice: some inconsistent pairwise losses achieve promising empirical performance, while some consistent univariate losses show no clear superiority in practice. To take a step towards filling this gap, this paper presents a systematic study from two complementary perspectives: consistency and generalization error bounds of the corresponding learning algorithms. We theoretically identify two key properties of the data distribution (or dataset) that affect the learning guarantees of these algorithms: the instance-wise class imbalance and the label size c. Specifically, in an extremely imbalanced case, the algorithm based on the consistent univariate loss has a generalization error bound of O(c), whereas the one based on the inconsistent pairwise loss enjoys a bound of O(√c), as shown in prior work. This may shed light on the superior performance of pairwise methods in practice, where real datasets are usually highly imbalanced. Moreover, we present an inconsistent reweighted univariate loss-based algorithm that enjoys an error bound of O(√c), retaining the promising performance of pairwise losses together with the computational efficiency of univariate losses. Finally, experimental results confirm our theoretical findings.
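
To make the contrast between the loss families concrete, below is a minimal NumPy sketch of three surrogates of the kind discussed above: a plain univariate logistic loss, a pairwise ranking surrogate, and a per-instance reweighted univariate variant. The function names, the logistic base loss, and the specific weights (1/|Y_i| for relevant labels, 1/|Ȳ_i| for irrelevant ones) are illustrative assumptions for exposition, not the paper's exact definitions.

```python
import numpy as np

def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

def univariate_logistic_loss(scores, labels):
    """Plain univariate surrogate: one logistic term per label, summed over the c labels."""
    y = 2.0 * labels - 1.0                              # map {0, 1} -> {-1, +1}
    return _softplus(-y * scores).sum(axis=1).mean()

def pairwise_ranking_loss(scores, labels):
    """Pairwise surrogate: logistic loss on every (relevant, irrelevant) label pair,
    normalized by the number of pairs per instance (|Y_i| * |Y_i^c|)."""
    total = 0.0
    for s, y in zip(scores, labels):
        pos, neg = s[y == 1], s[y == 0]
        if len(pos) == 0 or len(neg) == 0:              # skip degenerate instances
            continue
        margins = pos[:, None] - neg[None, :]           # s_p - s_q for all pairs
        total += _softplus(-margins).mean()
    return total / len(scores)

def reweighted_univariate_loss(scores, labels):
    """Reweighted univariate surrogate (illustrative): positive terms weighted by
    1/|Y_i|, negative terms by 1/|Y_i^c|, so both groups contribute comparably,
    mirroring the normalization of the pairwise loss."""
    y = 2.0 * labels - 1.0
    base = _softplus(-y * scores)
    n_pos = labels.sum(axis=1, keepdims=True)
    n_neg = (1 - labels).sum(axis=1, keepdims=True)
    w_pos = np.divide(1.0, n_pos, out=np.zeros_like(n_pos, dtype=float), where=n_pos > 0)
    w_neg = np.divide(1.0, n_neg, out=np.zeros_like(n_neg, dtype=float), where=n_neg > 0)
    weights = labels * w_pos + (1 - labels) * w_neg
    return (weights * base).sum(axis=1).mean()

# Tiny usage example on random data with c = 5 labels.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 5))
labels = (rng.random((4, 5)) < 0.3).astype(float)
print(univariate_logistic_loss(scores, labels),
      pairwise_ranking_loss(scores, labels),
      reweighted_univariate_loss(scores, labels))
```

Note that the reweighted variant keeps the univariate loss's linear-in-c per-instance cost while rebalancing relevant and irrelevant terms in the same spirit as the pairwise loss's pair-count normalization, which is the intuition behind trading consistency for the better O(√c) dependence.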
