Set-valued classification - overview via a unified framework

The multi-class classification problem is among the most popular and well-studied statistical frameworks. Modern multi-class datasets can be extremely ambiguous, and single-output predictions fail to deliver satisfactory performance. By allowing predictors to output a set of label candidates, set-valued classification offers a natural way to deal with this ambiguity. Several formulations of set-valued classification are available in the literature, each leading to a different prediction strategy. The present survey reviews popular formulations within a unified statistical framework. The proposed framework encompasses previously considered formulations, leads to new ones, and makes the underlying trade-offs of each formulation explicit. We derive infinite-sample optimal set-valued classification strategies and review a general plug-in principle for constructing data-driven algorithms. The exposition is supported by examples and pointers to both theoretical and practical contributions. Finally, we provide experiments on real-world datasets that compare these approaches in practice and yield general practical guidelines.
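As a minimal illustration of the plug-in principle mentioned above, the sketch below contrasts two common set-valued prediction rules built on top of estimated conditional class probabilities (which any probabilistic classifier could supply; here they are hard-coded toy values): thresholding the probabilities, so that ambiguous examples get larger sets, versus always returning the top-k labels. The function names and the threshold/k values are illustrative, not part of the survey.

```python
import numpy as np

def threshold_set_classifier(probs, threshold):
    """Plug-in set-valued rule: for each example, predict every label whose
    estimated conditional probability is at least `threshold`.
    `probs` is an (n_samples, n_classes) array of estimated probabilities."""
    return [np.flatnonzero(p >= threshold).tolist() for p in probs]

def top_k_set_classifier(probs, k):
    """Alternative rule: always predict the k most probable labels,
    so every prediction set has the same size."""
    return [np.argsort(p)[::-1][:k].tolist() for p in probs]

# Toy estimated probabilities for 3 examples over 4 classes.
probs = np.array([
    [0.70, 0.20, 0.05, 0.05],  # confident example: small set suffices
    [0.40, 0.35, 0.15, 0.10],  # ambiguous: two plausible labels
    [0.32, 0.30, 0.23, 0.15],  # very ambiguous
])

print(threshold_set_classifier(probs, threshold=0.30))  # [[0], [0, 1], [0, 1]]
print(top_k_set_classifier(probs, k=2))                 # [[0, 1], [0, 1], [0, 1]]
```

The threshold rule adapts the set size to the ambiguity of each example but gives no control over average size, while the top-k rule fixes the size but wastes budget on easy examples; the trade-offs between such formulations are exactly what the unified framework is meant to expose.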
