Set-valued classification - overview via a unified framework

The multi-class classification problem is among the most popular and well-studied statistical frameworks. Modern multi-class datasets can be extremely ambiguous, and single-output predictions fail to deliver satisfactory performance. By allowing predictors to output a set of label candidates, set-valued classification offers a natural way to deal with this ambiguity. Several formulations of set-valued classification are available in the literature, each leading to a different prediction strategy. The present survey reviews popular formulations within a unified statistical framework. The proposed framework encompasses previously considered formulations, leads to new ones, and makes the underlying trade-offs of each formulation explicit. We derive infinite-sample optimal set-valued classification strategies and review a general plug-in principle for constructing data-driven algorithms. The exposition is supported by examples and pointers to both theoretical and practical contributions. Finally, we provide experiments on real-world datasets that compare these approaches in practice and yield general practical guidelines.
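As a minimal illustration of the plug-in principle mentioned above, the sketch below contrasts two common set-valued prediction rules built on top of estimated conditional class probabilities (which any probabilistic classifier could supply; here they are hard-coded toy values): thresholding the probabilities, so that ambiguous examples get larger sets, versus always returning the top-k labels. The function names and the threshold/k values are illustrative, not part of the survey.

```python
import numpy as np

def threshold_set_classifier(probs, threshold):
    """Plug-in set-valued rule: for each example, predict every label whose
    estimated conditional probability is at least `threshold`.
    `probs` is an (n_samples, n_classes) array of estimated probabilities."""
    return [np.flatnonzero(p >= threshold).tolist() for p in probs]

def top_k_set_classifier(probs, k):
    """Alternative rule: always predict the k most probable labels,
    so every prediction set has the same size."""
    return [np.argsort(p)[::-1][:k].tolist() for p in probs]

# Toy estimated probabilities for 3 examples over 4 classes.
probs = np.array([
    [0.70, 0.20, 0.05, 0.05],  # confident example: small set suffices
    [0.40, 0.35, 0.15, 0.10],  # ambiguous: two plausible labels
    [0.32, 0.30, 0.23, 0.15],  # very ambiguous
])

print(threshold_set_classifier(probs, threshold=0.30))  # [[0], [0, 1], [0, 1]]
print(top_k_set_classifier(probs, k=2))                 # [[0, 1], [0, 1], [0, 1]]
```

The threshold rule adapts the set size to the ambiguity of each example but gives no control over average size, while the top-k rule fixes the size but wastes budget on easy examples; the trade-offs between such formulations are exactly what the unified framework is meant to expose.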
