Active Learning for Classification With Abstention

We construct and analyze active learning algorithms for the problem of binary classification with abstention, in which the learner has the additional option to withhold its decision on certain points in the input space. We consider this problem in the fixed-cost setting, where the learner incurs a cost $\lambda \in (0, 1/2)$ every time the abstain option is invoked. Our proposed algorithm works with the three most commonly used active learning query models, namely the membership-query, pool-based, and stream-based models. We obtain a high-probability upper bound on the excess risk of our algorithm, and we establish its minimax near-optimality by deriving a matching lower bound (modulo polylogarithmic factors). Since our algorithm relies on knowledge of the smoothness parameters of the regression function, we also describe a new strategy to adapt to these unknown parameters in a data-driven manner under an additional quality assumption. We show that with this strategy our algorithm achieves the same performance in terms of excess risk as its counterpart with knowledge of the smoothness parameters. We end the paper with a discussion of the extension of our results to the setting of a bounded rate of abstention.
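To make the fixed-cost setting concrete, the sketch below shows the classical plug-in form of Chow's abstention rule, which underlies this problem formulation: given an estimate of the regression function $\eta(x) = P(Y = 1 \mid X = x)$, one predicts the majority label unless the conditional probability of error exceeds the abstention cost $\lambda$. This is a minimal illustration, not the paper's algorithm; the function name chow_rule and the ABSTAIN sentinel are hypothetical choices for this example.

import numpy as np

ABSTAIN = -1  # hypothetical sentinel value marking the abstain decision

def chow_rule(eta_hat, lam):
    """Plug-in abstention rule given an estimate eta_hat of P(Y=1 | X=x).

    Predicting the majority label incurs expected cost min(eta, 1 - eta),
    while abstaining costs lam; so abstain exactly where the former exceeds
    the latter, i.e., where eta_hat lies in (lam, 1 - lam).
    """
    eta_hat = np.asarray(eta_hat, dtype=float)
    labels = (eta_hat >= 0.5).astype(int)              # majority-vote prediction
    conditional_error = np.minimum(eta_hat, 1.0 - eta_hat)
    labels[conditional_error > lam] = ABSTAIN          # abstaining is cheaper here
    return labels

# Example: with lam = 0.2, points with eta_hat in (0.2, 0.8) are abstained on.
print(chow_rule([0.05, 0.35, 0.5, 0.9], lam=0.2))      # -> [ 0 -1 -1  1]

Note that as $\lambda \to 1/2$ the abstention region shrinks to the decision boundary $\{\eta = 1/2\}$ and the rule reduces to ordinary binary classification, which is why the cost is restricted to $(0, 1/2)$.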
