The Relationship Between Agnostic Selective Classification, Active Learning, and the Disagreement Coefficient

A selective classifier (f, g) comprises a classification function f and a binary selection function g, which determines whether the classifier abstains on a given point or predicts with f. The classifier is called pointwise-competitive if, whenever it does not abstain, it classifies each point identically to the best classifier in hindsight (from the same class). The quality of such a classifier is quantified by its rejection mass, defined as the probability mass of the points it rejects. A "fast" rejection rate is achieved when the rejection mass is bounded from above by O(1/m), where m is the number of labeled examples used to train the classifier and O hides logarithmic factors. Pointwise-competitive selective (PCS) classifiers are intimately related to disagreement-based active learning: it is known that in the realizable case, a fast rejection rate of a known PCS algorithm (Consistent Selective Strategy) is equivalent to an exponential label-complexity speedup of the well-known CAL active learning algorithm.

We focus on the agnostic setting, for which a known algorithm called LESS learns a PCS classifier and achieves a fast rejection rate (depending on Hanneke's disagreement coefficient) under strong assumptions. We present an improved PCS learning algorithm, ILESS, for which we prove a fast rate (again depending on Hanneke's disagreement coefficient) without any such assumptions; our rejection bound smoothly interpolates between the realizable and agnostic settings. The main result of this paper is an equivalence between the following three entities: (i) the existence of a fast rejection rate for any PCS learning algorithm (such as ILESS); (ii) a poly-logarithmic bound on Hanneke's disagreement coefficient; and (iii) an exponential speedup for a new disagreement-based active learner called ActiveiLESS.
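As a minimal illustration of the (f, g) formalism and the empirical rejection mass, the following is a sketch with a hypothetical threshold classifier f and a selector g that abstains inside a band around the decision boundary; it illustrates only the definitions above, not the ILESS algorithm itself:

```python
def make_selective_classifier(f, g):
    """Combine a classification function f and a binary selection
    function g into a selective classifier (f, g): predict f(x)
    when g(x) == 1, otherwise abstain (return None)."""
    def predict(x):
        return f(x) if g(x) == 1 else None
    return predict

def empirical_rejection_mass(g, points):
    """Empirical estimate of the rejection mass: the fraction of
    points on which the selector g abstains (g(x) == 0)."""
    return sum(1 - g(x) for x in points) / len(points)

# Hypothetical toy instance: f thresholds at 0, and g abstains
# on the band |x| <= 0.25 (a stand-in for a disagreement region).
f = lambda x: 1 if x >= 0.0 else -1
g = lambda x: 1 if abs(x) > 0.25 else 0

h = make_selective_classifier(f, g)
```

Here a point deep inside either half-line gets a committed prediction, while points near the boundary are rejected; the rejection mass is the probability of landing in the abstention band.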
