Exponential Savings in Agnostic Active Learning Through Abstention

We show that in pool-based active classification, without assumptions on the underlying distribution, if the learner is allowed to abstain from some predictions at a price marginally smaller than 1/2, the average loss of a random guess, then exponential savings in the number of label requests are possible whenever they are possible in the corresponding realizable problem. We extend this result to a necessary and sufficient condition for exponential savings in pool-based active classification under model misspecification.
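A minimal way to formalize the abstention cost described above, in the spirit of Chow's reject-option loss (the margin ε > 0 and the abstain symbol * are illustrative notation, not taken from the source): for a label y in {0, 1} and an output ŷ in {0, 1, *},

\[
\ell(\hat{y}, y) \;=\;
\begin{cases}
\mathbf{1}\{\hat{y} \neq y\}, & \hat{y} \in \{0, 1\},\\[2pt]
\tfrac{1}{2} - \varepsilon, & \hat{y} = *,
\end{cases}
\]

so that abstaining always costs slightly less than the expected loss 1/2 incurred by predicting a label uniformly at random.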
