Noise-Adaptive Margin-Based Active Learning and Lower Bounds under Tsybakov Noise Condition

We present a simple noise-robust margin-based active learning algorithm for finding homogeneous (passing through the origin) linear separators, and analyze its error convergence when labels are corrupted by noise. We show that when the noise satisfies the Tsybakov low-noise condition (Mammen and Tsybakov 1999; Tsybakov 2004), the algorithm adapts to the unknown noise level and achieves the optimal statistical rate up to poly-logarithmic factors. We also derive lower bounds for margin-based active learning algorithms under the Tsybakov noise condition (TNC) in the membership query synthesis scenario (Angluin 1988). Our result implies lower bounds for the stream-based selective sampling scenario (Cohn 1990) under TNC for some fairly simple data distributions. Quite surprisingly, we show that the sample complexity cannot be improved even when the underlying data distribution is as simple as the uniform distribution on the unit ball. Our proof involves the construction of a well-separated hypothesis set on the d-dimensional unit ball, together with carefully designed label distributions satisfying the Tsybakov noise condition. Our analysis may provide insights for other forms of lower bounds as well.
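To make the setting concrete, the sketch below outlines a generic margin-based active learning loop for a homogeneous linear separator, in the spirit of Balcan et al. [6]. It is an illustrative sketch only: the round schedule, the geometric margin shrinkage, and the hinge-loss update are assumptions made for this example, not the exact noise-adaptive procedure analyzed in the paper.

```python
import numpy as np

def margin_based_active_learner(pool, query_label, rounds=10,
                                labels_per_round=200, margin_decay=0.5,
                                rng=None):
    """Generic margin-based active learning loop for a homogeneous
    (through-the-origin) linear separator.  Illustrative sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    d = pool.shape[1]
    w = np.zeros(d)
    w[0] = 1.0                    # arbitrary unit-norm initialization
    margin = 1.0                  # current sampling margin b_k

    for _ in range(rounds):
        # Restrict label queries to points close to the current hyperplane,
        # i.e. the region where the current hypothesis is least certain.
        dist = np.abs(pool @ w)
        region = pool[dist <= margin]
        if len(region) == 0:
            break
        idx = rng.choice(len(region),
                         size=min(labels_per_round, len(region)),
                         replace=False)
        X = region[idx]
        y = np.array([query_label(x) for x in X])   # noisy +/-1 labels

        # Update the separator with a few subgradient steps on the hinge
        # loss over the freshly queried points, then renormalize.
        for _ in range(200):
            viol = y * (X @ w) < 1.0
            if not viol.any():
                break
            w += 0.1 * (y[viol][:, None] * X[viol]).mean(axis=0)
        w /= np.linalg.norm(w)

        margin *= margin_decay    # shrink the sampling region each round
    return w
```

In the intended use, `pool` holds unlabeled points (for example, drawn uniformly from the unit ball) and `query_label` is a noisy labeling oracle; each round spends its label budget only on points near the current decision boundary, which is the mechanism that both the upper and lower bounds in the paper concern.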

[1] Steve Hanneke. Rates of convergence in active learning, 2011, arXiv:1103.1790.

[2] Martin J. Wainwright, et al. Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization, 2010, IEEE Transactions on Information Theory.

[3] Alexandre B. Tsybakov, et al. Introduction to Nonparametric Estimation, 2008, Springer Series in Statistics.

[4] David A. Cohn, et al. Neural Network Exploration Using Optimal Experiment Design, 1993, NIPS.

[5] Steve Hanneke, et al. Theory of Disagreement-Based Active Learning, 2014, Found. Trends Mach. Learn.

[6] Maria-Florina Balcan, et al. Margin Based Active Learning, 2007, COLT.

[7] A. Tsybakov, et al. Optimal aggregation of classifiers in statistical learning, 2003.

[8] D. Angluin. Queries and Concept Learning, 1988.

[9] Aarti Singh, et al. Optimal rates for stochastic convex optimization under Tsybakov noise condition, 2013, ICML.

[10] John Langford, et al. Agnostic active learning, 2006, J. Comput. Syst. Sci.

[11] E. Mammen, et al. Smooth Discrimination Analysis, 1999.

[12] Kamalika Chaudhuri, et al. Beyond Disagreement-Based Agnostic Active Learning, 2014, NIPS.

[13] Sanjoy Dasgupta, et al. A General Agnostic Active Learning Algorithm, 2007, ISAIM.

[14] Liu Yang, et al. Minimax Analysis of Active Learning, 2014, J. Mach. Learn. Res.

[15] Robert D. Nowak, et al. Minimax Bounds for Active Learning, 2007, IEEE Transactions on Information Theory.

[16] Adam Tauman Kalai, et al. Analysis of Perceptron-Based Active Learning, 2009, COLT.

[17] Lorenzo Rosasco, et al. Are Loss Functions All the Same?, 2004, Neural Computation.

[18] Sanjoy Dasgupta, et al. Analysis of a greedy active learning strategy, 2004, NIPS.

[19] Aarti Singh, et al. Optimal rates for first-order stochastic convex optimization under Tsybakov noise condition, 2012, ICML 2013.

[20] Y. Nesterov, et al. Primal-dual subgradient methods for minimizing uniformly convex functions, 2010, arXiv:1401.1792.

[21] Sanjoy Dasgupta, et al. Coarse sample complexity bounds for active learning, 2005, NIPS.

[22] Maria-Florina Balcan, et al. Active and passive learning of linear separators under log-concave distributions, 2012, COLT.

[23] David A. Cohn, et al. Improving generalization with active learning, 1994, Machine Learning.

[24] R. Nowak, et al. Upper and Lower Error Bounds for Active Learning, 2006.

[25] Steve Hanneke, et al. A bound on the label complexity of agnostic active learning, 2007, ICML '07.

[26] Aarti Singh, et al. Algorithmic Connections between Active Learning and Stochastic Convex Optimization, 2013, ALT.

[27] N. J. A. Sloane, et al. Lower bounds for constant weight codes, 1980, IEEE Trans. Inf. Theory.

[28] Maria-Florina Balcan, et al. The Power of Localization for Efficiently Learning Linear Separators with Noise, 2013, J. ACM.

[29] Michael I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.