Revisiting Perceptron: Efficient and Label-Optimal Learning of Halfspaces

Efficiently learning a halfspace with as few labels as possible in the presence of noise is a long-standing open problem. In this work, we propose an efficient Perceptron-based algorithm for actively learning homogeneous halfspaces under the uniform distribution over the unit sphere. Under the bounded noise condition~\cite{MN06}, where each label is flipped with probability at most $\eta < \frac{1}{2}$, our algorithm achieves a near-optimal label complexity of $\tilde{O}\left(\frac{d}{(1-2\eta)^2}\ln\frac{1}{\epsilon}\right)$ in time $\tilde{O}\left(\frac{d^2}{\epsilon(1-2\eta)^3}\right)$. Under the adversarial noise condition~\cite{ABL14, KLS09, KKMS08}, where at most a $\tilde \Omega(\epsilon)$ fraction of labels can be flipped, it achieves a near-optimal label complexity of $\tilde{O}\left(d\ln\frac{1}{\epsilon}\right)$ in time $\tilde{O}\left(\frac{d^2}{\epsilon}\right)$. Furthermore, we show that our active learning algorithm can be converted into an efficient passive learning algorithm with near-optimal sample complexity in both $\epsilon$ and $d$.
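The approach combines a margin-based label-query rule with the reflection-style "modified Perceptron" update analyzed by Dasgupta, Kalai, and Monteleoni [26]: labels are requested only for points close to the current hypothesis's decision boundary, and mistakes trigger an update that preserves the unit norm of the iterate. The sketch below is a minimal, noise-free simulation of that core idea; the dimension, stream length, and fixed margin are illustrative choices, not the paper's parameters (the full algorithm shrinks the margin over epochs and tolerates bounded/adversarial noise).

```python
import numpy as np

def sample_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def active_perceptron(w_star, d, n_stream, margin, rng):
    """Stream unlabeled points; query labels only inside the margin region."""
    w = sample_sphere(d, rng)              # unit-norm initial hypothesis
    labels_used = 0
    for _ in range(n_stream):
        x = sample_sphere(d, rng)
        if abs(w @ x) > margin:            # confident region: skip, save a label
            continue
        labels_used += 1
        y = 1.0 if w_star @ x >= 0 else -1.0   # oracle label (noise-free here)
        if (w @ x) * y < 0:                # mistake: reflection update,
            w = w - 2 * (w @ x) * x        # preserves ||w|| = 1 since ||x|| = 1
    return w, labels_used

rng = np.random.default_rng(0)
d = 20
w_star = sample_sphere(d, rng)
w, labels_used = active_perceptron(w_star, d, n_stream=20000, margin=0.2, rng=rng)
angle = np.arccos(np.clip(w @ w_star, -1.0, 1.0))  # error rate is angle / pi
```

Note that each mistake update strictly increases $w \cdot w^*$ (on a mistake, $(w \cdot x)(w^* \cdot x) \le 0$), which is why the reflection makes monotone progress while labels are spent only on the shrinking disagreement region near the boundary.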

[1] Steve Hanneke. Rates of convergence in active learning. 2011, arXiv:1103.1790.

[2] Vladimir Koltchinskii, et al. Rademacher Complexities and Bounding the Excess Risk in Active Learning. J. Mach. Learn. Res., 2010.

[3] Steve Hanneke, et al. Theoretical foundations of active learning. 2009.

[4] H. Sebastian Seung, et al. Selective Sampling Using the Query by Committee Algorithm. Machine Learning, 1997.

[5] John N. Tsitsiklis, et al. Active Learning Using Arbitrary Binary Valued Queries. Machine Learning, 1993.

[6] Francesco Orabona, et al. Better Algorithms for Selective Sampling. ICML, 2011.

[7] S. Agmon. The Relaxation Method for Linear Inequalities. Canadian Journal of Mathematics, 1954.

[8] Maria-Florina Balcan, et al. Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions. NIPS, 2017.

[9] Maria-Florina Balcan, et al. The Power of Localization for Efficiently Learning Linear Separators with Noise. J. ACM, 2013.

[10] Maria-Florina Balcan, et al. Efficient Learning of Linear Separators under Bounded Noise. COLT, 2015.

[11] Aarti Singh, et al. Noise-Adaptive Margin-Based Active Learning and Lower Bounds under Tsybakov Noise Condition. AAAI, 2014.

[12] Amin Karbasi, et al. Near-Optimal Active Learning of Halfspaces via Query Synthesis in the Noisy Setting. AAAI, 2016.

[13] Maria-Florina Balcan, et al. Active and passive learning of linear separators under log-concave distributions. COLT, 2012.

[14] David A. Cohn, et al. Improving generalization with active learning. Machine Learning, 1994.

[15] John Langford, et al. Agnostic active learning. J. Comput. Syst. Sci., 2006.

[16] Rocco A. Servedio, et al. Learning Halfspaces with Malicious Noise. ICALP, 2009.

[17] Daphne Koller, et al. Support Vector Machine Active Learning with Applications to Text Classification. J. Mach. Learn. Res., 2000.

[18] Maxim Raginsky, et al. Lower Bounds for Passive and Active Learning. NIPS, 2011.

[19] Vitaly Feldman, et al. New Results for Learning Noisy Parities and Halfspaces. FOCS, 2006.

[20] Maria-Florina Balcan, et al. The true sample complexity of active learning. Machine Learning, 2010.

[21] Maria-Florina Balcan, et al. Statistical Active Learning Algorithms. NIPS, 2013.

[22] Steve Hanneke, et al. A bound on the label complexity of agnostic active learning. ICML, 2007.

[23] Burr Settles. Active Learning Literature Survey. 2009.

[24] Dana Angluin, et al. Learning from noisy examples. Machine Learning, 1988.

[25] Pravesh Kothari, et al. Embedding Hard Learning Problems into Gaussian Space. Electron. Colloquium Comput. Complex., 2014.

[26] Adam Tauman Kalai, et al. Analysis of Perceptron-Based Active Learning. COLT, 2009.

[27] Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations. 1999.

[28] Ming Li, et al. Learning in the presence of malicious errors. STOC '88, 1993.

[29] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics. COLT, 2017.

[30] Liwei Wang, et al. Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning. J. Mach. Learn. Res., 2011.

[31] Claudio Gentile, et al. Selective sampling and active learning from single and multiple teachers. J. Mach. Learn. Res., 2012.

[32] Jacques Stern, et al. The hardness of approximate optima in lattices, codes, and systems of linear equations. FOCS, 1993.

[33] Liu Yang, et al. Surrogate Losses in Passive and Active Learning. Electronic Journal of Statistics, 2012.

[34] Sanjoy Dasgupta, et al. Two faces of active learning. Theor. Comput. Sci., 2011.

[35] Santosh S. Vempala, et al. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program., 2008.

[36] Jeff A. Bilmes, et al. Active Learning as Non-Convex Optimization. AISTATS, 2009.

[37] Sanjoy Dasgupta, et al. Diameter-Based Active Learning. ICML, 2017.

[38] John Langford, et al. Agnostic Active Learning Without Constraints. NIPS, 2010.

[39] Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. 1971.

[40] Nello Cristianini, et al. An Introduction to Support Vector Machines. 2000.

[41] Maria-Florina Balcan, et al. Learning and 1-bit Compressed Sensing under Asymmetric Noise. COLT, 2016.

[42] Prasad Raghavendra, et al. Hardness of Learning Halfspaces with Noise. FOCS, 2006.

[43] Maria-Florina Balcan, et al. S-Concave Distributions: Towards Broader Distributions for Noise-Tolerant and Sample-Efficient Learning Algorithms. arXiv, 2017.

[44] Maria-Florina Balcan, et al. Margin Based Active Learning. COLT, 2007.

[45] Kamalika Chaudhuri, et al. Beyond Disagreement-Based Agnostic Active Learning. NIPS, 2014.

[46] Sanjoy Dasgupta, et al. A General Agnostic Active Learning Algorithm. ISAIM, 2007.

[47] John Langford, et al. Importance weighted active learning. ICML, 2009.

[48] Nir Ailon, et al. Active Learning Using Smooth Relative Regret Approximations with Applications. COLT, 2011.

[49] Gábor Lugosi, et al. Prediction, Learning, and Games. 2006.

[50] Philip M. Long. On the sample complexity of PAC learning half-spaces against the uniform distribution. IEEE Trans. Neural Networks, 1995.

[51] Varun Kanade, et al. Learning with a Drifting Target Concept. ALT, 2015.

[52] Alan M. Frieze, et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions. FOCS, 1996.

[53] Rocco A. Servedio, et al. Agnostically learning halfspaces. FOCS, 2005.

[54] Steve Hanneke, et al. Theory of Disagreement-Based Active Learning. Found. Trends Mach. Learn., 2014.

[55] John Langford, et al. Efficient and Parsimonious Agnostic Active Learning. NIPS, 2015.

[56] Daniel J. Hsu. Algorithms for active learning. 2010.

[57] Alekh Agarwal, et al. Selective sampling algorithms for cost-sensitive multiclass prediction. ICML, 2013.

[58] P. Massart, et al. Risk bounds for statistical learning. 2007, arXiv:math/0702683.

[59] Claire Monteleoni, et al. Efficient Algorithms for General Active Learning. COLT, 2006.

[60] Sanjoy Dasgupta, et al. Coarse sample complexity bounds for active learning. NIPS, 2005.

[61] Amit Daniely, et al. Complexity theoretic limitations on learning halfspaces. STOC, 2015.

[62] Claudio Gentile, et al. Robust bounds for classification via selective sampling. ICML, 2009.