PAC Analogues of Perceptron and Winnow Via Boosting the Margin

We describe a novel family of PAC-model algorithms for learning linear threshold functions. The new algorithms work by boosting a simple weak learner and exhibit sample complexity bounds remarkably similar to those of known online algorithms such as Perceptron and Winnow, suggesting that these well-studied online algorithms in some sense correspond to instances of boosting. We show that the new algorithms can be viewed as natural PAC analogues of the online p-norm algorithms that have recently been studied by Grove, Littlestone, and Schuurmans (1997, Proceedings of the Tenth Annual Conference on Computational Learning Theory (pp. 171–183)) and by Gentile and Littlestone (1999, Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 1–11)). As special cases, taking p = 2 and p = ∞ yields natural boosting-based PAC analogues of Perceptron and Winnow, respectively. The p = ∞ case of our algorithm can also be viewed as a generalization (with an improved sample complexity bound) of Jackson and Craven's PAC-model boosting-based algorithm for learning “sparse perceptrons” (Jackson & Craven, 1996, Advances in Neural Information Processing Systems 8, MIT Press). The analysis of the generalization error of the new algorithms relies on techniques from the theory of large margin classification.
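
For intuition about the p-norm family that the paper's PAC algorithms parallel, the sketch below implements the online p-norm update studied by Grove, Littlestone, and Schuurmans and by Gentile and Littlestone: an additively updated vector θ is passed through a p-norm link function to produce the prediction weights. This is a minimal illustrative sketch of the online algorithms referenced in the abstract, not the paper's boosting-based construction; the function names and training loop are our own.

```python
import numpy as np

def pnorm_link(theta, p):
    """p-norm link: w_i = sign(theta_i) * |theta_i|**(p-1) / ||theta||_p**(p-2).
    For p = 2 this is the identity map, so the loop below is exactly Perceptron."""
    norm = np.linalg.norm(theta, ord=p)
    if norm == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)

def pnorm_online(X, y, p=2.0, epochs=5):
    """Mistake-driven online p-norm algorithm: additive updates in theta-space,
    predictions made with the linked weights w = f(theta). Labels y_t in {-1, +1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            w = pnorm_link(theta, p)
            if y_t * np.dot(w, x_t) <= 0:   # mistake (or zero margin)
                theta = theta + y_t * x_t   # additive update
    return pnorm_link(theta, p)
```

With p = 2 the link is the identity and the loop reduces to the classical Perceptron update; choosing p on the order of log n makes the weights depend multiplicatively on θ, which is the sense in which the p = ∞ case corresponds to Winnow.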

[1] Manfred K. Warmuth, et al. The Perceptron Algorithm Versus Winnow: Linear Versus Logarithmic Mistake Bounds when Few Input Variables are Relevant (Technical Note), 1997, Artif. Intell.

[2] Manfred K. Warmuth, et al. The perceptron algorithm vs. Winnow: linear vs. logarithmic mistake bounds when few input variables are relevant, 1995, COLT '95.

[3] Leslie G. Valiant, et al. Cryptographic Limitations on Learning Boolean Formulae and Finite Automata, 1993, Machine Learning: From Theory to Applications.

[4] Yoav Freund, et al. An improved boosting algorithm and its implications on learning complexity, 1992, COLT '92.

[5] Yoav Freund, et al. Game theory, on-line prediction and boosting, 1996, COLT '96.

[6] Tom Bylander. Worst-Case Analysis of the Perceptron and Exponentiated Update Algorithms, 1998, Artif. Intell.

[8] N. Fisher, et al. Probability Inequalities for Sums of Bounded Random Variables, 1994.

[9] Alexander A. Razborov, et al. Majority gates vs. general weighted threshold gates, 1992, Proceedings of the Seventh Annual Structure in Complexity Theory Conference.

[10] R. Schapire. The Strength of Weak Learnability, 1990, Machine Learning.

[11] Dale Schuurmans, et al. General Convergence Results for Linear Discriminant Updates, 1997, COLT '97.

[12] Rocco A. Servedio, et al. On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm, 1999, COLT '99.

[13] Alan M. Frieze, et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions, 1996, Algorithmica.

[14] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[15] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.

[17] Philip M. Long. Halfspace Learning, Linear Programming, and Nonmalicious Distributions, 1994, Inf. Process. Lett.

[18] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.

[19] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[20] N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms, 1990.

[21] M. Kearns, et al. Recent Results on Boolean Concept Learning, 1987.

[22] Noga Alon, et al. The Probabilistic Method, 2015, Fundamentals of Ramsey Theory.

[23] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (FOCS 1987).

[24] Yoram Singer, et al. Improved Boosting Algorithms Using Confidence-rated Predictions, 1998, COLT '98.

[25] Robert E. Schapire, et al. Drifting Games, 1999, COLT '99.

[26] Yoav Freund, et al. Large Margin Classification Using the Perceptron Algorithm, 1998, COLT '98.

[27] Claudio Gentile, et al. The Robustness of the p-Norm Algorithms, 1999, COLT '99.

[28] Yoav Freund, et al. Boosting a weak learning algorithm by majority, 1995, COLT '90.

[29] Michael Schmitt, et al. Identification Criteria and Lower Bounds for Perceptron-Like Learning Rules, 1998, Neural Computation.

[30] Peter Auer, et al. Tracking the Best Disjunction, 1998, Machine Learning.

[31] Nick Littlestone, et al. From on-line to batch learning, 1989, COLT '89.

[32] Tunc Geveci, et al. Advanced Calculus, 2014, Nature.

[33] Wolfgang Maass, et al. How fast can a threshold gate learn, 1994, COLT 1994.

[34] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[35] Dana Angluin, et al. Queries and concept learning, 1988, Machine Learning.

[37] Chuanyi Ji, et al. Combinations of Weak Classifiers, 1996, NIPS.

[38] Yishay Mansour, et al. On the boosting ability of top-down decision tree learning algorithms, 1996, STOC '96.

[39] Mark Craven, et al. Learning Sparse Perceptrons, 1995, NIPS.

[40] Edith Cohen, et al. Learning noisy perceptrons by a perceptron in polynomial time, 1997, Proceedings of the 38th Annual Symposium on Foundations of Computer Science.

[41] Nick Littlestone, et al. Redundant noisy attributes, attribute errors, and linear-threshold learning using winnow, 1991, COLT '91.

[42] Eric B. Baum, et al. The Perceptron Algorithm is Fast for Nonmalicious Distributions, 1990, Neural Computation.

[43] Alexander A. Razborov, et al. Majority gates vs. general weighted threshold gates, 2005, computational complexity.

[44] John Shawe-Taylor, et al. Generalization Performance of Support Vector Machines and Other Pattern Classifiers, 1999.