General Convergence Results for Linear Discriminant Updates

The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including not only Perceptron and Winnow but also many new algorithms. The proof hinges on a generic “measure of progress” construction that gives insight into when and how such algorithms converge. The same construction also lets us obtain good mistake bounds for individual algorithms. We apply this unified analysis to new algorithms as well as existing ones. When applied to known algorithms, the method “automatically” produces close variants of the existing proofs, recovering similar bounds and showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. We also demonstrate that the unifying principles apply more broadly, and we analyze a new class of algorithms that smoothly interpolates between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.
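To make the quasi-additive idea concrete, here is a minimal, hypothetical Python sketch (the function and link names are mine, not the paper's): an auxiliary vector z receives additive Perceptron-style updates on each mistake, while the weights actually used for prediction are a componentwise transform of z. An identity transform recovers Perceptron's additive updates, an exponential transform yields Winnow-style multiplicative updates, and a power-law transform sketches the interpolating family mentioned at the end of the abstract.

```python
import numpy as np

def quasi_additive_train(X, y, link, lr=1.0, epochs=100):
    """Mistake-driven quasi-additive training: a minimal sketch.

    An auxiliary vector z is updated additively on mistakes; the weights
    used for prediction are obtained componentwise as w_i = link(z_i).
    """
    z = np.zeros(X.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):              # labels assumed in {-1, +1}
            w = link(z)                         # map z to effective weights
            pred = 1 if w @ x >= 0 else -1      # linear-threshold prediction
            if pred != label:                   # mistake-driven: update on errors only
                z += lr * label * x
                mistakes += 1
        if mistakes == 0:                       # no mistakes in a full pass
            break
    return link(z)

# Perceptron: the identity link, so weight updates are purely additive.
identity_link = lambda z: z

# Winnow-style: an exponential link turns additive steps in z into
# multiplicative weight updates (w_i scales by exp(lr * label * x_i)).
# Balanced variants typically pair this with negated feature copies
# so the effective weights can take either sign.
exp_link = lambda z: np.exp(z)

# A power-law link that interpolates between the two regimes:
# near-additive for p close to 2, increasingly multiplicative as p grows.
def p_norm_link(p):
    return lambda z: np.sign(z) * np.abs(z) ** (p - 1)
```

This is only an illustration of the update structure, under the assumption that prediction is by sign of a linear threshold; the actual p-norm algorithms also normalize the transformed weights, which this sketch omits for brevity.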
