The Robustness of the p-Norm Algorithms

We consider two on-line learning frameworks: binary classification through linear threshold functions and linear regression. We study a family of on-line algorithms, called p-norm algorithms, introduced by Grove, Littlestone and Schuurmans in the context of deterministic binary classification. We show how to adapt these algorithms for use in the regression setting, and prove worst-case bounds on the square loss using a technique from Kivinen and Warmuth. As pointed out by Grove et al., these algorithms can be made to approach a version of the classification algorithm Winnow as p goes to infinity; similarly, they can be made to approach the corresponding regression algorithm EG in the limit. Winnow and EG are notable for having loss bounds that grow only logarithmically in the dimension of the instance space. Here we describe another way to use the p-norm algorithms to achieve this logarithmic behavior. With the usage we propose, it is less critical than with Winnow and EG to retune the parameters of the algorithm as the learning task changes. Since the correct setting of the parameters depends on characteristics of the learning task that are not typically known a priori by the learner, this gives the p-norm algorithms a desirable robustness. Our elaborations yield various new loss bounds in these on-line settings. Some of these bounds improve on or generalize known results; others are incomparable with them.
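To make the object of study concrete, here is a minimal sketch (in Python) of the classification variant of a p-norm algorithm, in the quasi-additive form of Grove, Littlestone and Schuurmans: additive updates are accumulated in a dual vector theta, and predictions use the weight vector obtained as the gradient of (1/2)||theta||_p^2. The function names, the learning rate eta, and the mistake-driven update rule are illustrative assumptions for this sketch, not the paper's exact formulation or tuning.

    import numpy as np

    def pnorm_link(theta, p):
        # Gradient of (1/2) * ||theta||_p^2: maps the accumulated
        # additive updates theta to the weights used for prediction.
        norm = np.linalg.norm(theta, ord=p)
        if norm == 0.0:
            return np.zeros_like(theta)
        return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)

    def pnorm_perceptron(examples, n, p=2.0, eta=1.0):
        # examples: iterable of (x, y) with x an n-vector and y in {-1, +1}.
        # p and eta are illustrative defaults, not the paper's tuning.
        theta = np.zeros(n)
        mistakes = 0
        for x, y in examples:
            x = np.asarray(x, dtype=float)
            w = pnorm_link(theta, p)
            y_hat = 1 if w @ x >= 0 else -1
            if y_hat != y:  # conservative update: change weights only on mistakes
                theta += eta * y * x
                mistakes += 1
        return pnorm_link(theta, p), mistakes

With p = 2 the link is the identity and the sketch reduces to the classical Perceptron; letting p grow recovers Winnow-like behavior, and in particular taking p proportional to ln n (n the number of attributes) is one way to obtain the logarithmic dependence on the dimension mentioned above while keeping the algorithm's form, and hence its parameter sensitivity, essentially unchanged.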

[1] H. D. Block. The perceptron: a model for brain functioning. I, 1962.

[2] F. Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, 1963.

[3] A. B. Novikoff. On convergence proofs for perceptrons, 1963.

[4] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, 1967.

[5] Y. Censor et al. An iterative row-action method for interval convex programming, 1981.

[6] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm. 28th Annual Symposium on Foundations of Computer Science (FOCS), 1987.

[7] D. Angluin. Queries and Concept Learning, 1988.

[8] N. Littlestone. From on-line to batch learning. COLT '89, 1989.

[9] V. Vovk. Aggregating strategies. COLT '90, 1990.

[10] N. Littlestone. Mistake bounds and logarithmic linear-threshold learning algorithms, 1990.

[11] N. Littlestone. Redundant noisy attributes, attribute errors, and linear-threshold learning using Winnow. COLT '91, 1991.

[12] D. Haussler et al. How to use expert advice. STOC, 1993.

[13] P. M. Long et al. Worst-case quadratic loss bounds for on-line prediction of linear functions by gradient descent, 1993.

[14] N. Littlestone and M. K. Warmuth. The Weighted Majority Algorithm. Inf. Comput., 1994.

[15] J. Kivinen and M. K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. STOC '95, 1995.

[16] M. K. Warmuth et al. On Weak Learning. J. Comput. Syst. Sci., 1995.

[17] R. E. Schapire et al. Predicting Nearly as Well as the Best Pruning of a Decision Tree. COLT, 1995.

[18] C. Mesterharm et al. An Apobayesian Relative of Winnow. NIPS, 1996.

[19] M. K. Warmuth et al. How to use expert advice. JACM, 1997.

[20] V. Vovk. Competitive On-line Linear Regression. NIPS, 1997.

[21] A. Grove, N. Littlestone, and D. Schuurmans. General Convergence Results for Linear Discriminant Updates. COLT '97, 1997.

[22] T. Bylander et al. The binary exponentiated gradient algorithm for learning linear functions. COLT '97, 1997.

[23] K. Yamanishi et al. A Decision-Theoretic Extension of Stochastic Complexity and Its Applications to Learning. IEEE Trans. Inf. Theory, 1998.

[24] C. Gentile et al. Linear Hinge Loss and Average Margin. NIPS, 1998.

[25] N. Cesa-Bianchi et al. On Bayes Methods for On-Line Boolean Prediction. Annual Conference on Computational Learning Theory, 1998.

[26] Y. Freund et al. Large Margin Classification Using the Perceptron Algorithm. COLT '98, 1998.

[27] M. Herbster et al. Tracking the best regressor. COLT '98, 1998.

[28] G. J. Gordon. Regret bounds for prediction problems. COLT '99, 1999.

[29] M. K. Warmuth et al. Averaging Expert Predictions. EuroCOLT, 1999.

[30] M. K. Warmuth et al. Relative Expected Instantaneous Loss Bounds. J. Comput. Syst. Sci., 2000.