Tracking the best hyperplane with a simple budget Perceptron

Shifting bounds for on-line classification algorithms ensure good performance on any sequence of examples that is well predicted by a sequence of changing classifiers. When proving shifting bounds for kernel-based classifiers, one also faces the problem of storing a number of support vectors that can grow unboundedly, unless an eviction policy is used to keep this number under control. In this paper, we show that shifting and on-line learning on a budget can be combined surprisingly well. First, we introduce and analyze a shifting Perceptron algorithm that achieves the best known shifting bounds while using an unlimited budget. Second, we show that applying the simplest possible eviction policy to the Perceptron algorithm, discarding a support vector chosen uniformly at random each time a new one comes in, achieves a shifting bound close to the one obtained with no budget restriction. More importantly, we show that our randomized algorithm strikes the optimal trade-off $U = \Theta(\sqrt{B})$ between the budget $B$ and the norm $U$ of the largest classifier in the comparison sequence. Experiments are presented comparing several linear-threshold algorithms on chronologically ordered textual datasets. These experiments support our theoretical findings by showing the extent to which randomized budget algorithms are more robust than deterministic ones when learning shifting target data streams.
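As a concrete illustration of the randomized eviction policy described above, here is a minimal Python sketch of a kernel Perceptron on a budget. It is a sketch under the usual mistake-driven Perceptron update, not the paper's pseudocode: the function name, the representation of the support set, and the toy usage at the end are all illustrative assumptions.

```python
import random

def kernel_perceptron_random_budget(stream, kernel, budget):
    """Mistake-driven kernel Perceptron on a fixed budget: on each
    mistake the current example is stored as a support vector, and if
    the budget is full a stored support vector chosen uniformly at
    random is discarded first (illustrative sketch only)."""
    supports = []   # (x_i, y_i) pairs currently stored
    mistakes = 0
    for x, y in stream:                      # labels y in {-1, +1}
        # Predict with the sign of the kernel expansion over the supports.
        score = sum(y_i * kernel(x_i, x) for x_i, y_i in supports)
        y_hat = 1 if score >= 0 else -1
        if y_hat != y:
            mistakes += 1
            if len(supports) >= budget:
                # Simplest possible eviction: drop a random support vector.
                supports.pop(random.randrange(len(supports)))
            supports.append((x, y))
    return supports, mistakes

# Illustrative usage with a linear kernel on toy 2-D points.
dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
examples = [((1.0, 0.4), 1), ((-0.7, -1.1), -1), ((0.2, 0.9), 1), ((-1.3, 0.1), -1)]
model, errs = kernel_perceptron_random_budget(examples, dot, budget=2)
```

Note that the eviction step is the only place randomness enters: unlike deterministic policies that, for example, always discard the oldest support vector, uniform eviction treats all stored vectors symmetrically, a property an expectation-based analysis can exploit.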
