A Second-Order Perceptron Algorithm

Kernel-based linear-threshold algorithms, such as support vector machines and Perceptron-like algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called the second-order Perceptron, and analyze its performance within the mistake-bound model of on-line learning. The bound achieved by our algorithm depends on the algorithm's sensitivity to second-order information in the data and is the best known mistake bound for (efficient) kernel-based linear-threshold classifiers to date. This mistake bound, which strictly generalizes the well-known Perceptron bound, is expressed in terms of the eigenvalues of the empirical data correlation matrix and depends on a parameter controlling the sensitivity of the algorithm to the distribution of these eigenvalues. Since the optimal setting of this parameter is not known a priori, we also analyze two variants of the second-order Perceptron algorithm: one that adaptively sets the value of the parameter in terms of the number of mistakes made so far, and one that is parameterless, based on pseudoinverses.
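To make the update concrete, here is a minimal NumPy sketch of the primal form of the second-order Perceptron with a fixed parameter a, written for illustration only: the adaptive and pseudoinverse variants mentioned above, as well as the kernel (dual) form, are omitted, and the function and variable names (second_order_perceptron, stream, etc.) are our own.

```python
import numpy as np

def second_order_perceptron(stream, a=1.0):
    """Primal second-order Perceptron with a fixed trade-off parameter a > 0.

    `stream` yields pairs (x, y), where x is a d-dimensional ndarray and
    y is a label in {-1, +1}.  Returns the number of mistakes made.
    """
    mistakes, v, S = 0, None, None
    for x, y in stream:
        if v is None:                        # lazy initialization from the first instance
            d = x.shape[0]
            v = np.zeros(d)                  # v = sum of y_s * x_s over past mistake rounds
            S = np.empty((d, 0))             # columns = instances on which a mistake occurred
        # Correlation matrix of past mistaken instances, augmented with the
        # current instance and regularized ("warped") by a*I.
        M = a * np.eye(len(x)) + S @ S.T + np.outer(x, x)
        margin = v @ np.linalg.solve(M, x)   # predict with the warped weight vector
        y_hat = 1.0 if margin >= 0 else -1.0
        if y_hat != y:                       # second-order update only on mistake rounds
            mistakes += 1
            v = v + y * x
            S = np.hstack([S, x.reshape(-1, 1)])
    return mistakes

# Toy usage on a linearly separable stream.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_star = rng.normal(size=5)
labeled = ((x, 1.0 if x @ w_star >= 0 else -1.0) for x in X)
print(second_order_perceptron(labeled, a=1.0))
```

The sketch favors clarity over efficiency: maintaining the inverse of M incrementally (e.g., via rank-one updates) would avoid the cubic-time linear solve performed at every trial.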
