Exact Convex Confidence-Weighted Learning

Confidence-weighted (CW) learning [6], an online learning method for linear classifiers, maintains a Gaussian distributions over weight vectors, with a covariance matrix that represents uncertainty about weights and correlations. Confidence constraints ensure that a weight vector drawn from the hypothesis distribution correctly classifies examples with a specified probability. Within this framework, we derive a new convex form of the constraint and analyze it in the mistake bound model. Empirical evaluation with both synthetic and text data shows our version of CW learning achieves lower cumulative and out-of-sample errors than commonly used first-order and second-order online methods.

[1]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[2]  Albert B Novikoff,et al.  ON CONVERGENCE PROOFS FOR PERCEPTRONS , 1963 .

[3]  N. Littlestone Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).

[4]  Manfred K. Warmuth,et al.  The weighted majority algorithm , 1989, 30th Annual Symposium on Foundations of Computer Science.

[5]  Richard S. Sutton,et al.  Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta , 1992, AAAI.

[6]  David Haussler,et al.  How to use expert advice , 1993, STOC.

[7]  Michael I. Jordan,et al.  A Variational Approach to Bayesian Logistic Regression Models and their Extensions , 1997, AISTATS.

[8]  Y. Censor,et al.  Parallel Optimization: Theory, Algorithms, and Applications , 1997 .

[9]  Manfred K. Warmuth,et al.  Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput..

[10]  Y. Censor,et al.  Parallel Optimization:theory , 1997 .

[11]  Colin Campbell,et al.  Bayes Point Machines , 2001, J. Mach. Learn. Res..

[12]  George Eastman House,et al.  Sparse Bayesian Learning and the Relevan e Ve tor Ma hine , 2001 .

[13]  Koby Crammer,et al.  Online Passive-Aggressive Algorithms , 2003, J. Mach. Learn. Res..

[14]  John C. Platt,et al.  Online Bayes Point Machines , 2003, PAKDD.

[15]  Claudio Gentile,et al.  A Second-Order Perceptron Algorithm , 2002, SIAM J. Comput..

[16]  Koby Crammer,et al.  Robust Support Vector Machine Training via Convex Outlier Ablation , 2006, AAAI.

[17]  Kaare Brandt Petersen,et al.  The Matrix Cookbook , 2006 .

[18]  Pannagadatta K. Shivaswamy Ellipsoidal Kernel Machines , 2007 .

[19]  Koby Crammer,et al.  Confidence-weighted linear classification , 2008, ICML '08.

[20]  Koby Crammer,et al.  Active Learning with Confidence , 2008, ACL.

[21]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.