Exponentiated Gradient Versus Gradient Descent for Linear Predictors

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known Gradient Descent (GD) algorithm and a new algorithm, which we call EG(+/-). Both maintain a weight vector using simple updates. For the GD algorithm, the update subtracts the gradient of the squared error made on a prediction. The EG(+/-) algorithm uses the components of the gradient in the exponents of factors by which the weight vector is updated multiplicatively. We present worst-case loss bounds for EG(+/-) and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but that EG(+/-) has a much smaller loss when only a few components of the input are relevant for the predictions. We have performed experiments which show that our worst-case upper bounds are already quite tight on simple artificial data.
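To make the two update rules concrete, here is a minimal sketch in Python of one on-line step of each, following the description above: GD subtracts the gradient of the squared error, while EG(+/-) maintains a pair of positive weight vectors and multiplies each weight by an exponential factor whose exponent involves the corresponding gradient component. The learning rate eta and the total-weight parameter U below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def gd_update(w, x, y, eta=0.1):
    """One Gradient Descent (Widrow-Hoff) step:
    subtract the gradient of the squared prediction error."""
    y_hat = w @ x
    return w - eta * (y_hat - y) * x

def eg_update(w_pos, w_neg, x, y, eta=0.1, U=1.0):
    """One EG(+/-) step (sketch): the prediction uses the difference of two
    non-negative weight vectors; each weight is multiplied by an exponential
    factor built from the gradient, then renormalized so the total weight
    stays at U (an assumed scaling parameter)."""
    y_hat = (w_pos - w_neg) @ x
    grad = (y_hat - y) * x                  # gradient of the squared error w.r.t. the weights
    w_pos = w_pos * np.exp(-eta * grad)     # multiplicative update, exponent from the gradient
    w_neg = w_neg * np.exp(eta * grad)
    Z = (w_pos.sum() + w_neg.sum()) / U     # normalization keeps the total weight fixed
    return w_pos / Z, w_neg / Z
```

The multiplicative, normalized form of the EG(+/-) update tends to concentrate weight on a few components, which is consistent with the claim that its loss bound is much smaller when only a few input components are relevant.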
