Linear Probability Forecasting

In this paper we consider two online multi-class classification problems: classification with linear models and classification with kernelized models. The predictions can be interpreted as probability distributions over the classes, and their quality is measured by the Brier loss function. We propose two computationally efficient algorithms for these problems; the second is derived from a new class of linear prediction models. We prove theoretical guarantees on the cumulative losses of both algorithms. We then kernelize one of the algorithms and prove theoretical guarantees on the loss of the kernelized version. Finally, we report experiments comparing our algorithms with logistic regression.
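
The Brier loss used to score the forecasts can be illustrated with a minimal Python sketch. For a forecast γ = (γ_1, ..., γ_d) over d classes and an observed class y, the Brier loss is the squared Euclidean distance between γ and the one-hot indicator vector of y, that is, the sum over i of (γ_i − [i = y])^2. The abstract does not say how the algorithms turn a linear model's real-valued output into a probability distribution. This sketch assumes one common option, Euclidean projection onto the probability simplex, which is not necessarily the authors' method. The function names brier_loss, cumulative_brier_loss and project_to_simplex are hypothetical.

```python
def brier_loss(prediction, outcome):
    """Brier loss of a forecast `prediction` when class index `outcome` occurs.

    Sums the squared differences between the forecast probabilities and the
    one-hot indicator vector of the observed class.
    """
    return sum(
        (p - (1.0 if i == outcome else 0.0)) ** 2
        for i, p in enumerate(prediction)
    )


def cumulative_brier_loss(predictions, outcomes):
    """Total Brier loss over an online sequence of (forecast, outcome) pairs."""
    return sum(brier_loss(p, y) for p, y in zip(predictions, outcomes))


def project_to_simplex(v):
    """Euclidean projection of a real vector onto the probability simplex.

    Assumption: this is one way to map a linear model's raw output to a valid
    forecast; it is not taken from the paper. Sort-based method: find the
    threshold theta such that max(v_i - theta, 0) sums to 1.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        # The condition holds for a prefix of k; keep the last k where it holds.
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


# Three-class example: the forecast puts 0.7 on class 0, and class 0 occurs.
print(brier_loss([0.7, 0.2, 0.1], 0))  # approximately 0.14 = 0.3^2 + 0.2^2 + 0.1^2

# A raw linear output that is not a distribution, before and after projection.
raw = [0.6, 0.6, -0.1]
forecast = project_to_simplex(raw)  # approximately [0.5, 0.5, 0.0]
print(brier_loss(raw, 1), brier_loss(forecast, 1))
```

Because the one-hot target lies in the simplex and the simplex is convex, projecting onto it can never increase the Brier loss. For the raw output (0.6, 0.6, −0.1) with outcome 1, the loss drops from 0.53 to 0.5 after projection to (0.5, 0.5, 0).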
