Competing with Wild Prediction Rules

We consider the problem of on-line prediction competitive with a benchmark class of continuous but highly irregular prediction rules. It is known that if the benchmark class is a reproducing kernel Hilbert space, there exists a prediction algorithm whose average loss over the first N examples does not exceed the average loss of any prediction rule in the class plus a “regret term” of O(N^{-1/2}). The elements of some natural benchmark classes, however, are so irregular that these classes are not Hilbert spaces. In this paper we develop Banach-space methods to construct a prediction algorithm with a regret term of O(N^{-1/p}), where p ∈ (2, ∞) and p − 2 reflects the degree to which the benchmark class fails to be a Hilbert space.
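
To make the stated guarantee concrete, the bound described above can be written schematically as follows. This is only a sketch reconstructed from the abstract: the loss function \lambda, the norm \|D\|, and the constant c are placeholder notation, not the paper's exact statement.

% Schematic regret bound: the learner's average loss over the first N rounds
% does not exceed the average loss of any prediction rule D in the benchmark
% class by more than a regret term of order N^{-1/p}. The constant c and the
% norm \|D\| are assumed placeholders, not the paper's precise constants.
\[
  \frac{1}{N}\sum_{n=1}^{N} \lambda(y_n,\hat{y}_n)
  \;\le\;
  \frac{1}{N}\sum_{n=1}^{N} \lambda\bigl(y_n, D(x_n)\bigr)
  \;+\; c\,\|D\|\, N^{-1/p},
  \qquad p \in (2,\infty).
\]

Formally setting p = 2 recovers the known O(N^{-1/2}) rate for reproducing kernel Hilbert space benchmark classes, while a larger p, i.e. a larger p − 2, corresponds to benchmark classes that are further from being Hilbert spaces and yields a slower rate.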
