Competing with wild prediction rules

Abstract

We consider the problem of on-line prediction competitive with a benchmark class of continuous but highly irregular prediction rules. It is known that if the benchmark class is a reproducing kernel Hilbert space, there exists a prediction algorithm whose average loss over the first $N$ examples does not exceed the average loss of any prediction rule in the class plus a "regret term" of $O(N^{-1/2})$. The elements of some natural benchmark classes, however, are so irregular that these classes are not Hilbert spaces. In this paper we develop Banach-space methods to construct a prediction algorithm with a regret term of $O(N^{-1/p})$, where $p \in [2, \infty)$ and $p - 2$ reflects the degree to which the benchmark class fails to be a Hilbert space. Only the square loss function is considered.
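Schematically, and using notation introduced here ($x_n$ for the $n$-th object, $y_n$ for its outcome, $\mu_n$ for the algorithm's prediction), the guarantee above says that for every rule $D$ in the benchmark class, $\frac{1}{N}\sum_{n=1}^{N}(y_n-\mu_n)^2 \le \frac{1}{N}\sum_{n=1}^{N}(y_n-D(x_n))^2 + O(N^{-1/p})$, where the constant hidden in $O(\cdot)$ may depend on $D$. The Python sketch below only illustrates this on-line protocol and the empirical regret. It is not the paper's Banach-space algorithm: as an assumption, the learner is plain on-line kernel gradient descent with a Gaussian kernel, and the names `gaussian_kernel`, `online_kernel_gd`, `average_square_loss`, `compare_with_rule`, the kernel width, and the step size `eta` are hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def gaussian_kernel(x: float, z: float, width: float = 0.5) -> float:
    """Gaussian (RBF) kernel on the real line; the width is an arbitrary choice."""
    return math.exp(-((x - z) ** 2) / (2.0 * width ** 2))


def online_kernel_gd(
    xs: Sequence[float],
    ys: Sequence[float],
    eta: float,
    kernel: Callable[[float, float], float] = gaussian_kernel,
) -> List[float]:
    """Hypothetical stand-in learner: on-line kernel gradient descent, square loss.

    The current rule is f(x) = sum_i alpha_i * K(c_i, x). At step n the learner
    announces mu_n = f(x_n) before y_n is revealed, then takes one gradient step
    on (y_n - f(x_n))^2, which adds 2 * eta * (y_n - mu_n) * K(x_n, .) to f.
    """
    centers: List[float] = []
    alphas: List[float] = []
    predictions: List[float] = []
    for x, y in zip(xs, ys):
        mu = sum(a * kernel(c, x) for a, c in zip(alphas, centers))
        predictions.append(mu)  # prediction is made before y is seen
        centers.append(x)
        alphas.append(2.0 * eta * (y - mu))  # gradient step in the RKHS
    return predictions


def average_square_loss(preds: Sequence[float], ys: Sequence[float]) -> float:
    """Average square loss over the first N = len(ys) examples."""
    return sum((y - p) ** 2 for p, y in zip(preds, ys)) / len(ys)


def compare_with_rule(
    xs: Sequence[float],
    ys: Sequence[float],
    rule: Callable[[float], float],
    eta: float,
) -> Tuple[float, float, float]:
    """Return (learner's average loss, rule's average loss, empirical regret)."""
    loss_learner = average_square_loss(online_kernel_gd(xs, ys, eta), ys)
    loss_rule = average_square_loss([rule(x) for x in xs], ys)
    return loss_learner, loss_rule, loss_learner - loss_rule


if __name__ == "__main__":
    N = 500
    xs = [math.sin(n) for n in range(1, N + 1)]  # deterministic objects in [-1, 1]
    ys = [math.sin(3 * x) for x in xs]  # outcomes generated by the benchmark rule
    learner, rule, regret = compare_with_rule(xs, ys, lambda x: math.sin(3 * x), eta=0.25)
    print(f"learner {learner:.4f}  rule {rule:.4f}  regret {regret:.4f}")
```

Because the rule $D(x) = \sin 3x$ generates the outcomes in this toy run, its average loss is zero and the reported regret equals the learner's own average loss. The rates in the abstract also differ sharply in practice: for $N = 10{,}000$, $N^{-1/2} = 0.01$, whereas $N^{-1/4} = 0.1$ (the case $p = 4$).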
