Metric entropy in competitive on-line prediction

Competitive on-line prediction (also known as universal prediction of individual sequences) is a strand of learning theory that avoids making any stochastic assumptions about the way the observations are generated. The predictor's goal is to compete with a benchmark class of prediction rules, which is often a proper Banach function space. Metric entropy provides a unifying framework for competitive on-line prediction: the numerous known upper bounds on the metric entropy of various compact sets in function spaces readily imply bounds on the performance of on-line prediction strategies. This paper discusses the strengths and limitations of the direct approach to competitive on-line prediction via metric entropy, including comparisons with other approaches.
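The claim that metric entropy bounds readily imply performance bounds can be illustrated by the direct approach in its simplest form. The sketch below rests on assumptions that are not taken from the paper. It uses square loss with outcomes and predictions in [0, 1], and an illustrative benchmark class of linear rules x ↦ θx with θ ∈ [0, 1] and x ∈ [0, 1]. It builds a sup-norm ε-net of that class and runs the exponentially weighted average forecaster with η = 1/2 over it. Square loss is exp-concave on [0, 1] for this η, which makes the forecaster a standard choice, but it is not necessarily the aggregation method the paper uses. The helpers `epsilon_net` and `ewa_forecaster` are hypothetical. Under these assumptions, the forecaster's cumulative loss satisfies L_T ≤ L_T(f) + 2εT + ln N(ε)/η for every rule f in the class. Here ln N(ε) is the metric entropy of the class at scale ε, and the choice of ε trades the approximation term against the entropy term.

```python
import math
import random

# Hypothetical helper: a sup-norm eps-net for the illustrative class
# F = {x -> theta * x : theta in [0, 1]} on inputs x in [0, 1].
def epsilon_net(eps):
    n = math.ceil(1 / (2 * eps)) + 1          # grid spacing 1/(n-1) <= 2*eps
    return [k / (n - 1) for k in range(n)]    # every theta is within eps of a grid point

# Hypothetical helper: exponentially weighted average forecaster over the net
# with square loss; eta = 1/2 makes (p - y)^2 exp-concave for p, y in [0, 1].
def ewa_forecaster(xs, ys, net, eta=0.5):
    log_w = [0.0] * len(net)                  # uniform prior over net elements
    expert_loss = [0.0] * len(net)            # cumulative loss of each net element
    total_loss = 0.0                          # cumulative loss of the forecaster
    for x, y in zip(xs, ys):
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]  # shift for numerical stability
        preds = [theta * x for theta in net]
        p = sum(wk * pk for wk, pk in zip(w, preds)) / sum(w)  # weighted average
        total_loss += (p - y) ** 2
        for k, pk in enumerate(preds):
            loss_k = (pk - y) ** 2
            expert_loss[k] += loss_k
            log_w[k] -= eta * loss_k          # multiplicative weight update
    return total_loss, expert_loss

# Usage on synthetic data (assumed, for illustration only).
random.seed(0)
T = 1000
xs = [random.random() for _ in range(T)]
ys = [min(1.0, max(0.0, 0.3 * x + random.gauss(0, 0.1))) for x in xs]

eps, eta = 1 / T, 0.5        # eps = 1/T roughly minimizes 2*eps*T + ln N(eps)/eta
net = epsilon_net(eps)
total_loss, expert_loss = ewa_forecaster(xs, ys, net, eta)

# Best rule in F in hindsight: the loss is convex in theta, so clip the
# unconstrained least-squares slope to [0, 1].
theta_star = min(1.0, max(0.0, sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)))
best_loss = sum((theta_star * x - y) ** 2 for x, y in zip(xs, ys))

bound = best_loss + 2 * eps * T + math.log(len(net)) / eta
print(f"net size {len(net)}, forecaster loss {total_loss:.3f}, bound {bound:.3f}")
```

For this one-parameter class, N(ε) grows like 1/ε, so taking ε = 1/T gives a regret bound of order ln T. Classes whose metric entropy grows faster as ε shrinks yield correspondingly weaker bounds.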
