Generalization Performance of Regularization Networks and Support Vector Machines via Entropy Numbers of Compact Operators

We derive new bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks, by obtaining new bounds on their covering numbers. The proofs use a viewpoint that is apparently novel in the field of statistical learning theory. The hypothesis class is described in terms of a linear operator mapping from a possibly infinite-dimensional unit ball in feature space into a finite-dimensional space. The covering numbers of the class are then determined via the entropy numbers of the operator. These numbers, which characterize the degree of compactness of the operator, can be bounded in terms of the eigenvalues of an integral operator induced by the kernel function used by the machine. As a consequence, we are able to theoretically explain the effect of the choice of kernel function on the generalization performance of support vector machines.
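
The key quantity the bounds depend on is the decay of the eigenvalues of the integral operator induced by the kernel. As a rough numerical illustration (not the paper's construction), the following sketch approximates these eigenvalues by the eigenvalues of a scaled Gram matrix on a random sample; the kernel, bandwidth, sample size, and function names are illustrative assumptions, chosen only to show how quickly the spectrum of a Gaussian kernel decays.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Draw m sample points; the scaled Gram matrix K / m approximates the
# integral operator (T_k f)(x) = integral k(x, y) f(y) dP(y), so its
# eigenvalues approximate the operator eigenvalues lambda_j appearing
# in the entropy-number bounds.
rng = np.random.default_rng(0)
m = 500
X = rng.uniform(-1.0, 1.0, size=(m, 1))

K = gaussian_kernel(X, X, sigma=0.5)
eigvals = np.linalg.eigvalsh(K / m)[::-1]   # sorted, largest first
eigvals = np.clip(eigvals, 0.0, None)       # remove tiny negative round-off

# Rapid (near-geometric) eigenvalue decay for the Gaussian kernel;
# faster decay translates into smaller entropy and covering numbers.
print(eigvals[:10])
```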
