Geometry and learning curves of kernel methods with polynomial kernels

The properties of learning machines that use polynomial kernel classifiers, such as support vector machines and kernel perceptrons, are examined. We first derive the number of effective examples, a quantity related to the generalization error. Next, we analyze the average prediction errors of several algorithms and show that these errors do not depend on the apparent dimension of the feature space. This means that the so-called overfitting phenomenon does not appear in kernel methods with polynomial kernels.

© 2004 Wiley Periodicals, Inc. Syst Comp Jpn, 35(7): 41–48, 2004; Published online in Wiley InterScience. DOI 10.1002sscj.10629
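To make "apparent dimension of the feature space" concrete, here is a minimal sketch. It is not the paper's analysis, and it assumes the inhomogeneous polynomial kernel k(x, y) = (1 + x·y)^p, which the abstract does not specify. The sketch builds the explicit feature map whose inner product reproduces this kernel. For inputs in n dimensions that map has C(n + p, p) coordinates. It then trains a kernel perceptron that only ever evaluates k and never forms those coordinates. All function names (`poly_kernel`, `feature_map`, `train_kernel_perceptron`, `predict`) are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb, factorial, prod, sqrt


def poly_kernel(x, y, p):
    """Assumed inhomogeneous polynomial kernel k(x, y) = (1 + x.y)^p."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** p


def feature_map(x, p):
    """Explicit map phi with phi(x).phi(y) == (1 + x.y)^p.

    x is augmented with a constant 1, then every monomial of total degree p
    in the n + 1 variables is listed, scaled by the square root of its
    multinomial coefficient. There are comb(n + p, p) such monomials.
    """
    z = [1.0] + list(x)
    feats = []
    for idx in combinations_with_replacement(range(len(z)), p):
        counts = Counter(idx).values()
        coef = factorial(p) / prod(factorial(c) for c in counts)
        feats.append(sqrt(coef) * prod(z[i] for i in idx))
    return feats


def apparent_dimension(n, p):
    """Number of coordinates of the explicit feature space above."""
    return comb(n + p, p)


def train_kernel_perceptron(X, y, p, epochs=100):
    """Dual-form perceptron: alpha[i] counts the mistakes made on example i."""
    m = len(X)
    K = [[poly_kernel(a, b, p) for b in X] for a in X]  # Gram matrix
    alpha = [0] * m
    for _ in range(epochs):
        mistakes = 0
        for i in range(m):
            f = sum(alpha[j] * y[j] * K[j][i] for j in range(m))
            if y[i] * f <= 0:  # misclassified (or on the boundary)
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # training set separated in feature space
            break
    return alpha


def predict(alpha, X, y, x, p):
    """Sign of the kernel expansion f(x) = sum_i alpha_i y_i k(x_i, x)."""
    f = sum(a * t * poly_kernel(xi, x, p) for a, t, xi in zip(alpha, y, X))
    return 1 if f > 0 else -1


if __name__ == "__main__":
    # XOR-style labels: not linearly separable in the input space,
    # but separable with p = 2 via the x1*x2 monomial.
    X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
    y = [1, 1, -1, -1]
    alpha = train_kernel_perceptron(X, y, p=2)
    print(alpha, [predict(alpha, X, y, x, 2) for x in X])
    print(apparent_dimension(100, 3))  # 176851 coordinates for n=100, p=3
```

The perceptron's cost depends on the number of examples through the Gram matrix, not on `apparent_dimension(n, p)`. This sketch only illustrates that computational point; the abstract's claim that prediction error is also independent of that dimension comes from the paper's analysis, not from this code.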
