Computing the Bayes Kernel Classifier

We present below a simple ray-tracing algorithm for estimating the Bayes classifier for a given class of parameterized kernels. Support Vector Machines try to achieve good generalization by computing the maximum-margin separating hyperplane in a high-dimensional feature space. This approach effectively combines two very good ideas. The first idea is to map the space of input vectors into a very high-dimensional feature space in such a way that nonlinear decision functions on the input space can be constructed using only separating hyperplanes in the feature space. By making use of kernels, we can perform such mappings implicitly, without explicitly manipulating high-dimensional separating vectors (Boser et al., 1992). Since it is very likely that the training examples will be linearly separable in the high-dimensional feature space, this method offers an elegant alternative to network growth algorithms (Ruján and Marchand, 1989; Marchand et al., 1990), which try to construct nonlinear decision surfaces by combining perceptrons. The second idea is to choose, among all separating hyperplanes in the feature space, the one with the largest possible margin. Indeed, it was shown by Vapnik and others that this may give good generalization even if the dimension of the feature space is very large.
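As a concrete illustration of the first idea, here is a minimal sketch of a kernel perceptron trained on a toy problem that is not linearly separable in input space. This is only an illustration of the kernel trick, not the ray-tracing (billiard) algorithm of the paper; the RBF kernel, the toy data, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel: an implicit inner product in a
    high-dimensional feature space, evaluated without ever
    constructing the feature vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def train_kernel_perceptron(X, y, kernel=rbf_kernel, epochs=20):
    """The separating hyperplane in feature space is represented only
    through dual coefficients alpha (one per training example)."""
    n = len(X)
    alpha = np.zeros(n)
    # Gram matrix: all pairwise kernel evaluations.
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        for i in range(n):
            # Decision value for example i, computed via kernels only.
            f = np.sum(alpha * y * K[:, i])
            if y[i] * f <= 0:   # misclassified: strengthen its coefficient
                alpha[i] += 1.0
    return alpha

def predict(x, X, y, alpha, kernel=rbf_kernel):
    k = np.array([kernel(xi, x) for xi in X])
    return np.sign(np.sum(alpha * y * k))

# XOR-like toy data: not linearly separable in the input plane,
# but separable by a hyperplane in the RBF feature space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
alpha = train_kernel_perceptron(X, y)
print([predict(x, X, y, alpha) for x in X])   # [1.0, 1.0, -1.0, -1.0]
```

The learned decision function is nonlinear in the input space even though it is a plain hyperplane in the implicit feature space, which is exactly the point of the kernel construction described above.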

[1] Geoffrey E. Hinton et al. Bayesian Learning for Neural Networks, 1995.

[2] L. Bunimovich. On the ergodic properties of nowhere dispersing billiards, 1979.

[3] Terrence J. Sejnowski et al. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1988.

[4] M. Opper et al. Generalization performance of Bayes optimal classification algorithm for learning a perceptron. Physical Review Letters, 1991.

[5] V. Vapnik. Estimation of Dependences Based on Empirical Data, 2006.

[6] W. Hoeffding. Probability inequalities for sums of bounded random variables, 1963.

[7] Mario Marchand et al. Learning by Minimizing Resources in Neural Networks. Complex Systems, 1989.

[8] M. Golea et al. A Convergence Theorem for Sequential Learning in Two-Layer Perceptrons, 1990.

[9] William H. Press et al. Numerical Recipes in C: The Art of Scientific Computing, 1989.

[10] Bernhard E. Boser et al. A training algorithm for optimal margin classifiers. COLT '92, 1992.

[11] Leonid Khachiyan et al. On the complexity of approximating the maximal inscribed ellipsoid for a polytope. Mathematical Programming, 1993.

[12] Peter F. Lambert. Designing pattern categorizers with extremal paradigm information, 1969.

[13] M. Berry. Quantizing a classically ergodic system: Sinai's billiard and the KKR method, 1981.

[14] Ralf Herbrich et al. Bayes Point Machines: Estimating the Bayes Point in Kernel Space, 1999.

[15] Pál Ruján. Playing Billiards in Version Space. Neural Computation, 1997.

[16] A. N. Zemlyakov et al. Topological transitivity of billiards in polygons, 1975.

[17] T. Watkin. Optimal Learning with a Neural Network, 1993.

[18] L. Bunimovich et al. Markov partitions for dispersed billiards, 1980.