Bayes Point Machines: Estimating the Bayes Point in Kernel Space

From a Bayesian perspective, Support Vector Machines choose the hypothesis corresponding to the largest hypersphere that can be inscribed in version space, i.e., the space of all hypotheses consistent with a given training set. The boundaries of version space tangent to this hypersphere define the support vectors. An alternative, and potentially better, approach is to construct the hypothesis using the whole of version space. This is achieved by the Bayes Point Machine, which finds the midpoint of the region of intersection of all hyperplanes bisecting version space into two halves of equal volume (the Bayes point). It is known that the center of mass of version space approximates the Bayes point [Watkin, 1993]. We suggest estimating the center of mass by averaging over the trajectory of a billiard ball bouncing in version space. Experimental results indicate that Bayes Point Machines consistently outperform Support Vector Machines.
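The billiard idea above can be sketched in code. The following is a simplified flat-space sketch, not the paper's kernel-space algorithm: version space for a linear classifier through the origin is the cone {w : y_i ⟨x_i, w⟩ > 0}, a ball is bounced off its bounding hyperplanes, and the length-weighted average of segment midpoints estimates the center of mass. All names and parameters are illustrative assumptions.

```python
import numpy as np

def billiard_bayes_point(X, y, n_bounces=500, seed=0):
    """Estimate the center of mass of version space (the Bayes point)
    by bouncing a billiard ball inside the cone {w : y_i <x_i, w> > 0}.
    Simplified flat-space sketch; not the paper's exact procedure."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Unit normals of the hyperplanes bounding version space.
    normals = y[:, None] * X
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)

    # Find any consistent starting point with a simple perceptron.
    w = np.zeros(d)
    for _ in range(1000):
        margins = normals @ w
        worst = np.argmin(margins)
        if margins[worst] > 0:
            break
        w = w + normals[worst]
    w = w / np.linalg.norm(w)

    v = rng.standard_normal(d)
    v = v / np.linalg.norm(v)

    center = np.zeros(d)
    total_len = 0.0
    for _ in range(n_bounces):
        # Flight time along v until each bounding hyperplane is hit.
        num = normals @ w
        den = normals @ v
        with np.errstate(divide="ignore", invalid="ignore"):
            t = -num / den
        t[(den >= -1e-12) | (t <= 1e-9)] = np.inf
        i = np.argmin(t)
        if not np.isfinite(t[i]):
            # Ball escapes to infinity inside the cone: resample direction.
            v = rng.standard_normal(d)
            v = v / np.linalg.norm(v)
            continue
        w_new = w + t[i] * v
        # Accumulate the segment midpoint, weighted by segment length.
        center += t[i] * 0.5 * (w + w_new)
        total_len += t[i]
        # Reflect the direction off the hyperplane; renormalize position.
        v = v - 2.0 * (normals[i] @ v) * normals[i]
        w = w_new / np.linalg.norm(w_new)
    return center / total_len
```

Because the cone is convex, every segment midpoint lies inside version space, so the returned average is itself a consistent hypothesis; in kernel space the same bounces are computed via inner products with the training points.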

[1] G. Wahba. Spline Models for Observational Data, 1990.

[2] Opper, et al. Generalization performance of Bayes optimal classification algorithm for learning a perceptron. Physical Review Letters, 1991.

[3] T. Watkin. Optimal Learning with a Neural Network, 1993.

[4] Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods. ICML, 1997.

[5] Pal Rujan. Playing Billiards in Version Space. Neural Computation, 1997.

[6] Radford M. Neal. Markov Chain Monte Carlo Methods Based on `Slicing' the Density Function, 1997.

[7] John Shawe-Taylor, et al. A PAC analysis of a Bayesian estimator. COLT '97, 1997.

[8] John Shawe-Taylor, et al. Structural Risk Minimization over Data-Dependent Hierarchies. IEEE Trans. Inf. Theory, 1998.

[9] Vladimir Vapnik, et al. Statistical Learning Theory, 1998.

[10] N. Cristianini, et al. Robust Bounds on Generalization from the Margin Distribution, 1998.

[11] Nello Cristianini, et al. Bayesian Classifiers Are Large Margin Hyperplanes in a Hilbert Space. ICML, 1998.

[12] Klaus Obermayer, et al. Regression Models for Ordinal Data: A Machine Learning Approach, 1999.

[13] Ralf Herbrich, et al. Bayesian Learning in Reproducing Kernel Hilbert Spaces: The Usefulness of the Bayes Point, 1999.