Bayesian Learning in Reproducing Kernel Hilbert Spaces

Support Vector Machines find the hypothesis that corresponds to the centre of the largest hypersphere that can be placed inside version space, i.e. the space of all consistent hypotheses given a training set. The boundaries of version space touched by this hypersphere define the support vectors. An even more promising approach is to construct the hypothesis using the whole of version space. This is achieved by the Bayes point: the midpoint of the region of intersection of all hyperplanes bisecting version space into two volumes of equal magnitude. It is known that the centre of mass of version space approximates the Bayes point [30]. The centre of mass is estimated by averaging over the trajectory of a billiard in version space. We derive bounds on the generalisation error of Bayesian classifiers in terms of the volume ratio of version space and parameter space. This ratio serves as an effective VC dimension and greatly influences generalisation. We present experimental results indicating that Bayes Point Machines consistently outperform Support Vector Machines. Moreover, we show theoretically and experimentally how Bayes Point Machines can easily be extended to admit training errors.
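
The billiard construction described above can be sketched concretely. The following is a minimal sketch, not the authors' implementation: it assumes a linearly separable training set and works in primal weight space with a linear kernel, whereas the paper plays the billiard in the reproducing kernel Hilbert space induced by a kernel. Version space is taken to be the set of unit weight vectors w with y_i <x_i, w> > 0 for all training examples; the ball travels along great-circle arcs of the unit sphere and is reflected whenever it hits one of the bounding hyperplanes {w : y_i <x_i, w> = 0}; the time average of the trajectory estimates the centre of mass. The function name bayes_point_billiard, the perceptron initialisation, and the use of exact arc integrals for the time average are illustrative simplifications, not the update rule of the paper.

    import numpy as np

    def bayes_point_billiard(X, y, n_bounces=1000, seed=0):
        """Estimate the centre of mass of version space for labels y in {-1, +1}."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        # Unit normals of the bounding hyperplanes, oriented so that
        # version space is {w : normals @ w > 0}.
        normals = (y[:, None] * X) / np.linalg.norm(X, axis=1, keepdims=True)

        # Starting point strictly inside version space, found by a plain
        # perceptron (assumes the data are linearly separable).
        w = np.zeros(d)
        for _ in range(100 * n):
            mistakes = np.flatnonzero(normals @ w <= 0)
            if mistakes.size == 0:
                break
            w = w + normals[rng.choice(mistakes)]
        w = w / np.linalg.norm(w)

        # Random initial direction tangent to the unit sphere at w.
        v = rng.standard_normal(d)
        v = v - (v @ w) * w
        v = v / np.linalg.norm(v)

        centre, total_time = np.zeros(d), 0.0
        for _ in range(n_bounces):
            # The geodesic w(t) = w cos(t) + v sin(t) hits hyperplane i when
            # a_i cos(t) + b_i sin(t) = 0; with a_i >= 0 the first
            # non-negative root is t_i = atan2(b_i, a_i) + pi/2.
            a = np.maximum(normals @ w, 0.0)  # clamp round-off drift
            b = normals @ v
            t = np.arctan2(b, a) + 0.5 * np.pi
            i = int(np.argmin(t))
            t_star = t[i]

            # Exact integral of w(t) over [0, t_star]: accumulates the
            # time average of the trajectory.
            centre += w * np.sin(t_star) + v * (1.0 - np.cos(t_star))
            total_time += t_star

            # Move to the collision point and reflect the velocity off the
            # hyperplane that was hit.
            w_new = w * np.cos(t_star) + v * np.sin(t_star)
            v_new = -w * np.sin(t_star) + v * np.cos(t_star)
            v_new = v_new - 2.0 * (v_new @ normals[i]) * normals[i]
            w = w_new / np.linalg.norm(w_new)
            v = v_new / np.linalg.norm(v_new)

        centre = centre / total_time
        return centre / np.linalg.norm(centre)

A hypothetical call would be w_bp = bayes_point_billiard(X_train, y_train), with test points classified by sign(X_test @ w_bp); the exact-arc time average is used here only because it keeps the estimator to a few lines.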

[1]  Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[2]  Peter L. Bartlett, et al. Neural Network Learning: Theoretical Foundations, 1999.

[3]  John Shawe-Taylor, et al. Structural Risk Minimization Over Data-Dependent Hierarchies, 1998, IEEE Trans. Inf. Theory.

[4]  David A. McAllester. Some PAC-Bayesian Theorems, 1998, COLT '98.

[5]  Nello Cristianini, et al. Bayesian Classifiers Are Large Margin Hyperplanes in a Hilbert Space, 1998, ICML.

[6]  Peter L. Bartlett, et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.

[7]  Vladimir Vapnik, et al. Statistical Learning Theory, 1998.

[8]  N. Cristianini, et al. Robust Bounds on Generalization from the Margin Distribution, 1998.

[9]  Yoav Freund, et al. Boosting the margin: A new explanation for the effectiveness of voting methods, 1997, ICML.

[10]  John Shawe-Taylor, et al. A PAC analysis of a Bayesian estimator, 1997, COLT '97.

[11]  Radford M. Neal. Markov Chain Monte Carlo Methods Based on 'Slicing' the Density Function, 1997.

[12]  P. Ruján. Playing Billiards in Version Space, 1997.

[13]  T. Watkin. Optimal Learning with a Neural Network, 1993.

[14]  Michael Kearns, et al. Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension, 1992, IJCNN International Joint Conference on Neural Networks.

[15]  M. Opper, et al. Generalization performance of Bayes optimal classification algorithm for learning a perceptron, 1991, Physical Review Letters.

[16]  Robert E. Schapire, et al. Efficient distribution-free learning of probabilistic concepts, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science.

[17]  M. Opper, et al. On the ability of the optimal perceptron to generalise, 1990.

[18]  G. Wahba. Spline Models for Observational Data, 1990.

[19]  C. Micchelli. Interpolation of scattered data: Distance matrices and conditionally positive definite functions, 1986.

[20]  Leslie G. Valiant. A theory of the learnable, 1984, STOC '84.

[21]  Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[22]  Frank Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, 1963.

[23]  C. Caramanis. What is ergodic theory, 1963.

[24]  B. Harshbarger. An Introduction to Probability Theory and its Applications, Volume I, 1958.

[25]  J. Mercer. Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations, 1909.