Hierarchic Bayesian models for kernel learning

Integrating diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can yield better performance than that obtained from any single data source. We present a Bayesian hierarchical model that enables such kernel learning, together with effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method. Matlab code replicating the reported results is available at http://www.dcs.gla.ac.uk/~srogers/kernel_comb.html.
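
As a purely illustrative sketch of the kernel-combination idea described above (not the authors' Matlab implementation or their variational Bayes scheme), the following Python snippet forms a non-negative weighted sum of base Gram matrices, K = sum_m beta_m K_m, and chooses the weights by a crude validation search on top of kernel ridge regression. The RBF base kernels, the toy data, and the ridge regressor are all assumptions made for this example; in the paper the weights are given a hierarchical prior and inferred with variational Bayes rather than searched over.

```python
import numpy as np

# Illustrative sketch only: combine base Gram matrices with non-negative
# weights and fit a regressor on the composite kernel. The paper infers the
# weights with variational Bayes; here a random validation search stands in.

def rbf_kernel(X1, X2, gamma):
    """Gaussian (RBF) Gram matrix between the rows of X1 and X2."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-gamma * sq)

def combined_kernel(X1, X2, weights, gammas):
    """Non-negative weighted sum of base kernels, K = sum_m beta_m * K_m."""
    return sum(w * rbf_kernel(X1, X2, g) for w, g in zip(weights, gammas))

def kernel_ridge_fit(K, y, lam=1e-2):
    """Dual coefficients for kernel ridge regression on Gram matrix K."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

# Toy data and three base kernels at different length-scales (assumed values).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(80)
Xtr, ytr, Xva, yva = X[:60], y[:60], X[60:], y[60:]
gammas = [0.1, 1.0, 10.0]

# Crude search over the weight simplex, standing in for the posterior over
# kernel weights that the paper's variational scheme would provide.
best = None
for w in np.random.dirichlet(np.ones(len(gammas)), size=200):
    alpha = kernel_ridge_fit(combined_kernel(Xtr, Xtr, w, gammas), ytr)
    pred = combined_kernel(Xva, Xtr, w, gammas) @ alpha
    err = np.mean((pred - yva) ** 2)
    if best is None or err < best[0]:
        best = (err, w)
print("validation MSE %.4f with kernel weights %s" % (best[0], np.round(best[1], 2)))
```

The point of the sketch is only the structure of the composite kernel: each data source contributes its own Gram matrix, and a single non-negative weight per source is learned so that informative sources dominate the combination.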
