Generalized Multiple Kernel Learning With Data-Dependent Priors

Multiple kernel learning (MKL) and classifier ensembles are two mainstream approaches to learning problems in which some sets of features/views are more informative than others, or in which the features/views within a given set are inconsistent. In this paper, we first present a novel probabilistic interpretation of MKL, showing that maximum entropy discrimination with a noninformative prior over multiple views is equivalent to the MKL formulation. In place of the noninformative prior, we introduce a novel data-dependent prior based on an ensemble of kernel predictors, which improves the prediction performance of MKL by leveraging the merits of classifier ensembles. Within this probabilistic framework, we propose a hierarchical Bayesian model that learns the data-dependent prior and the classification model simultaneously. The resulting problem is convex, and additional information (e.g., instances with either missing views or missing labels) can be seamlessly incorporated into the data-dependent priors. Furthermore, a variety of existing MKL models can be recovered under the proposed framework and readily extended to incorporate these priors. Extensive experiments demonstrate the benefits of the proposed framework in supervised and semisupervised settings, as well as in tasks with partial correspondence among multiple views.
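The two ingredients contrasted in the abstract can be illustrated with a minimal sketch. It is not the maximum entropy discrimination model or the hierarchical Bayesian procedure of the paper: kernel ridge regression is swapped in as a simple base learner, the kernel weights `d` are fixed by hand rather than learned, and every name (`rbf_kernel`, `combine_kernels`, `fit_dual`, `view_a`, `view_b`) is hypothetical. The sketch only shows the structural difference between training one predictor on a weighted sum of kernels (the MKL-style route) and averaging one predictor per kernel (the ensemble route).

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Z."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def combine_kernels(kernels, weights):
    """Weighted sum of base kernels with weights on the probability simplex."""
    d = np.asarray(weights, dtype=float)
    if len(d) != len(kernels) or np.any(d < 0) or not np.isclose(d.sum(), 1.0):
        raise ValueError("need one nonnegative weight per kernel, summing to 1")
    return sum(w * K for w, K in zip(d, kernels))

def fit_dual(K, y, lam=1.0):
    """Kernel ridge dual coefficients (K + lam * I)^{-1} y (assumed base learner)."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

# Toy two-view data (assumption): view_a carries the label, view_b is noise.
rng = np.random.default_rng(0)
n = 40
view_a = rng.normal(size=(n, 2))
view_b = rng.normal(size=(n, 3))
y = np.where(view_a[:, 0] > 0, 1.0, -1.0)

kernels = [rbf_kernel(view_a, view_a), rbf_kernel(view_b, view_b)]
d = [0.5, 0.5]  # hand-picked weights; an MKL method would learn these

# MKL-style route: a single predictor trained on the combined kernel.
K_comb = combine_kernels(kernels, d)
scores_mkl = K_comb @ fit_dual(K_comb, y)

# Ensemble route: one predictor per kernel, outputs averaged with the same weights.
scores_ens = sum(w * (K @ fit_dual(K, y)) for w, K in zip(d, kernels))
```

Because the weights are nonnegative, the combined matrix remains a valid (positive semidefinite) kernel. The two routes generally produce different scores, which is the gap the data-dependent prior described above is meant to bridge.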
