MKEL: Multiple Kernel Ensemble Learning via Unified Ensemble Loss for Image Classification

This article develops a novel ensemble model, Multiple Kernel Ensemble Learning (MKEL), built on a unified ensemble loss. Unlike previous multiple kernel learning (MKL) methods, which seek a linear combination of basis kernels to form a single unified kernel, MKEL finds multiple solutions simultaneously, one in each corresponding Reproducing Kernel Hilbert Space (RKHS). To achieve this, the individual kernel losses are integrated into a unified ensemble loss, so that the individual models are co-optimized and each learns its optimal parameters by minimizing this loss across multiple RKHSs. Furthermore, we apply the proposed ensemble loss to the deep network paradigm, treating each sub-network as a kernel mapping from the original input space into a feature space; we call the resulting model Deep-MKEL (D-MKEL). D-MKEL integrates diversified deep sub-networks into a single unified network to improve classification performance. With this unified loss design, the D-MKEL network can be made much wider than traditional deep kernel networks, so more parameters are learned and optimized. Experimental results on several medium-sized UCI classification and computer vision datasets demonstrate that MKEL achieves the best classification performance among the compared MKL methods, such as SimpleMKL, GMKL, SpicyMKL, and Matrix-Regularized MKL. In addition, experimental results on the large-scale CIFAR-10 and SVHN datasets show the advantages and potential of the proposed D-MKEL approach compared with state-of-the-art deep kernel methods.
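
The idea of co-optimizing one solution per RKHS under a single loss can be illustrated with a minimal sketch. The abstract does not give the exact form of the unified ensemble loss, so everything below is an assumption made for illustration: the squared loss, the extra term on the averaged prediction, the RBF kernels, and the names rbf_kernel, ensemble_loss, ensemble_grads, and fit are all hypothetical and are not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gaussian (RBF) Gram matrix between the rows of X and the rows of Z."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def ensemble_loss(alphas, kernels, y, lam=1e-2):
    """Assumed unified loss: per-kernel squared losses, a squared loss on the
    averaged prediction, and an RKHS-norm penalty for each kernel."""
    preds = [K @ a for K, a in zip(kernels, alphas)]
    individual = sum(np.mean((p - y) ** 2) for p in preds)
    combined = np.mean((np.mean(preds, axis=0) - y) ** 2)
    reg = lam * sum(a @ K @ a for K, a in zip(kernels, alphas))
    return individual + combined + reg

def ensemble_grads(alphas, kernels, y, lam=1e-2):
    """Gradient of ensemble_loss with respect to each coefficient vector."""
    n, M = len(y), len(kernels)
    preds = [K @ a for K, a in zip(kernels, alphas)]
    avg_resid = np.mean(preds, axis=0) - y
    grads = []
    for K, a, p in zip(kernels, alphas, preds):
        g = (2.0 / n) * K @ (p - y)            # individual loss term
        g += (2.0 / (n * M)) * K @ avg_resid   # averaged-prediction term
        g += 2.0 * lam * K @ a                 # RKHS-norm penalty
        grads.append(g)
    return grads

def fit(kernels, y, lam=1e-2, steps=200):
    """Plain gradient descent on all coefficient vectors at once."""
    n, M = len(y), len(kernels)
    # Step size from an upper bound on the curvature of the quadratic loss.
    top = max(np.linalg.eigvalsh(K)[-1] for K in kernels)
    lr = 1.0 / ((2.0 / n) * top**2 * (1.0 + 1.0 / M) + 2.0 * lam * top)
    alphas = [np.zeros(n) for _ in kernels]
    for _ in range(steps):
        grads = ensemble_grads(alphas, kernels, y, lam)
        alphas = [a - lr * g for a, g in zip(alphas, grads)]
    return alphas
```

In this sketch each kernel keeps its own coefficient vector, so no combined kernel is ever formed; the models are coupled only through the shared loss. A call such as fit([rbf_kernel(X, X, g) for g in (0.1, 1.0, 10.0)], y) would return one coefficient vector per kernel.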
