A multi-scale kernel learning method and its application in image classification

Abstract The performance of a support vector machine (SVM) depends directly on its kernel function. To improve the generalization of SVMs, we therefore study the selection of the kernel function. The multi-scale kernel method is a particular type of multiple kernel method that combines kernels of different scales through a multiple kernel learning framework. It generalizes well in both the scattered and the dense regions of a training set. Motivated by these advantages, we apply centered kernel polarization to construct an optimization problem for learning the multi-scale kernel function and selecting its optimal parameters. A thorough analysis and proofs are provided. Experimental results show that the proposed kernel learning method and algorithm are reasonable and effective and generalize well.
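
The abstract gives no formulas, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions: the multi-scale kernel is taken to be a weighted sum of Gaussian kernels with different widths, and the centered polarization criterion is taken to be the Frobenius inner product between the centered kernel matrix and the label matrix yy^T. The function names, the widths in `sigmas`, and the weights are hypothetical and are not the authors' implementation or optimization procedure.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix for the rows of X with width sigma."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

def multi_scale_kernel(X, sigmas, weights):
    """Assumed form: weighted sum of Gaussian kernels at several scales."""
    K = np.zeros((X.shape[0], X.shape[0]))
    for w, s in zip(weights, sigmas):
        K += w * gaussian_kernel(X, s)
    return K

def center(K):
    """Center a kernel matrix in feature space: H K H, H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def centered_polarization(K, y):
    """Assumed criterion: <H K H, y y^T>_F, which equals y^T (H K H) y."""
    y = np.asarray(y, dtype=float)
    return float(y @ center(K) @ y)

# Toy data: two small clusters with labels +1 and -1 (illustrative only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([1, 1, -1, -1])

# Hypothetical scales and weights; a learning method would tune these.
K = multi_scale_kernel(X, sigmas=[0.5, 2.0], weights=[0.5, 0.5])
score = centered_polarization(K, y)
print(score)
```

A larger score indicates that the kernel places same-class points closer together than different-class points, so candidate widths and weights could be compared by this value before training an SVM on the resulting kernel matrix.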
