Deep neural mapping support vector machines

The choice of kernel has an important effect on the performance of a support vector machine (SVM). The effect could be reduced by NEUROSVM, an architecture using multilayer perceptron for feature extraction and SVM for classification. In binary classification, a general linear kernel NEUROSVM can be theoretically simplified as an input layer, many hidden layers, and an SVM output layer. As a feature extractor, the sub-network composed of the input and hidden layers is first trained together with a virtual ordinary output layer by backpropagation, then with the output of its last hidden layer taken as input of the SVM classifier for further training separately. By taking the sub-network as a kernel mapping from the original input space into a feature space, we present a novel model, called deep neural mapping support vector machine (DNMSVM), from the viewpoint of deep learning. This model is also a new and general kernel learning method, where the kernel mapping is indeed an explicit function expressed as a sub-network, different from an implicit function induced by a kernel function traditionally. Moreover, we exploit a two-stage procedure of contrastive divergence learning and gradient descent for DNMSVM to jointly training an adaptive kernel mapping instead of a kernel function, without requirement of kernel tricks. As a whole of the sub-network and the SVM classifier, the joint training of DNMSVM is done by using gradient descent to optimize the objective function with the sub-network layer-wise pre-trained via contrastive divergence learning of restricted Boltzmann machines. Compared to the separate training of NEUROSVM, the joint training is a new algorithm for DNMSVM to have advantages over NEUROSVM. Experimental results show that DNMSVM can outperform NEUROSVM and RBFSVM (i.e., SVM with the kernel of radial basis function), demonstrating its effectiveness.

[1]  Suresh Venkatasubramanian,et al.  Continuous Kernel Learning , 2016, ECML/PKDD.

[2]  Ethem Alpaydin,et al.  Multiple Kernel Learning Algorithms , 2011, J. Mach. Learn. Res..

[3]  T. Lane,et al.  A Framework for Multiple Kernel Support Vector Regression and Its Applications to siRNA Efficacy Prediction , 2009, TCBB.

[4]  Ajith Abraham,et al.  An Empirical Comparison of Kernel Selection for Support Vector Machines , 2002, HIS.

[5]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[6]  Geoffrey E. Hinton Training Products of Experts by Minimizing Contrastive Divergence , 2002, Neural Computation.

[7]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[8]  James T. Kwok,et al.  Efficient Kernel Learning from Side Information Using ADMM , 2013, IJCAI.

[9]  Andrew Gordon Wilson,et al.  Fast Kernel Learning for Multidimensional Pattern Extrapolation , 2014, NIPS.

[10]  Jason Weston,et al.  Gene functional classification from heterogeneous data , 2001, RECOMB.

[11]  Pascal Vincent,et al.  A Neural Support Vector Network architecture with adaptive kernels , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[12]  Tu Bao Ho,et al.  Simple but effective methods for combining kernels in computational biology , 2008, 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies.

[13]  Fabio Aiolli,et al.  EasyMKL: a scalable multiple kernel learning algorithm , 2015, Neurocomputing.

[14]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[15]  Koby Crammer,et al.  Kernel Design Using Boosting , 2002, NIPS.

[16]  Kristin P. Bennett,et al.  MARK: a boosting algorithm for heterogeneous kernel models , 2002, KDD.

[17]  Claus Bahlmann,et al.  Online handwriting recognition with support vector machines - a kernel approach , 2002, Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition.

[18]  Manik Varma,et al.  More generality in efficient multiple kernel learning , 2009, ICML '09.

[19]  Nico Pfeifer,et al.  Integrating different data types by regularized unsupervised multiple kernel learning with application to cancer subtype discovery , 2015, Bioinform..

[20]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[21]  William Stafford Noble,et al.  Kernel methods for predicting protein-protein interactions , 2005, ISMB.

[22]  Nikhil R. Pal,et al.  NEUROSVM: An Architecture to Reduce the Effect of the Choice of Kernel on the Performance of SVM , 2009, J. Mach. Learn. Res..

[23]  Johan A. K. Suykens,et al.  Training multilayer perceptron classifiers based on a modified support vector method , 1999, IEEE Trans. Neural Networks.

[24]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[25]  Xuelong Li,et al.  Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Mehryar Mohri,et al.  Algorithms for Learning Kernels Based on Centered Alignment , 2012, J. Mach. Learn. Res..

[27]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[28]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[29]  Yoshua Bengio,et al.  Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.

[30]  Gunnar Rätsch,et al.  Support Vector Machines and Kernels for Computational Biology , 2008, PLoS Comput. Biol..

[31]  N. Cristianini,et al.  Optimizing Kernel Alignment over Combinations of Kernel , 2002 .

[32]  Marijn F. Stollenga,et al.  The Neural Support Vector Machine , 2013 .

[33]  Massimiliano Pontil,et al.  Support Vector Machines for 3D Object Recognition , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[34]  Muhammad Hussain,et al.  A Comparison of SVM Kernel Functions for Breast Cancer Detection , 2011, 2011 Eighth International Conference Computer Graphics, Imaging and Visualization.

[35]  Kurt Hornik,et al.  Approximation capabilities of multilayer feedforward networks , 1991, Neural Networks.