Deep representation design from deep kernel networks

Abstract

Deep kernel learning aims to design nonlinear combinations of multiple standard elementary kernels by training deep networks. This scheme has proven effective, but it becomes intractable on large-scale datasets, especially as the depth of the trained networks increases: the complexity of evaluating these networks scales quadratically with the size of the training data and linearly with the depth of the trained networks. In this paper, we address the issue of efficient computation in Deep Kernel Networks (DKNs) by designing effective maps in the underlying Reproducing Kernel Hilbert Spaces (RKHSs). Given a pretrained DKN, our method builds its associated Deep Map Network (DMN), whose inner product approximates the original network while being far more efficient. The design of our method is greedy and layer-wise: it finds maps that approximate the DKN at its different (input, intermediate and output) layers. This design also includes an extra fine-tuning step, based on unsupervised learning, which further enhances the generalization ability of the trained DMNs. When plugged into SVMs, these DMNs turn out to be as accurate as the underlying DKNs while being at least an order of magnitude faster on large-scale datasets, as shown through extensive experiments on the challenging ImageCLEF and COREL5k benchmarks and on the Banana dataset.
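To make the layer-wise principle concrete, here is a minimal NumPy sketch under stated assumptions. It is not the method of this paper. It assumes a hypothetical two-layer DKN whose output kernel is the exponential of a nonnegative combination of a linear and a Gaussian (RBF) elementary kernel. Each layer's explicit map is built with the Nyström method, a standard technique swapped in here in place of the map design and unsupervised fine-tuning described above. The names `dkn_exact`, `nystrom_map` and `build_dmn`, the weights `w`, the bandwidth `gamma` and the number of landmarks `m` are all illustrative choices.

```python
import numpy as np

def linear(X, Y):
    # Elementary kernel 1: linear
    return X @ Y.T

def rbf(X, Y, gamma=0.5):
    # Elementary kernel 2: Gaussian (RBF); gamma is an assumed bandwidth
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def exp_kernel(A, B):
    # Output-layer activation applied to the inner products of layer-1 maps
    return np.exp(A @ B.T)

def dkn_exact(X, Y, w):
    # Hypothetical two-layer DKN (assumption, not the paper's architecture):
    # exp of a nonnegative combination of the elementary kernels
    return np.exp(w[0] * linear(X, Y) + w[1] * rbf(X, Y))

def nystrom_map(kernel, landmarks, rel_eps=1e-10):
    """Nystrom explicit map phi with <phi(x), phi(y)> ~ kernel(x, y)."""
    vals, vecs = np.linalg.eigh(kernel(landmarks, landmarks))
    keep = vals > rel_eps * vals.max()        # drop numerically null directions
    W = vecs[:, keep] / np.sqrt(vals[keep])   # pseudo-inverse square root of K_mm
    return lambda X: kernel(X, landmarks) @ W

def build_dmn(X_train, w, m=80, seed=0):
    """Greedy, layer-wise explicit maps approximating dkn_exact (hypothetical)."""
    rng = np.random.default_rng(seed)
    L = X_train[rng.choice(len(X_train), size=m, replace=False)]
    phi_lin = nystrom_map(linear, L)   # layer 1: one map per elementary kernel
    phi_rbf = nystrom_map(rbf, L)

    def layer1(X):
        # sqrt of the weights so that <layer1(x), layer1(y)> ~ w0*linear + w1*rbf
        return np.hstack([np.sqrt(w[0]) * phi_lin(X), np.sqrt(w[1]) * phi_rbf(X)])

    # Layer 2: map the exp kernel acting on the (already mapped) layer-1 features
    phi_out = nystrom_map(exp_kernel, layer1(L))
    return (lambda X: phi_out(layer1(X))), L

def rel_err(K, K_hat):
    # Relative Frobenius error between exact and approximated Gram matrices
    return np.linalg.norm(K - K_hat) / np.linalg.norm(K)

rng = np.random.default_rng(42)
X_train = rng.uniform(-1.5, 1.5, size=(400, 2))   # synthetic 2-D data
X_test = rng.uniform(-1.5, 1.5, size=(100, 2))
w = (0.3, 0.7)                                    # assumed nonnegative weights

dmn, L = build_dmn(X_train, w)
F_test = dmn(X_test)
err_landmarks = rel_err(dkn_exact(L, L, w), dmn(L) @ dmn(L).T)
err_heldout = rel_err(dkn_exact(X_test, X_test, w), F_test @ F_test.T)
print(F_test.shape, err_landmarks, err_heldout)
```

Once the maps are built, mapping a new point only requires kernel evaluations against the `m` landmarks at each layer rather than against the whole training set, and the resulting features can be fed to a linear classifier. On the landmarks themselves the Nyström maps reproduce the assumed DKN up to the discarded eigen-directions; on held-out points the error depends on `m` and on the data distribution.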
