Deep Networks with Adaptive Nyström Approximation

Recent work has focused on combining kernel methods and deep learning to exploit the best of both approaches. Here, we introduce a new neural network architecture in which the top dense layers of standard convolutional architectures are replaced with an approximation of a kernel function based on the Nyström approximation. Our approach is simple and highly flexible: it is compatible with any kernel function and allows the use of multiple kernels. We show that our architecture matches the performance of standard architectures on datasets such as SVHN and CIFAR100. One benefit of the method lies in its limited number of learnable parameters, which makes it particularly suited to small training sets, e.g. from 5 to 20 samples per class.
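
As a rough illustration of the idea (not the authors' exact formulation), the sketch below shows how a Nyström-style layer could replace the dense head of a convolutional network in PyTorch. The RBF kernel, the random landmark initialization, the layer sizes, and the class and parameter names (`NystromLayer`, `landmarks`, `W`, `gamma`) are assumptions made for this example only.

```python
import torch
import torch.nn as nn

class NystromLayer(nn.Module):
    """Approximate kernel feature map phi(x) = k(x, L) @ W, where L are
    (learnable) landmark points and W plays the role of K(L, L)^{-1/2}
    in the classical Nyström approximation (here learned by backprop)."""

    def __init__(self, in_dim, num_landmarks, gamma=1.0):
        super().__init__()
        # Landmarks could be initialized from training samples; random here.
        self.landmarks = nn.Parameter(torch.randn(num_landmarks, in_dim))
        self.W = nn.Parameter(torch.eye(num_landmarks))
        self.gamma = gamma

    def forward(self, x):
        # RBF kernel between inputs and landmarks: shape (batch, num_landmarks)
        d2 = torch.cdist(x, self.landmarks).pow(2)
        k = torch.exp(-self.gamma * d2)
        return k @ self.W

# Usage sketch: the Nyström layer plus a linear classifier replaces the
# usual fully connected head on top of a convolutional feature extractor.
head = nn.Sequential(
    nn.Flatten(),
    NystromLayer(in_dim=512, num_landmarks=64),
    nn.Linear(64, 100),  # e.g. 100 classes for CIFAR100
)
```

Because the learnable parameters are essentially the landmarks, the small matrix W, and the final linear layer, such a head has far fewer parameters than a stack of wide dense layers, which is consistent with the paper's emphasis on small training sets.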
