Implicit Kernel Learning

Kernels are powerful and versatile tools in machine learning and statistics. Although the notions of universal and characteristic kernels have been studied, kernel selection still strongly influences empirical performance. While learning the kernel in a data-driven way has been investigated before, in this paper we explore learning the spectral distribution of a kernel via implicit generative models parametrized by deep neural networks. We call our method Implicit Kernel Learning (IKL). The proposed framework is simple to train, and inference is performed by sampling random Fourier features. We investigate two applications of IKL as examples: generative adversarial networks with the maximum mean discrepancy (MMD GAN) and standard supervised learning. Empirically, MMD GAN with IKL outperforms vanilla predefined kernels on both image and text generation benchmarks, and using IKL with Random Kitchen Sinks leads to substantial improvements over existing state-of-the-art kernel learning algorithms on popular supervised learning benchmarks. We also study the theory and conditions for using IKL in both applications, as well as connections to previous state-of-the-art methods.
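To make the mechanism concrete, below is a minimal NumPy sketch of the inference step under a Bochner-style parametrization: an implicit generator maps base noise to frequency samples from a learned spectral distribution, and the kernel is estimated through random Fourier features built from those samples. The names sampler_network and rff_features and the stand-in (untrained) generator weights are hypothetical, for illustration only; in IKL proper the generator's parameters would be learned end to end through the downstream objective, so this is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampler_network(z, W1, b1, W2, b2):
    # Hypothetical implicit generator: maps base noise z to kernel
    # frequencies omega, i.e., samples from a learned spectral
    # distribution (Bochner's theorem).
    h = np.tanh(z @ W1 + b1)
    return h @ W2 + b2

def rff_features(X, omegas):
    # Random Fourier feature map:
    # phi(x) = [cos(omega_i . x), sin(omega_i . x)]_{i=1..m} / sqrt(m),
    # so that phi(x) . phi(y) approximates k(x, y).
    m = omegas.shape[0]
    proj = X @ omegas.T                      # (n, m) projections
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(m)

# Toy sizes: input dim, noise dim, hidden width, number of frequencies.
d_in, d_noise, d_hidden, m = 5, 4, 16, 256

# Stand-in (untrained) generator weights; IKL would learn these by
# backpropagating the downstream objective through the sampled omegas.
W1 = rng.normal(size=(d_noise, d_hidden)); b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(d_hidden, d_in));    b2 = np.zeros(d_in)

# Inference: draw noise, push it through the generator to obtain
# frequency samples, then build random Fourier features for the data.
Z = rng.normal(size=(m, d_noise))
omegas = sampler_network(Z, W1, b1, W2, b2)  # (m, d_in)

X = rng.normal(size=(10, d_in))
Phi = rff_features(X, omegas)                # (10, 2m)
K_hat = Phi @ Phi.T                          # Monte Carlo kernel estimate
print(K_hat.shape)                           # -> (10, 10)
```

Since k(x, y) is approximated by an inner product of explicit features, downstream objectives (an MMD critic in MMD GAN, or a linear model on random-kitchen-sinks features) can be written directly on Phi, which is what makes the framework simple to train.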
