Differentiable Compositional Kernel Learning for Gaussian Processes

The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. We present the Neural Kernel Network (NKN), a flexible family of kernels represented by a neural network. The NKN architecture is based on the composition rules for kernels, so that every unit of the network corresponds to a valid kernel. It can compactly approximate compositional kernel structures such as those used by the Automatic Statistician (Lloyd et al., 2014), but because the architecture is differentiable, it is end-to-end trainable with gradient-based optimization. We show that the NKN is universal for the class of stationary kernels. Empirically, we demonstrate the NKN's pattern discovery and extrapolation abilities on several tasks that depend crucially on identifying the underlying structure, including time series and texture extrapolation, as well as Bayesian optimization.
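
To make the composition idea concrete, here is a minimal NumPy sketch of an NKN-style kernel, not the authors' implementation: primitive kernels (RBF, periodic, linear) are combined by alternating linear layers with positive weights and product units, both of which preserve positive-definiteness, so the network output is itself a valid kernel. The function name `nkn_kernel` and the fixed layer sizes are illustrative assumptions.

```python
# Illustrative NKN-style compositional kernel (a sketch, not the paper's code).
import numpy as np

def rbf(x, y, ls=1.0):
    return np.exp(-0.5 * ((x - y) / ls) ** 2)

def periodic(x, y, period=1.0, ls=1.0):
    return np.exp(-2.0 * (np.sin(np.pi * (x - y) / period) / ls) ** 2)

def linear(x, y):
    return x * y

def nkn_kernel(x, y, log_w1, log_w2):
    # Layer 0: evaluate the primitive kernels at the input pair (x, y).
    h = np.array([rbf(x, y), periodic(x, y), linear(x, y)])
    # Linear layer: nonnegative combinations of kernels are kernels.
    # exp keeps the weights positive while log_w1 stays unconstrained,
    # which is what makes the whole network gradient-trainable.
    w1 = np.exp(log_w1)   # shape (4, 3)
    h = w1 @ h            # four intermediate kernel units
    # Product layer: products of kernels are kernels.
    h = np.array([h[0] * h[1], h[2] * h[3]])
    # Final linear layer collapses everything to one valid kernel value.
    w2 = np.exp(log_w2)   # shape (2,)
    return w2 @ h

# Usage: build a Gram matrix with the composed kernel.
rng = np.random.default_rng(0)
log_w1 = rng.normal(size=(4, 3))
log_w2 = rng.normal(size=2)
xs = np.linspace(0.0, 1.0, 5)
K = np.array([[nkn_kernel(a, b, log_w1, log_w2) for b in xs] for a in xs])
```

In the paper the weights are learned end-to-end by maximizing the GP marginal likelihood; here they are random placeholders to show that any positive setting yields a valid kernel.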

[1] Y. Kakihara. A note on harmonizable and V-bounded processes, 1985.

[2] Andrew Gordon Wilson et al. Fast Kernel Learning for Multidimensional Pattern Extrapolation, NIPS, 2014.

[3] Imre Csiszár et al. Information Theory: Coding Theorems for Discrete Memoryless Systems, Second Edition, 2011.

[4] Pedro M. Domingos et al. Sum-Product Networks: A New Deep Architecture, ICCV Workshops, 2011.

[5] Roman Garnett et al. Bayesian Optimization for Automated Model Selection, NIPS, 2016.

[6] Zi Wang et al. Batched High-dimensional Bayesian Optimization via Structural Kernel Learning, ICML, 2017.

[7] A. E. Ingham. On Tauberian Theorems, 1965.

[8] Andrew Gordon Wilson et al. Deep Kernel Learning, AISTATS, 2015.

[9] Joshua B. Tenenbaum et al. Automatic Construction and Natural-Language Description of Nonparametric Regression Models, AAAI, 2014.

[10] Francis R. Bach et al. Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning, NIPS, 2008.

[11] Lawrence D. Jackel et al. Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1989.

[12] Roman Garnett et al. Discovering and Exploiting Additive Structure for Bayesian Optimization, AISTATS, 2017.

[13] Carl E. Rasmussen et al. Sparse Spectrum Gaussian Process Regression, JMLR, 2010.

[14] Geoffrey E. Hinton et al. Deep, Narrow Sigmoid Belief Networks Are Universal Approximators, Neural Computation, 2008.

[15] Joshua B. Tenenbaum et al. Structure Discovery in Nonparametric Regression through Compositional Kernel Search, ICML, 2013.

[16] Andrew Gordon Wilson et al. Gaussian Process Kernels for Pattern Discovery and Extrapolation, ICML, 2013.

[17] Richard E. Turner et al. Learning Stationary Time Series using Gaussian Processes with Nonparametric Kernels, NIPS, 2015.

[18] Kirthevasan Kandasamy et al. High Dimensional Bayesian Optimisation and Bandits via Additive Models, ICML, 2015.

[19] Marc G. Genton et al. Classes of Kernels for Machine Learning: A Statistics Perspective, JMLR, 2002.

[20] Geoffrey E. Hinton et al. Evaluation of Gaussian Processes and Other Methods for Non-linear Regression, 1997.

[21] Carl E. Rasmussen et al. Additive Gaussian Processes, NIPS, 2011.

[22] Nando de Freitas et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, arXiv, 2010.

[23] Yves-Laurent Kom Samo et al. Generalized Spectral Kernels, arXiv:1506.02236, 2015.

[24] S. Bochner et al. Lectures on Fourier Integrals: with an Author's Supplement on Monotonic Functions, Stieltjes Integrals, and Harmonic Analysis, 1959.

[25] Julien Cornebise et al. Weight Uncertainty in Neural Networks, arXiv, 2015.

[26] Jasper Snoek et al. Practical Bayesian Optimization of Machine Learning Algorithms, NIPS, 2012.

[27] Geoffrey E. Hinton et al. Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes, NIPS, 2007.

[28] Michalis K. Titsias et al. Variational Learning of Inducing Variables in Sparse Gaussian Processes, AISTATS, 2009.

[29] Samuel Kaski et al. Non-Stationary Spectral Kernels, NIPS, 2017.

[30] G. Schwarz. Estimating the Dimension of a Model, 1978.

[31] Ryan P. Adams et al. Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks, ICML, 2015.