Neural-Kernelized Conditional Density Estimation

Conditional density estimation is a general framework for solving various problems in machine learning. Among existing methods, non-parametric and/or kernel-based methods are often difficult to use on large datasets, while methods based on neural networks usually make restrictive parametric assumptions on the probability densities. Here, we propose a novel method for estimating the conditional density based on score matching. In contrast to existing methods, we employ scalable neural networks, but do not make explicit parametric assumptions on densities. The key challenge in applying score matching to neural networks is the computation of the first- and second-order derivatives of a model for the log-density. We tackle this challenge by developing a new neural-kernelized approach, which can be applied to large datasets with stochastic gradient descent, while the reproducing kernels allow for easy computation of the derivatives needed in score matching. We show that the neural-kernelized function approximator has universal approximation capability and that our method is consistent in conditional density estimation. We numerically demonstrate that our method is useful in high-dimensional conditional density estimation, and compares favourably with existing methods. Finally, we prove that the proposed method has interesting connections to two probabilistically principled frameworks of representation learning: nonlinear sufficient dimension reduction and nonlinear independent component analysis.
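To make the construction concrete, the sketch below evaluates a score-matching objective for a conditional model of the form log q(y|x) = sum_j psi_j(x) k(y, y_j) + const, where the coefficients psi_j(x) come from a neural network and the kernel's first and second derivatives in y are available in closed form. This is a minimal illustration under our own assumptions: the Gaussian kernel choice, the function names, and the treatment of the network outputs psi_j(x) as precomputed arrays are ours, not details fixed by the abstract.

```python
import numpy as np

def gaussian_kernel_terms(y, centers, sigma):
    """Gaussian kernel values k(y, y_j) and their first and second
    partial derivatives with respect to each coordinate of y.

    y:       (d,) evaluation point
    centers: (m, d) kernel centres y_j
    sigma:   kernel bandwidth
    """
    diff = y[None, :] - centers                                # (m, d)
    k = np.exp(-np.sum(diff**2, axis=1) / (2 * sigma**2))      # (m,)
    dk = -(diff / sigma**2) * k[:, None]                       # (m, d): dk/dy_i
    d2k = (diff**2 / sigma**4 - 1.0 / sigma**2) * k[:, None]   # (m, d): d2k/dy_i2
    return k, dk, d2k

def score_matching_loss(psi, y_batch, centers, sigma):
    """Empirical score-matching objective for the model
    log q(y|x) = sum_j psi_j(x) k(y, y_j) + const.

    psi:     (n, m) network outputs psi_j(x), one row per sample
    y_batch: (n, d) observed responses
    """
    loss = 0.0
    for psi_x, y in zip(psi, y_batch):
        _, dk, d2k = gaussian_kernel_terms(y, centers, sigma)
        grad = psi_x @ dk   # (d,): gradient of log q(y|x) w.r.t. y
        lap = psi_x @ d2k   # (d,): diagonal second derivatives w.r.t. y
        loss += 0.5 * np.sum(grad**2) + np.sum(lap)
    return loss / len(y_batch)

# Toy usage with random arrays standing in for real network outputs.
rng = np.random.default_rng(0)
centers = rng.normal(size=(20, 2))   # m = 20 kernel centres, d = 2
psi = rng.normal(size=(5, 20))       # outputs for a batch of n = 5 inputs
y = rng.normal(size=(5, 2))
print(score_matching_loss(psi, y, centers, sigma=1.0))
```

In a full estimator, psi would be produced by a deep network and trained by stochastic gradient descent on mini-batch versions of this loss; the closed-form kernel derivatives are what make the objective cheap to evaluate, since no differentiation of the network with respect to y is required.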
