Robust Hypothesis Test for Nonlinear Effect with Gaussian Processes

This work constructs a hypothesis test for detecting whether a data-generating function $h: \mathbb{R}^p \rightarrow \mathbb{R}$ belongs to a specific reproducing kernel Hilbert space $\mathcal{H}_0$, where the structure of $\mathcal{H}_0$ is only partially known. Using the theory of reproducing kernels, we reduce this hypothesis to a simple one-sided score test for a scalar parameter, develop a testing procedure that is robust against misspecification of the kernel function, and propose an ensemble-based estimator for the null model to guarantee test performance in small samples. To demonstrate the utility of the proposed method, we apply the test to the problem of detecting nonlinear interaction between groups of continuous features. We evaluate the finite-sample performance of the test under different data-generating functions and under different estimation strategies for the null model. Our results reveal interesting connections between notions in machine learning (model underfitting/overfitting) and those in statistical inference (Type I error and power of a hypothesis test), and highlight unexpected consequences of common model estimation strategies (e.g., estimating kernel hyperparameters by maximum likelihood) for model inference.
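The reduction to a one-sided score test has a compact form in the linear-mixed-model representation of the Gaussian process. The sketch below is a minimal illustration, not the authors' implementation: it assumes the null model is $y = X\beta + h_0(x) + \epsilon$ with $h_0$ governed by a kernel matrix $K_0$, the alternative adds a scalar variance component $\delta \geq 0$ on a second kernel matrix $K_1$, and the exact weighted-chi-square null distribution is replaced by a Satterthwaite-type scaled chi-square approximation. The function name `score_test` and the inputs `tau0` and `sigma2` (null variance components, e.g. from REML) are hypothetical placeholders.

```python
# Illustrative sketch (not the paper's code) of a variance-component
# score test: H0: delta = 0 in V(delta) = tau0*K0 + sigma2*I + delta*K1.
import numpy as np
from scipy.stats import chi2

def score_test(y, X, K0, K1, tau0, sigma2):
    """One-sided score test for delta = 0, given null-model estimates.

    y: (n,) response; X: (n, p) fixed-effect design;
    K0, K1: (n, n) kernel matrices; tau0, sigma2: null variance
    components (assumed estimated under the null, e.g. by REML).
    """
    n = len(y)
    V0 = tau0 * K0 + sigma2 * np.eye(n)            # null covariance
    V0_inv = np.linalg.inv(V0)
    # REML-style projection that removes the fixed effects X
    XtV = X.T @ V0_inv
    P = V0_inv - XtV.T @ np.linalg.solve(XtV @ X, XtV)
    # Score statistic: quadratic form in the projected residuals
    T = 0.5 * y @ P @ K1 @ P @ y
    # Satterthwaite approximation: match mean/variance of T under H0
    # to a scaled chi-square kappa * chi2(nu).
    PK1 = P @ K1
    e = 0.5 * np.trace(PK1)                        # E[T] under H0
    v = 0.5 * np.trace(PK1 @ PK1)                  # Var[T] under H0
    kappa, nu = v / (2.0 * e), 2.0 * e**2 / v
    p_value = chi2.sf(T / kappa, df=nu)            # one-sided tail
    return T, p_value
```

The test is one-sided because the variance component $\delta$ is constrained to be nonnegative. The Satterthwaite step matches the first two moments of the quadratic form $T$ under the null; more refined approximations to the weighted sum of chi-squares can be substituted without changing the rest of the procedure.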
