Kernel-Based Partial Permutation Test for Detecting Heterogeneous Functional Relationship

We propose a kernel-based partial permutation test for checking the equality of the functional relationship between the response and covariates across different groups. The main idea, which is intuitive and easy to implement, is to keep the projections of the response vector Y onto the leading principal components of a kernel matrix fixed and to permute Y's projections onto the remaining principal components. The proposed test allows different choices of kernels, corresponding to different classes of functions under the null hypothesis. First, with linear or polynomial kernels, our partial permutation tests are exactly valid in finite samples for linear or polynomial regression models with Gaussian noise; similar results extend straightforwardly to kernels with finite-dimensional feature spaces. Second, by allowing the dimension of the kernel feature space to diverge with the sample size, the test can be large-sample valid for a wider class of functions. Third, for general kernels with possibly infinite-dimensional feature spaces, the partial permutation test is exactly valid when the covariates are exactly balanced across all groups, or asymptotically valid when the underlying function follows certain regularized Gaussian processes. We further suggest test statistics based on the likelihood ratio between two nested Gaussian process regression (GPR) models, and propose computationally efficient algorithms based on the EM algorithm and Newton's method, where the latter also involves Fisher scoring and quadratic programming and is particularly useful when EM converges slowly. Extensions to correlated and non-Gaussian noise are also investigated, theoretically or numerically. Furthermore, the test can be extended to use multiple kernels together and can thus inherit properties from each kernel. Both a simulation study and an application illustrate the properties of the proposed test.
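
The core resampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a linear kernel K = X X^T with an intercept column in X, takes k equal to the number of columns of X so that every common linear function lies in the span of the k leading components, and replaces the suggested GPR likelihood-ratio statistic with a simpler, hypothetical statistic (the drop in residual sum of squares from pooled to group-specific linear fits). The helper names split_components, partial_permute, rss_gap and partial_permutation_test are illustrative only, and the code assumes NumPy is available.

```python
import numpy as np

# Minimal sketch under assumptions: linear kernel, Gaussian noise, and a
# hypothetical RSS-gap statistic standing in for the GPR likelihood ratio.


def split_components(K, k):
    """Eigendecompose the symmetric kernel matrix K and return its k leading
    eigenvectors G1 (n x k) and the remaining eigenvectors G2 (n x (n - k))."""
    eigval, eigvec = np.linalg.eigh(K)      # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]        # reorder to descending
    G = eigvec[:, order]
    return G[:, :k], G[:, k:]


def partial_permute(Y, G1, G2, rng):
    """Keep Y's projection onto G1 fixed; randomly permute its coordinates on G2."""
    fixed_part = G1 @ (G1.T @ Y)
    permuted_coords = rng.permutation(G2.T @ Y)
    return fixed_part + G2 @ permuted_coords


def rss_gap(Y, X, groups):
    """Hypothetical stand-in statistic (NOT the GPR likelihood ratio):
    drop in residual sum of squares when each group gets its own linear fit
    instead of one pooled linear fit. Larger values suggest heterogeneity."""
    def rss(design):
        beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
        resid = Y - design @ beta
        return float(resid @ resid)

    # Group-specific design: every column of X interacted with every group indicator.
    separate = np.hstack([X * (groups == g)[:, None] for g in np.unique(groups)])
    return rss(X) - rss(separate)


def partial_permutation_test(Y, X, groups, k, n_perm=999, seed=0):
    """Partial permutation p-value using the linear kernel K = X X^T."""
    rng = np.random.default_rng(seed)
    G1, G2 = split_components(X @ X.T, k)
    t_obs = rss_gap(Y, X, groups)
    t_perm = np.array([rss_gap(partial_permute(Y, G1, G2, rng), X, groups)
                       for _ in range(n_perm)])
    # Count the observed statistic as one of the permutations.
    return float((1 + np.sum(t_perm >= t_obs)) / (1 + n_perm))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 60
    x = rng.uniform(-1.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])           # intercept + slope, so k = 2
    groups = np.repeat([0, 1], n // 2)
    y_null = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)   # same line in both groups
    y_alt = y_null + (groups == 1) * (1.5 + 3.0 * x)    # group 1 differs
    print("H0 p-value:", partial_permutation_test(y_null, X, groups, k=2, n_perm=199))
    print("H1 p-value:", partial_permutation_test(y_alt, X, groups, k=2, n_perm=199))
```

In this linear setting, a common function Xβ lies in the span of G1, so G2^T Y depends only on the noise; with i.i.d. Gaussian noise these coordinates are themselves i.i.d. normal, and permuting them leaves the null distribution unchanged. Using a different kernel would only change the matrix passed to split_components, although choosing k then becomes a decision this sketch does not address.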
