A Kernel Log-Rank Test of Independence for Right-Censored Data

We introduce a general non-parametric test of independence between right-censored survival times and covariates, which may be multivariate. Our test statistic has a dual interpretation: first, as the supremum of a potentially infinite collection of weight-indexed log-rank tests, with weight functions belonging to a reproducing kernel Hilbert space (RKHS); and second, as the norm of the difference between the embeddings of certain finite measures into the RKHS, similar to the Hilbert-Schmidt Independence Criterion (HSIC) test statistic. We study the asymptotic properties of the test and give sufficient conditions under which it correctly rejects the null hypothesis under any alternative. The test statistic is straightforward to compute, and the rejection threshold is obtained via an asymptotically consistent wild bootstrap procedure. Extensive simulations demonstrate that our testing procedure generally performs better than competing approaches in detecting complex non-linear dependence.
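The wild bootstrap idea mentioned above can be illustrated with a minimal sketch. It does not implement the kernel log-rank statistic, whose exact form is not given here. As a stand-in it uses the standard biased HSIC estimate for uncensored data with Gaussian kernels, and it assumes Rademacher (random sign) bootstrap weights. The function names `gaussian_gram` and `hsic_wild_test`, the bandwidth, and the number of bootstrap draws are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel (illustrative kernel choice)."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)  # treat 1-D input as n points in R^1
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def hsic_wild_test(x, y, n_boot=500, bandwidth=1.0, seed=None):
    """Biased HSIC statistic and a wild bootstrap p-value.

    Assumption: uncensored data. This is an analogy to the censored-data
    test described in the abstract, not that test itself.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    centre = np.eye(n) - np.ones((n, n)) / n
    k_c = centre @ gaussian_gram(x, bandwidth) @ centre
    l_c = centre @ gaussian_gram(y, bandwidth) @ centre
    h = k_c * l_c  # entrywise product; its sum equals trace(K C L C)

    stat = h.sum() / n ** 2

    # Wild bootstrap: recompute the quadratic form with random signs.
    signs = rng.choice([-1.0, 1.0], size=(n_boot, n))
    boot = np.sum((signs @ h) * signs, axis=1) / n ** 2

    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=150)
    y = x ** 2 + 0.1 * rng.normal(size=150)  # non-linear dependence
    print(hsic_wild_test(x, y, seed=1))
```

The rejection threshold at level alpha is the (1 - alpha) quantile of the bootstrap values; comparing the observed statistic against them, as done for the p-value, is equivalent. Handling censoring requires the construction from the paper itself, which this sketch does not attempt.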
