Kernelized Stein Discrepancy Tests of Goodness-of-fit for Time-to-Event Data

Survival analysis and reliability theory are concerned with the analysis of time-to-event data, in which observations correspond to waiting times until an event of interest, such as death from a particular disease or failure of a component in a mechanical system. This type of data is distinctive because of censoring, a form of missing data that occurs when the actual time of the event of interest is not observed; instead, we only observe a random interval known to contain it. Most traditional methods are not designed to handle censoring, and thus they must be adapted to censored time-to-event data. In this paper, we focus on non-parametric goodness-of-fit testing procedures that combine Stein's method with kernelized discrepancies. For uncensored data there is a natural way of implementing a kernelized Stein discrepancy test, but for censored data there are several options, each with its own advantages and disadvantages. We propose a collection of kernelized Stein discrepancy tests for time-to-event data and study each of them theoretically and empirically. Our experimental results show that the proposed methods perform better than existing tests, including previous tests based on a kernelized maximum mean discrepancy.
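As an illustration of the uncensored case only, here is a minimal sketch of a Langevin kernelized Stein discrepancy test in the style of Liu et al. (2016) and Chwialkowski et al. (2016). It rests on several assumptions that are not taken from this paper: a one-dimensional N(0, 1) model stands in for the null distribution (chosen for simplicity, even though time-to-event data live on (0, ∞)), the base kernel is a Gaussian RBF with a fixed lengthscale, and the null distribution of the degenerate U-statistic is approximated with a Rademacher wild bootstrap. The function names gaussian_score, stein_kernel_matrix and ksd_test are hypothetical. This is not one of the censored-data tests proposed here; it only shows the uncensored baseline those tests have to adapt.

```python
import numpy as np


def gaussian_score(x, mu=0.0, sigma=1.0):
    """Score d/dx log p(x) of an N(mu, sigma^2) model.

    Assumption: a Gaussian model is used purely for illustration.
    """
    return -(x - mu) / sigma**2


def stein_kernel_matrix(x, score, lengthscale=1.0):
    """Langevin Stein kernel h(x_i, x_j) built on a 1-D RBF kernel.

    h(x, y) = s(x)s(y)k(x, y) + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
    """
    x = np.asarray(x, dtype=float)
    s = score(x)
    d = x[:, None] - x[None, :]            # pairwise differences x_i - x_j
    l2 = lengthscale**2
    k = np.exp(-d**2 / (2.0 * l2))         # RBF base kernel k(x_i, x_j)
    dk_dx = -d / l2 * k                    # derivative in the first argument
    dk_dy = d / l2 * k                     # derivative in the second argument
    d2k_dxdy = (1.0 / l2 - d**2 / l2**2) * k
    return (s[:, None] * s[None, :] * k
            + s[:, None] * dk_dy
            + s[None, :] * dk_dx
            + d2k_dxdy)


def ksd_test(x, score, lengthscale=1.0, n_boot=1000, seed=0):
    """Hypothetical helper: U-statistic estimate of KSD^2 and a p-value."""
    H = stein_kernel_matrix(x, score, lengthscale)
    n = H.shape[0]
    np.fill_diagonal(H, 0.0)               # U-statistic: drop the i == j terms
    stat = H.sum() / (n * (n - 1))
    # Assumption: Rademacher wild bootstrap of the degenerate U-statistic under H0
    rng = np.random.default_rng(seed)
    W = rng.choice([-1.0, 1.0], size=(n_boot, n))
    boot = np.einsum("bi,ij,bj->b", W, H, W) / (n * (n - 1))
    p_value = (1 + np.sum(boot >= stat)) / (1 + n_boot)
    return stat, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_null = rng.normal(size=200)             # drawn from the model itself
    x_shift = rng.normal(loc=2.0, size=200)   # mean-shifted alternative
    print(ksd_test(x_null, gaussian_score))
    print(ksd_test(x_shift, gaussian_score))
```

The model enters only through its score function d/dx log p(x), so the normalizing constant of p is never needed, which is what makes Stein-based statistics attractive for goodness-of-fit testing. Under right censoring, for example, one observes only min(T, C) together with an indicator of whether the event was seen, so the terms s(x_i) cannot be evaluated at the true event times; this is why the censored setting calls for modified constructions.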
