Q-LEARNING WITH CENSORED DATA.

We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite-sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
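To make the idea of adjusting for censored rewards concrete, the following is a minimal single-stage sketch and not the algorithm of this paper. It assumes inverse-probability-of-censoring weighting, one common way to handle right-censored survival times: each uncensored subject is weighted by the inverse of a Kaplan-Meier estimate of the censoring survival function, and the weighted mean survival time per action plays the role of a Q-value. The function names (`censoring_survival_left`, `ipcw_q_values`), the toy data, and the assumption that censoring is independent of the action are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def censoring_survival_left(times, events, t):
    """Kaplan-Meier estimate of G(t-) = P(censoring time >= t).

    events[i] == 1 means a failure was observed, 0 means the subject was
    censored. Censoring is treated as the "event" for this estimator.
    Assumption: censoring is pooled over actions (independent of action).
    """
    g = 1.0
    censor_times = sorted({x for x, e in zip(times, events) if e == 0 and x < t})
    for s in censor_times:
        at_risk = sum(1 for x in times if x >= s)
        censored = sum(1 for x, e in zip(times, events) if x == s and e == 0)
        g *= 1.0 - censored / at_risk
    return g


def ipcw_q_values(actions, times, events):
    """Weighted mean survival time per action, using uncensored subjects only."""
    num = defaultdict(float)
    den = defaultdict(float)
    for a, t, e in zip(actions, times, events):
        if e == 1:  # censored subjects contribute only through the weights
            w = 1.0 / censoring_survival_left(times, events, t)
            num[a] += w * t
            den[a] += w
    return {a: num[a] / den[a] for a in num}


# Toy data (hypothetical): two actions, two censored subjects.
actions = [0, 0, 0, 1, 1, 1]
times = [2, 4, 6, 3, 5, 7]
events = [1, 0, 1, 1, 1, 0]

q = ipcw_q_values(actions, times, events)
best_action = max(q, key=q.get)
print(q, best_action)
```

In a multistage setting, estimates of this kind would be computed by backward induction over stages with a function approximator in place of the per-action averages; that extension is not shown here.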
