Paper information - Q-LEARNING WITH CENSORED DATA.

Q-LEARNING WITH CENSORED DATA.

We develop methodology for a multistage decision problem with a flexible number of stages in which the rewards are survival times that are subject to censoring. We present a novel Q-learning algorithm that is adjusted for censored data and allows a flexible number of stages. We provide finite-sample bounds on the generalization error of the policy learned by the algorithm, and show that when the optimal Q-function belongs to the approximation space, the expected survival time for policies obtained by the algorithm converges to that of the optimal policy. We simulate a multistage clinical trial with a flexible number of stages and apply the proposed censored-Q-learning algorithm to find individualized treatment regimens. The methodology presented in this paper has implications for the design of personalized medicine trials in cancer and in other life-threatening diseases.
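The abstract does not spell out how the Q-learning step is adjusted for censoring. As a rough illustration only, the sketch below shows one common adjustment, inverse-probability-of-censoring weighting, applied to a single-stage, tabular Q-estimate. It is not the paper's algorithm: the function names `ipcw_weights` and `weighted_q_by_action`, the toy data, and the simplifying assumptions (one stage, censoring independent of everything else, no ties between failure and censoring times) are all hypothetical choices made for this example.

```python
from collections import defaultdict


def ipcw_weights(times, events):
    """Inverse-probability-of-censoring weights delta_i / G(Y_i-).

    times  : observed times Y_i (failure or censoring, whichever came first)
    events : 1 if the failure was observed, 0 if the observation was censored
    G is the Kaplan-Meier estimate of the censoring survival function,
    obtained by treating censored observations as the "events".
    Assumes no ties between failure times and censoring times.
    """
    # Distinct censoring times in ascending order.
    censor_times = sorted({t for t, d in zip(times, events) if d == 0})
    weights = []
    for y, d in zip(times, events):
        if d == 0:
            weights.append(0.0)  # censored rewards receive zero weight
            continue
        g = 1.0
        for c in censor_times:
            if c >= y:  # left limit G(y-): only censoring strictly before y
                break
            at_risk = sum(1 for t in times if t >= c)
            censored = sum(1 for t, e in zip(times, events) if t == c and e == 0)
            g *= 1.0 - censored / at_risk
        weights.append(1.0 / g)
    return weights


def weighted_q_by_action(actions, times, events):
    """Weighted mean survival time per action (a tabular, one-stage Q-estimate).

    This is the minimizer of the weighted squared error
    sum_i w_i * (Y_i - q_a)^2 over a constant q_a for each action a.
    """
    w = ipcw_weights(times, events)
    num = defaultdict(float)
    den = defaultdict(float)
    for a, y, wi in zip(actions, times, w):
        num[a] += wi * y
        den[a] += wi
    return {a: num[a] / den[a] for a in num if den[a] > 0}


# Toy data (hypothetical): four patients, two treatments, one censored at t = 3.
actions = ["A", "A", "B", "B"]
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]

q = weighted_q_by_action(actions, times, events)
best_action = max(q, key=q.get)  # greedy policy with respect to the estimate
print(q, best_action)
```

In a multistage setting, a weighted fit of this kind would be repeated backward over the stages, with each stage's fitted maximum feeding the previous stage's target; the sketch stops at one stage to keep the weighting idea visible.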

M. Kosorok | Y. Goldberg
