Contextual Bandits for Adapting Treatment in a Mouse Model of de Novo Carcinogenesis

In this work, we present a case study in which we aim to design effective treatment allocation strategies and validate them using a mouse model of skin cancer. Collecting data for modelling treatment effectiveness in animal models is an expensive and time-consuming process. Moreover, acquiring this information across the full range of disease stages is hard to achieve with a conventional random treatment allocation procedure, since poor treatments cause the subjects' health to deteriorate. We therefore aim to design an adaptive allocation strategy that improves the efficiency of data collection by allocating more samples to exploring promising treatments. We cast this application as a contextual bandit problem and introduce a simple and practical algorithm for exploration-exploitation in this framework. The work builds on a recent class of approaches for non-contextual bandits that relies on subsampling to compare treatment options using an equivalent amount of information. On the technical side, we extend the subsampling strategy to bandits with context by applying subsampling within Gaussian Process regression. On the experimental side, preliminary results from 10 mice with skin tumours suggest that the proposed approach extends the subjects' survival by more than 50% compared with baseline strategies: no treatment, random treatment allocation, and constant administration of a chemotherapeutic agent. By slowing the tumour growth rate, the adaptive procedure gathers information about treatment effectiveness over a broader range of tumour volumes, which is crucial for eventually deriving sequential pharmacological treatment strategies for cancer.
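To make the subsampling idea concrete, the following is a minimal, hypothetical Python sketch of one way to combine subsampling with Gaussian Process regression in a contextual bandit loop. The arm set, the synthetic reward function, and the use of scikit-learn's GaussianProcessRegressor are illustrative assumptions rather than the authors' implementation; the duel is also simplified so that every arm is scored on a random subsample of the smallest per-arm sample size, so that all arms are compared using an equivalent amount of information.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def fit_gp(contexts, rewards):
    """Fit a GP reward model on (context, reward) pairs for one arm."""
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(np.asarray(contexts).reshape(-1, 1), np.asarray(rewards))
    return gp

def select_arm(history, context):
    """Subsampling duel: every arm is scored by a GP fitted on a random
    subsample of size equal to the smallest per-arm sample size (simplified
    variant; assumption, not the paper's exact procedure)."""
    # Cold start: play any arm that has never been tried.
    for arm, (ctxs, rews) in history.items():
        if len(rews) == 0:
            return arm
    n_min = min(len(rews) for _, rews in history.values())
    best_arm, best_score = None, -np.inf
    for arm, (ctxs, rews) in history.items():
        idx = rng.choice(len(rews), size=n_min, replace=False)
        gp = fit_gp([ctxs[i] for i in idx], [rews[i] for i in idx])
        score = gp.predict(np.array([[context]]))[0]
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm

# Toy usage with 3 hypothetical treatments; reward stands in for a
# negative tumour growth rate observed after applying the treatment.
history = {arm: ([], []) for arm in range(3)}
for t in range(30):
    context = rng.uniform(0.0, 1.0)                # e.g. normalised tumour volume
    arm = select_arm(history, context)
    reward = -(arm - 2 * context) ** 2 + 0.1 * rng.normal()  # synthetic response
    history[arm][0].append(context)
    history[arm][1].append(reward)
```

In this sketch the subsampling plays the role that optimism or posterior sampling plays in other contextual bandit algorithms: arms with little data are not penalised for the larger arms' information advantage, which keeps exploration alive without an explicit confidence bonus.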
