A Contextual-Bandit-Based Approach for Informed Decision-Making in Clinical Trials

Clinical trials involving multiple treatments randomize treatment assignments so that treatment efficacy can be evaluated without bias. That evaluation is performed in post hoc studies, which usually rely on supervised-learning methods that need large amounts of data collected in a randomized fashion. This approach is often suboptimal, because some participants may suffer and even die as a result of not receiving the most appropriate treatment during the trial. Reinforcement-learning methods improve the situation by making it possible to learn the treatment efficacies dynamically during the course of the trial and to adapt treatment assignments accordingly. Recent efforts using multi-arm bandits, a type of reinforcement-learning method, have focused on maximizing clinical outcomes for a population that was assumed to be homogeneous. However, those approaches do not account for the variability among participants that is becoming increasingly evident in recent clinical-trial-based studies. We present a contextual-bandit-based online treatment optimization algorithm that, in choosing treatments for new participants in the study, takes into account not only the maximization of clinical outcomes but also the patient characteristics. We evaluated our algorithm using a real clinical trial dataset from the International Stroke Trial. The results of our retrospective analysis indicate that the proposed approach performs significantly better than either a random assignment of treatments (the current gold standard) or a multi-arm-bandit-based approach, providing substantial gains in the percentage of participants who are assigned the most suitable treatments. The contextual-bandit and multi-arm-bandit approaches provide 72.63% and 64.34% gains, respectively, compared to a random assignment.
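
To make the idea of context-aware treatment assignment concrete, the following is a minimal sketch of one well-known contextual-bandit rule (a LinUCB-style upper-confidence-bound policy). The abstract does not state which contextual-bandit algorithm the authors used, so this should not be read as their method. The class `LinUCBArm`, the function `simulate`, the two-feature patient context, the exploration weight `alpha`, and the 0.8/0.2 success probabilities are all hypothetical choices made for illustration only.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


class LinUCBArm:
    """Per-treatment ridge-regression model with an upper confidence bound."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration weight (assumed value)
        # Inverse of A = I + sum of x x^T, kept directly to avoid inversions.
        self.a_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * x

    def ucb(self, x):
        a_inv_x = matvec(self.a_inv, x)
        theta = matvec(self.a_inv, self.b)  # ridge estimate of arm weights
        width = math.sqrt(max(dot(x, a_inv_x), 0.0))
        return dot(theta, x) + self.alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of (A + x x^T)^-1.
        a_inv_x = matvec(self.a_inv, x)
        denom = 1.0 + dot(x, a_inv_x)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def simulate(n_patients=2000, seed=0):
    """Return the fraction of synthetic patients given their better treatment."""
    rng = random.Random(seed)
    arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
    best_count = 0
    for _ in range(n_patients):
        group = rng.randint(0, 1)   # hypothetical binary patient characteristic
        x = [1.0, float(group)]     # bias term plus the characteristic
        scores = [arm.ucb(x) for arm in arms]
        chosen = scores.index(max(scores))
        # Made-up outcome model: treatment k suits group k (0.8 vs 0.2 success).
        p_success = 0.8 if chosen == group else 0.2
        reward = 1.0 if rng.random() < p_success else 0.0
        arms[chosen].update(x, reward)
        best_count += int(chosen == group)
    return best_count / n_patients
```

Calling `simulate()` runs the policy on synthetic patients and reports how often the treatment matched the patient's group. In this toy setting a context-free bandit could not beat 50%, since neither treatment is better on average across the whole population; that is the kind of heterogeneity the abstract argues a contextual approach can exploit.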
