Learning Individualized Treatment Rules with Estimated Translated Inverse Propensity Score

Randomized controlled trials typically analyze the effectiveness of treatments with the goal of making treatment recommendations for patient subgroups. With the advance of electronic health records, a great variety of data has been collected in clinical practice, enabling the evaluation of treatments and treatment policies based on observational data. In this paper, we focus on learning individualized treatment rules (ITRs) to derive a treatment policy that is expected to generate a better outcome for an individual patient. In our framework, we cast ITRs learning as a contextual bandit problem and minimize the expected risk of the treatment policy. We conduct experiments with the proposed framework both in a simulation study and based on a real-world dataset. In the latter case, we apply our proposed method to learn the optimal ITRs for the administration of intravenous (IV) fluids and vasopressors (VP). Based on various offline evaluation methods, we could show that the policy derived in our framework demonstrates better performance compared to both the physicians and other baselines, including a simple treatment prediction approach. As a long-term goal, our derived policy might eventually lead to better clinical guidelines for the administration of IV and VP.

[1]  D. Rubin,et al.  The central role of the propensity score in observational studies for causal effects , 1983 .

[2]  Yoshua Bengio,et al.  The problem of learning long-term dependencies in recurrent networks , 1993, IEEE International Conference on Neural Networks.

[3]  T. Hesterberg,et al.  Weighted Average Importance Sampling and Defensive Mixture Distributions , 1995 .

[4]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[5]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..

[6]  J. Vincent,et al.  Serial evaluation of the SOFA score to predict outcome in critically ill patients. , 2001, JAMA.

[7]  John Langford,et al.  The offset tree for learning with partial labels , 2008, KDD.

[8]  Lukás Burget,et al.  Recurrent neural network based language model , 2010, INTERSPEECH.

[9]  S. Murphy,et al.  PERFORMANCE GUARANTEES FOR INDIVIDUALIZED TREATMENT RULES. , 2011, Annals of statistics.

[10]  John Langford,et al.  Doubly Robust Policy Evaluation and Learning , 2011, ICML.

[11]  Donglin Zeng,et al.  Estimating Individualized Treatment Rules Using Outcome Weighted Learning , 2012, Journal of the American Statistical Association.

[12]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[13]  John Langford,et al.  Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits , 2014, ICML.

[14]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[15]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[16]  Thorsten Joachims,et al.  The Self-Normalized Estimator for Counterfactual Learning , 2015, NIPS.

[17]  Chris Eliasmith,et al.  Hyperopt: a Python library for model selection and hyperparameter optimization , 2015 .

[18]  Volker Tresp,et al.  Predicting Sequences of Clinical Events by Using a Personalized Temporal Latent Embedding Model , 2015, 2015 International Conference on Healthcare Informatics.

[19]  Thorsten Joachims,et al.  Counterfactual Risk Minimization: Learning from Logged Bandit Feedback , 2015, ICML.

[20]  R. Bellomo,et al.  The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). , 2016, JAMA.

[21]  Volker Tresp,et al.  Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks , 2016, 2016 IEEE International Conference on Healthcare Informatics (ICHI).

[22]  Alexander M. Rush,et al.  Character-Aware Neural Language Models , 2015, AAAI.

[23]  Peter Szolovits,et al.  MIMIC-III, a freely accessible critical care database , 2016, Scientific Data.

[24]  Walter F. Stewart,et al.  Doctor AI: Predicting Clinical Events via Recurrent Neural Networks , 2015, MLHC.

[25]  J. Marc Overhage,et al.  Going Digital: A Survey on Digitalization and Large-Scale Data Analytics in Healthcare , 2016, Proceedings of the IEEE.

[26]  Jimeng Sun,et al.  RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism , 2016, NIPS.

[27]  Michael R Kosorok,et al.  Residual Weighted Learning for Estimating Individualized Treatment Rules , 2015, Journal of the American Statistical Association.

[28]  Peter Szolovits,et al.  Deep Reinforcement Learning for Sepsis Treatment , 2017, ArXiv.

[29]  Volker Tresp,et al.  Predictive Modeling of Therapy Decisions in Metastatic Breast Cancer with Recurrent Neural Network Encoder and Multinomial Hierarchical Regression Decoder , 2017, 2017 IEEE International Conference on Healthcare Informatics (ICHI).

[30]  F. V. van Haren,et al.  Fluid resuscitation in human sepsis: Time to rewrite history? , 2017, Annals of Intensive Care.

[31]  Aldo A. Faisal,et al.  The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care , 2018, Nature Medicine.

[32]  M. de Rijke,et al.  Deep Learning with Logged Bandit Feedback , 2018, ICLR.

[33]  Volker Tresp,et al.  Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).

[34]  Volker Tresp,et al.  Efficient Dialog Policy Learning via Positive Memory Retention , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).

[35]  Srivatsan Srinivasan,et al.  Evaluating Reinforcement Learning Algorithms in Observational Health Settings , 2018, ArXiv.

[36]  Yuan Zhou,et al.  Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy , 2019, ICLR.