Precision medicine as a control problem: Using simulation and deep reinforcement learning to discover adaptive, personalized multi-cytokine therapy for sepsis

Sepsis is a life-threatening condition in which dysregulation of the body's own immune system damages its tissues. It affects one million people per year in the US and carries a mortality rate of 28% to 50%. Clinical trials of sepsis treatments over the last 20 years have failed to produce a single drug that is currently FDA-approved. In this study, we attempt to discover an effective cytokine mediation treatment strategy for sepsis using a previously developed agent-based model that simulates the innate immune response to infection: the Innate Immune Response agent-based model (IIRABM). Previous attempts at multi-cytokine mediation using the IIRABM failed to reduce mortality across all patient parameterizations, which motivated us to investigate whether adaptive, personalized multi-cytokine mediation can control the trajectory of sepsis and lower patient mortality. We used the IIRABM to compute a treatment policy in which systemic patient measurements are used in a feedback loop to inform future treatment. Using deep reinforcement learning, we identified a policy that achieves 0% mortality on the patient parameterization on which it was trained. More importantly, this policy also achieves 0.8% mortality over 500 randomly selected patient parameterizations with baseline mortalities ranging from 1% to 99% (with an average of 49%), spanning the entire clinically plausible parameter space of the IIRABM. These results suggest that adaptive, personalized multi-cytokine mediation therapy could be a promising approach for treating sepsis. We hope that this work motivates researchers to consider such an approach in future clinical trials. To the best of our knowledge, this work is the first to consider adaptive, personalized multi-cytokine mediation therapy for sepsis, and the first to apply deep reinforcement learning to a biological simulation.
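To make the feedback-loop idea concrete, the following is a minimal Python sketch of closed-loop treatment evaluated over many randomly drawn patient parameterizations. Everything in it is an assumption made for illustration: `ToyPatient` is a hypothetical two-variable stand-in and not the IIRABM, the dynamics and thresholds are invented, and `feedback_policy` is a hand-written proportional rule standing in for a policy learned with deep reinforcement learning. It shows only the structure of the loop (measure, choose a dose, advance the simulation, score mortality), not the method or results of this study.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyPatient:
    """Hypothetical toy patient; NOT the IIRABM."""
    growth: float           # per-step cytokine growth (the "parameterization")
    cytokine: float = 1.5   # aggregate cytokine level; 1.0 is treated as healthy
    damage: float = 20.0    # systemic damage score; >= 100 means death

    def observe(self):
        # Systemic measurements that the policy is allowed to see.
        return self.cytokine, self.damage

    def step(self, dose):
        # Invented dynamics: cytokines grow, the dose shifts them up or down,
        # and damage rises with the distance from the healthy level.
        self.cytokine = max(0.0, self.cytokine * (1.0 + self.growth) + dose)
        self.damage += 5.0 * abs(self.cytokine - 1.0) - 1.0


def no_treatment(cytokine, damage):
    """Baseline: never intervene."""
    return 0.0


def feedback_policy(cytokine, damage):
    """Placeholder for a learned policy: push cytokines toward 1.0."""
    return max(-1.0, min(1.0, 1.0 - cytokine))


def run_episode(patient, policy, max_steps=200):
    """Run one closed-loop episode; return True if the patient dies."""
    for _ in range(max_steps):
        dose = policy(*patient.observe())  # measurement informs treatment
        patient.step(dose)
        if patient.damage >= 100.0:
            return True
        if patient.damage <= 0.0:
            return False
    return False


def mortality(policy, n_patients=500, seed=0):
    """Fraction of randomly parameterized toy patients that die."""
    rng = random.Random(seed)
    deaths = 0
    for _ in range(n_patients):
        patient = ToyPatient(growth=rng.uniform(0.01, 0.1))
        deaths += run_episode(patient, policy)
    return deaths / n_patients
```

Calling `mortality(no_treatment)` and `mortality(feedback_policy)` compares the untreated baseline against the closed-loop rule on the same 500 toy parameterizations. In this toy the gap between the two is a property of the invented dynamics and says nothing about the IIRABM results reported above.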
