The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation

Understanding decision-making behaviours in clinical environments is of paramount importance if we are to bring the strengths of machine learning to bear on improving patient outcomes. Mainstream algorithm development, however, is often geared towards optimal performance on tasks that do not translate well to the medical setting, owing to several factors: the lack of publicly available realistic data, the intrinsically offline nature of the problem, and the complexity and variety of human behaviours. We therefore present a new benchmarking suite designed specifically for medical sequential decision modelling: the Medkit-Learn(ing) Environment, a publicly available Python package providing simple access to high-fidelity synthetic medical data. Beyond offering a standardised way to compare algorithms in a realistic medical setting, Medkit employs a generating process that disentangles the policy from the environment dynamics, allowing a range of customisations and thereby enabling systematic evaluation of algorithms' robustness to specific challenges prevalent in healthcare.
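To make the disentangled generating process concrete, the sketch below shows how a batch of synthetic trajectories might be requested with independently chosen environment dynamics and behavioural policy. This is a minimal illustrative sketch only: the module name medkit, the function batch_generate, all of its keyword arguments, and the shape of the returned data are assumptions made for illustration, not the package's confirmed API; the actual interface should be taken from the Medkit-Learn documentation.

    # Hypothetical usage sketch. The names below (medkit, batch_generate,
    # and every keyword argument) are assumptions for illustration, not
    # the package's confirmed API.
    import medkit as mk

    # Because the generating process disentangles the environment dynamics
    # from the behavioural policy, each can be customised independently.
    data = mk.batch_generate(
        domain="Ward",        # assumed: clinical scenario fixing state/action spaces
        environment="CRN",    # assumed: model of the environment dynamics
        policy="LSTM",        # assumed: model of the clinician's behavioural policy
        size=1_000,           # assumed: number of training trajectories
        test_size=200,        # assumed: number of held-out trajectories
        max_length=10,        # assumed: maximum trajectory length
    )

    # Assumed output structure: static patient features, time-series
    # observations, and the action taken at each step, split train/test.
    static_train, observations_train, actions_train = data["training"]
    static_test, observations_test, actions_test = data["testing"]

The key design point this sketch is meant to convey is that, with policy and environment factored apart, one could swap either component (for example, a different policy model over the same dynamics) to probe an algorithm's robustness to a specific challenge in isolation.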
