A comprehensive EHR timeseries pre-training benchmark

Pre-training (PT) has been used successfully in many areas of machine learning. One area where PT would be extremely impactful is electronic health record (EHR) data. Successful PT strategies on this modality could improve model performance in data-scarce contexts, such as modeling rare diseases or enabling smaller hospitals to benefit from data collected by larger health systems. While many PT strategies have been explored in other domains, far less exploration has occurred for EHR data. One reason for this may be the lack of standardized benchmarks suitable for developing and testing PT algorithms. In this work, we establish a PT benchmark for EHR timeseries data, defining cohorts, a diverse set of fine-tuning tasks, and PT-focused evaluation regimes across two public EHR datasets: MIMIC-III and eICU. This benchmark fills an essential gap in the field by enabling robust iteration on PT strategies for this modality. To demonstrate the benchmark's value and provide baselines for further research, we also profile two simple PT algorithms: a self-supervised, masked-imputation system and a weakly-supervised, multi-task system. We find that PT strategies (in particular weakly-supervised PT methods) can offer significant gains over traditional learning in few-shot settings, especially on tasks with strong class imbalance. Our full benchmark and code are publicly available at https://github.com/mmcdermott/comprehensive_MTL_EHR.
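
To make the masked-imputation baseline concrete, the sketch below shows one plausible form of such a pre-training step on a (batch, time, features) EHR timeseries tensor. It is a minimal illustration, not the benchmark's actual implementation: the architecture (a GRU encoder), the masking scheme (zeroing out a random 15% of entries), and all names such as MaskedImputer and mask_frac are assumptions for illustration only.

```python
# Hypothetical sketch of masked-imputation pre-training on EHR timeseries.
# Not the benchmark's implementation; shapes, names, and masking scheme are assumed.
import torch
import torch.nn as nn

class MaskedImputer(nn.Module):
    """Encode a (batch, time, features) timeseries and reconstruct masked values."""
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.encoder(x)   # (batch, time, hidden)
        return self.head(h)      # per-timestep reconstruction of all features

def masked_imputation_loss(model, x, mask_frac: float = 0.15):
    """Hide a random fraction of values and score reconstruction only on hidden entries."""
    mask = torch.rand_like(x) < mask_frac      # which entries to mask out
    x_masked = x.masked_fill(mask, 0.0)        # simple zero-out masking
    recon = model(x_masked)
    return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)

# Usage: one pre-training step on a random batch of continuous labs/vitals.
model = MaskedImputer(n_features=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randn(8, 48, 32)                 # 8 stays, 48 hourly steps, 32 features
loss = masked_imputation_loss(model, batch)
loss.backward()
opt.step()
```

The weakly-supervised, multi-task baseline follows the same shape, with the reconstruction head replaced by several task-specific heads trained jointly on cheap-to-derive labels.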
