DuETT: Dual Event Time Transformer for Electronic Health Records

Electronic health records (EHRs) recorded in hospital settings typically contain a wide range of numeric time series data characterized by high sparsity and irregular observations. Effective modelling of such data must exploit its time series nature, the semantic relationships between different types of observations, and the information carried in the sparsity structure of the data. Self-supervised Transformers have shown outstanding performance on a variety of structured tasks in NLP and computer vision. However, multivariate time series data contain structured relationships over two dimensions, time and recorded event type, and straightforward applications of Transformers to time series do not leverage this distinct structure. The quadratic scaling of self-attention layers can also significantly limit the input sequence length without appropriate input engineering. We introduce the DuETT architecture, an extension of the Transformer designed to attend over both the time and event type dimensions, yielding robust representations of EHR data. DuETT uses an aggregated input in which sparse time series are transformed into a regular sequence of fixed length; this lowers the computational complexity relative to previous EHR Transformer models and, more importantly, enables the use of larger and deeper neural networks. When trained with self-supervised prediction tasks that provide rich and informative signals for model pre-training, our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.
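
The two design choices described above can be made concrete with a minimal sketch. This is not the authors' implementation; the binning scheme, tensor shapes, and layer sizes are illustrative assumptions. The first function aggregates an irregular, sparse event stream into a fixed (time bins x event types) grid of mean values and observation counts, so the sparsity structure is preserved as an explicit signal; the second module alternates self-attention over the time axis and the event-type axis, one interpretation of attending over both dimensions.

    import torch
    import torch.nn as nn

    def bin_events(times, values, event_ids, n_bins, n_event_types, t_max):
        """Aggregate an irregular event stream (1-D tensors of equal length)
        into a fixed (n_bins, n_event_types) grid of means and counts."""
        bins = torch.clamp((times / t_max * n_bins).long(), max=n_bins - 1)
        grid_sum = torch.zeros(n_bins, n_event_types)
        grid_cnt = torch.zeros(n_bins, n_event_types)
        grid_sum.index_put_((bins, event_ids), values, accumulate=True)
        grid_cnt.index_put_((bins, event_ids), torch.ones_like(values), accumulate=True)
        grid_mean = grid_sum / grid_cnt.clamp(min=1.0)  # zero where nothing was observed
        return grid_mean, grid_cnt

    class DualAxisBlock(nn.Module):
        """One layer that attends over time bins, then over event types."""

        def __init__(self, d_model, n_heads=4):
            super().__init__()
            self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.event_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(d_model)
            self.norm2 = nn.LayerNorm(d_model)

        def forward(self, x):  # x: (batch, time, event, d_model)
            b, t, e, d = x.shape
            # attend across time bins, independently for each event type
            h = x.permute(0, 2, 1, 3).reshape(b * e, t, d)
            h = self.norm1(h + self.time_attn(h, h, h, need_weights=False)[0])
            h = h.reshape(b, e, t, d).permute(0, 2, 1, 3)
            # attend across event types, independently for each time bin
            g = h.reshape(b * t, e, d)
            g = self.norm2(g + self.event_attn(g, g, g, need_weights=False)[0])
            return g.reshape(b, t, e, d)

Because each attention call in this sketch operates over only one axis at a time, its cost is quadratic in either the number of time bins or the number of event types rather than in their product, which is consistent with the abstract's point that a fixed-length aggregated input lowers computational complexity and makes larger, deeper networks practical.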
