Few-Shot Electronic Health Record Coding through Graph Contrastive Learning

Electronic health record (EHR) coding is the task of assigning ICD codes to each EHR. Most previous studies either focus only on frequent ICD codes or treat rare and frequent ICD codes in the same way. These methods perform well on frequent ICD codes, but because the distribution of ICD codes is extremely unbalanced, their performance on rare codes is far from satisfactory. We seek to improve performance on both frequent and rare ICD codes with a contrastive graph-based EHR coding framework, CoGraph, which recasts EHR coding as a few-shot learning task. First, we construct a heterogeneous EHR word-entity (HEWE) graph for each EHR, in which the words and entities extracted from the EHR serve as nodes and the relations between them serve as edges. Then, CoGraph learns similarities and dissimilarities between HEWE graphs from different ICD codes so that information can be transferred among them. In a few-shot learning scenario, the model only has access to frequent ICD codes during training, which might force it to encode features that are useful for frequent ICD codes only. To mitigate this risk, CoGraph devises two graph contrastive learning schemes, GSCL and GECL, that exploit the HEWE graph structures so as to encode transferable features. GSCL utilizes the intra-correlation of different sub-graphs sampled from HEWE graphs, while GECL exploits the inter-correlation among HEWE graphs at different clinical stages. Experiments on the MIMIC-III benchmark dataset show that CoGraph significantly outperforms state-of-the-art methods on EHR coding across several evaluation metrics, on both frequent and rare ICD codes. On frequent ICD codes, GSCL and GECL improve classification accuracy and F1 by 1.31% and 0.61%, respectively; on rare ICD codes, CoGraph's improvements are larger, at 2.12% and 2.95%.
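To make the sub-graph contrast idea concrete, the following is a minimal sketch of a contrastive objective in which two sub-graphs sampled from the same HEWE graph form a positive pair and sub-graphs from other graphs act as negatives. The abstract does not give the actual GSCL loss, so the InfoNCE form used here is an assumption, and the function names, the temperature value, and the toy embedding vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor embedding (assumed form, not the paper's).

    loss = -log( exp(s_pos / t) / (exp(s_pos / t) + sum_k exp(s_neg_k / t)) )
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical embeddings: sub_a and sub_b stand for two sub-graphs
# sampled from the same HEWE graph; `others` stand for sub-graphs
# sampled from different HEWE graphs.
sub_a = [1.0, 0.2, 0.0]
sub_b = [0.9, 0.3, 0.1]
others = [[0.0, 1.0, 0.5], [-0.5, 0.1, 1.0]]

# Loss when the positive really is the sibling sub-graph.
loss_aligned = info_nce(sub_a, sub_b, others)
# Loss when an unrelated sub-graph is (wrongly) treated as the positive.
loss_misaligned = info_nce(sub_a, others[0], [sub_b, others[1]])

print(loss_aligned, loss_misaligned)
```

Under this sketch, the loss is lower when sibling sub-graphs are embedded close together, which is the kind of signal that does not depend on ICD labels and could therefore transfer to rare codes.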
